xenbits.xensource.com Git - people/vhanquez/xen-unstable.git/log
14 years ago stubdom: Fix stubdom-dm using "grep" improperly
John Weekes [Tue, 11 Jan 2011 16:42:41 +0000 (16:42 +0000)]
stubdom: Fix stubdom-dm using "grep" improperly

stubdom-dm uses "grep" on "xm list" output to determine whether it is
already running. The existing behavior is to use "grep $domname-dm" but
this will result in a false-positive in the case of another domU running
whose name ends with the full new name; for instance, if "abctest-dm" is
running, a new "test-dm" will spin forever, waiting for it to end.

An easy fix is to have it use "grep -w" instead of "grep", searching
for the whole word only.
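
The difference can be modelled outside the shell. The following Python sketch approximates plain substring matching versus the whole-word matching of "grep -w"; the "xm list" row and the helper names are made up for illustration and are not part of the patch.

```python
import re


def grep(pattern: str, line: str) -> bool:
    """Plain grep with a fixed string: any substring match counts."""
    return pattern in line


def grep_w(pattern: str, line: str) -> bool:
    """Approximation of grep -w: the match must not touch a word
    character (letter, digit, underscore) on either side."""
    regex = r"(?<!\w)" + re.escape(pattern) + r"(?!\w)"
    return re.search(regex, line) is not None


# Hypothetical "xm list" row for an unrelated domain that is already running.
row = "abctest-dm    7   32   1   -b----   12.3"

print(grep("test-dm", row))    # substring match: a false positive
print(grep_w("test-dm", row))  # whole-word match: no hit
```

Note that "-" is not a word character, so a domain named "x-test-dm" would still match under "grep -w".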

It also might be worth considering a switch to "xl list" from "xm list",
here and in other places.

Signed-off-by: John Weekes <lists.xen@nuclearfallout.net>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago stubdom/minios: don't retrieve the address of void variable
Gianni Tedesco [Tue, 11 Jan 2011 16:31:47 +0000 (16:31 +0000)]
stubdom/minios: don't retrieve the address of void variable

Objects must not be declared to have type void.  Declare shared_info
to have the appropriate type instead.

Author: Gianni Tedesco <gianni.tedesco@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago stubdom/minios: use correct sized types for software floating point
Samuel Thibault [Tue, 11 Jan 2011 16:30:15 +0000 (16:30 +0000)]
stubdom/minios: use correct sized types for software floating point

Replace long/int/short sizes with proper exact-size types for 64bit
architectures.  As well as making the code correct, this eliminates a
compiler warning about an uninitialised variable.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago libxl: Add gfx_passthru parameter
Daniel De Graaf [Tue, 11 Jan 2011 16:13:07 +0000 (16:13 +0000)]
libxl: Add gfx_passthru parameter

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago xentrace: build fix "array subscript has type 'char'"
Keir Fraser [Tue, 11 Jan 2011 15:10:21 +0000 (15:10 +0000)]
xentrace: build fix "array subscript has type 'char'"

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
14 years ago tools: Fix flask does not build because linking fails with
Keir Fraser [Tue, 11 Jan 2011 15:09:23 +0000 (15:09 +0000)]
tools: Fix flask does not build because linking fails with
missing dlopen/dlsym etc.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
14 years ago xenctx: misc adjustments
Keir Fraser [Tue, 11 Jan 2011 11:41:39 +0000 (11:41 +0000)]
xenctx: misc adjustments

- fix off-by-one errors during symbol insertion and lookup
- don't store the symbol type, as it wasn't needed at all so far and
  is only needed now at parsing time
- don't insert certain kinds of symbols

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years ago x86: restore x2apic pre-enabled check logic
Keir Fraser [Tue, 11 Jan 2011 11:40:50 +0000 (11:40 +0000)]
x86: restore x2apic pre-enabled check logic

c/s 22475 removed the early checking without replacement, neglecting
the fact that x2apic_enabled must be set early for APIC register
accesses done during second stage ACPI table parsing (rooted at
acpi_boot_init()) to work correctly. Without this, particularly
determination of the boot CPU won't work, resulting in an attempt to
bring up that CPU again as a secondary one (which fails).

Restore the functionality, now calling it from generic_apic_probe().

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years ago xenpaging: update machine_to_phys_mapping[] during page deallocation
Keir Fraser [Tue, 11 Jan 2011 11:27:37 +0000 (11:27 +0000)]
xenpaging: update machine_to_phys_mapping[] during page deallocation

The machine_to_phys_mapping[] array needs updating during page
deallocation.  If that page is allocated again, a call to
get_gpfn_from_mfn() will still return an old gfn from another guest.
This will cause trouble because this gfn number has no or different
meaning in the context of the current guest.

This happens when the entire guest ram is paged-out before
xen_vga_populate_vram() runs.  Then XENMEM_populate_physmap is called
with gfn 0xff000.  A new page is allocated with alloc_domheap_pages.
This new page does not have a gfn yet.  However, in
guest_physmap_add_entry() the passed mfn still maps to an old gfn
(perhaps from another old guest).  This old gfn is in paged-out state
in this guest's context and has no mfn anymore.  As a result, the
ASSERT() triggers because p2m_is_ram() is true for p2m_ram_paging*
types.  If the machine_to_phys_mapping[] array is updated properly,
both loops in guest_physmap_add_entry() turn into no-ops for the new
page and the mfn/gfn mapping will be done at the end of the function.

If XENMEM_add_to_physmap is used with XENMAPSPACE_gmfn,
get_gpfn_from_mfn() will return an apparently valid gfn.  As a
result, guest_physmap_remove_page() is called.  The ASSERT in
p2m_remove_page triggers because the passed mfn does not match the old
mfn for the passed gfn.
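
The stale-entry problem can be illustrated with a toy Python model. This is only a sketch: apart from get_gpfn_from_mfn and INVALID_M2P_ENTRY, which appear in the message above, every name (and the value chosen for the sentinel) is hypothetical and unrelated to the real Xen code.

```python
INVALID_M2P_ENTRY = -1  # stand-in for Xen's sentinel value


class ToyM2P:
    """Toy machine-to-physical table: the index is an mfn, the value a gfn."""

    def __init__(self, nr_frames):
        self.table = [INVALID_M2P_ENTRY] * nr_frames

    def map_page(self, mfn, gfn):
        self.table[mfn] = gfn

    def free_page(self, mfn, update_m2p):
        # The fix corresponds to update_m2p=True: reset the entry on free.
        if update_m2p:
            self.table[mfn] = INVALID_M2P_ENTRY

    def get_gpfn_from_mfn(self, mfn):
        return self.table[mfn]


def gfn_seen_after_reuse(update_m2p):
    m2p = ToyM2P(nr_frames=16)
    m2p.map_page(mfn=5, gfn=0x1234)  # page belongs to some old guest
    m2p.free_page(mfn=5, update_m2p=update_m2p)
    # mfn 5 is now handed to a new guest and has no gfn there yet.
    return m2p.get_gpfn_from_mfn(5)


print(hex(gfn_seen_after_reuse(update_m2p=False)))  # stale gfn of the old guest
print(gfn_seen_after_reuse(update_m2p=True) == INVALID_M2P_ENTRY)
```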

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years ago xenpaging: drop paged pages in guest_remove_page
Keir Fraser [Tue, 11 Jan 2011 10:38:28 +0000 (10:38 +0000)]
xenpaging: drop paged pages in guest_remove_page

Simply drop paged-pages in guest_remove_page(), and notify xenpaging
to drop its reference to the gfn. If the ring is full, the page will
remain in paged-out state in xenpaging. This is not an issue, it just
means this gfn will not be nominated again.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years ago xenpaging: update machine_to_phys_mapping[] during page-in
Keir Fraser [Tue, 11 Jan 2011 10:37:45 +0000 (10:37 +0000)]
xenpaging: update machine_to_phys_mapping[] during page-in

Update the machine_to_phys_mapping[] array during page-in. The gfn is
now at a different page and the array still has INVALID_M2P_ENTRY at
that index.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years ago xenpaging: make three functions static
Keir Fraser [Tue, 11 Jan 2011 10:33:20 +0000 (10:33 +0000)]
xenpaging: make three functions static

xenpaging_init(), xenpaging_teardown() and xenpaging_evict_page() are
only used in file scope, so they can be marked static.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years ago xenpaging: print page-in/page-out progress
Keir Fraser [Tue, 11 Jan 2011 10:32:59 +0000 (10:32 +0000)]
xenpaging: print page-in/page-out progress

Now that DPRINTF is triggered only when the environment variable
XENPAGING_DEBUG is found, make such a debug session actually useful by
printing the entire page-out/page-in process. The 'Got event from Xen'
message alone is not helpful.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years ago xenpaging: mkdir /var/lib/xen/xenpaging during make install
Keir Fraser [Tue, 11 Jan 2011 10:32:32 +0000 (10:32 +0000)]
xenpaging: mkdir /var/lib/xen/xenpaging during make install

Pagefiles go to the /var/lib/xen/xenpaging directory;
create this directory during make install.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years ago xenpaging: specify policy mru_size at runtime
Keir Fraser [Tue, 11 Jan 2011 10:32:05 +0000 (10:32 +0000)]
xenpaging: specify policy mru_size at runtime

The environment variable XENPAGING_POLICY_MRU_SIZE will change the
mru_size in the policy at runtime.  Specifying the mru_size at runtime
allows the admin to keep more pages in memory so guests can make more
progress.  It's also good for development to reduce the value to put
more pressure on the paging related code paths.
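
As a rough illustration of this kind of runtime override, the Python sketch below reads XENPAGING_POLICY_MRU_SIZE and falls back to a built-in value. The default of 1024 and the fallback rules for malformed input are assumptions made for the example; the message above does not state them, and the actual tool is written in C.

```python
import os

DEFAULT_MRU_SIZE = 1024  # hypothetical default, not taken from the source


def policy_mru_size(environ=os.environ) -> int:
    """Return the MRU size, honouring XENPAGING_POLICY_MRU_SIZE if usable."""
    raw = environ.get("XENPAGING_POLICY_MRU_SIZE")
    if raw is None:
        return DEFAULT_MRU_SIZE
    try:
        value = int(raw, 0)  # base 0 accepts decimal and 0x-prefixed hex
    except ValueError:
        return DEFAULT_MRU_SIZE
    return value if value > 0 else DEFAULT_MRU_SIZE


print(policy_mru_size({"XENPAGING_POLICY_MRU_SIZE": "64"}))
```

A smaller value evicts recently used pages sooner, which is what makes it useful for stressing the paging code paths.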

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years ago xenpaging: remove domain_id and mfn from struct xenpaging_victim
Keir Fraser [Tue, 11 Jan 2011 10:31:33 +0000 (10:31 +0000)]
xenpaging: remove domain_id and mfn from struct xenpaging_victim

Remove unused member 'mfn' from struct xenpaging_victim.

xenpaging operates on a single guest, so it needs only a single
domain_id.  Remove domain_id from struct xenpaging_victim and use the
one from paging->mem_event where needed. It's not used in the policy.

This saves 4MB runtime data with a 1GB pagefile.
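
The 4MB figure can be sanity-checked with a little arithmetic. The sketch below assumes 4 KiB pages and one struct xenpaging_victim per page of the pagefile; neither assumption is stated in the message.

```python
PAGE_SIZE = 4 * 1024        # assumed 4 KiB pages
PAGEFILE_SIZE = 1024 ** 3   # 1 GiB pagefile, as in the message
SAVED_BYTES = 4 * 1024 ** 2  # 4 MiB saving, as in the message

entries = PAGEFILE_SIZE // PAGE_SIZE      # one victim entry per page
saved_per_entry = SAVED_BYTES // entries  # bytes removed from each entry

print(entries, saved_per_entry)
```

Under these assumptions the saving works out to 16 bytes per entry, which would be consistent with dropping an 8-byte mfn plus a small domain_id and its padding on a 64-bit build.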

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years ago x86-64: refine access permission check for wrmsr to MSR_FAM10H_MMIO_CONF_BASE
Keir Fraser [Tue, 11 Jan 2011 10:30:46 +0000 (10:30 +0000)]
x86-64: refine access permission check for wrmsr to MSR_FAM10H_MMIO_CONF_BASE

We really don't want the mmconf window to move/disappear whenever we
use it ourselves, not only when we enabled it.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years ago vt-d: Remove unnecessary 'inline' qualifiers
Keir Fraser [Mon, 10 Jan 2011 10:37:53 +0000 (10:37 +0000)]
vt-d: Remove unnecessary 'inline' qualifiers

Compiler knows best when to inline. Also this shows up an unused flush
function which is removed by this patch.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago vtd: Remove unused 'align' local variable from iommu_flush_iotlb_psi().
Keir Fraser [Mon, 10 Jan 2011 10:32:04 +0000 (10:32 +0000)]
vtd: Remove unused 'align' local variable from iommu_flush_iotlb_psi().

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago vtd: Fix up iommu_flush_iotlb_psi().
Keir Fraser [Mon, 10 Jan 2011 10:31:09 +0000 (10:31 +0000)]
vtd: Fix up iommu_flush_iotlb_psi().

1. Change missed usage of 'align' to 'order'
2. Remove unused 'pages' parameter

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago x86_64: don't use weak symbols on x86-64
Keir Fraser [Mon, 10 Jan 2011 08:45:19 +0000 (08:45 +0000)]
x86_64: don't use weak symbols on x86-64

Various gcc versions inline functions that are both weak and hidden,
without even giving a warning.

Certainly the risk exists that we'll see the problem again when
another weak function gets introduced, but I don't see a way to
protect us from that.

Signed-off-by: Jan Beulich <jbeulich@novell.com>

Just remove the weak attribute altogether. It's the only one in
non-ia64-specific code. We can get the same effect with ifdefs which
although a bit unsightly is better than using compiler/linker features
we cannot trust.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago x86-64: don't allow wrmsr to MSR_FAM10H_MMIO_CONF_BASE when Xen itself is using it
Keir Fraser [Mon, 10 Jan 2011 08:42:32 +0000 (08:42 +0000)]
x86-64: don't allow wrmsr to MSR_FAM10H_MMIO_CONF_BASE when Xen itself is using it

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years ago EPT/VT-d: bug fix for EPT/VT-d table sharing
Keir Fraser [Mon, 10 Jan 2011 08:40:32 +0000 (08:40 +0000)]
EPT/VT-d: bug fix for EPT/VT-d table sharing

This patch makes the following changes: 1) Moves EPT/VT-d sharing
initialization back to when it is actually needed to make sure
vmx_ept_vpid_cap has been initialized.  2) Adds a page order parameter
to iommu_pte_flush() to tell VT-d what size of page to flush.  3)
Adds a hap_2mb flag to ease performance studies between base 4KB EPT
size and when 2MB and 1GB page size support are enabled.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
14 years ago Update hgignore list
Keir Fraser [Sat, 8 Jan 2011 11:07:18 +0000 (11:07 +0000)]
Update hgignore list

14 years ago mem_access test tool: xen-access
Joe Epstein [Sat, 8 Jan 2011 11:06:18 +0000 (11:06 +0000)]
mem_access test tool: xen-access

Added a test tool to let the memory access APIs be tested.  This tool
logs to stdout the memory accesses that the domain given is performing.

Signed-off-by: Joe Epstein <jepstein98@gmail.com>
Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago tools/tests: Move x86 emulator tests into a subdir
Keir Fraser [Sat, 8 Jan 2011 10:59:13 +0000 (10:59 +0000)]
tools/tests: Move x86 emulator tests into a subdir

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago Update AMD SVM feature flags
Keir Fraser [Sat, 8 Jan 2011 10:52:45 +0000 (10:52 +0000)]
Update AMD SVM feature flags

This patch updates AMD SVM feature flags (0x8000000A:EDX). It adds
several new feature bits, along with feature descriptions. The feature
names are changed to be consistent with the Linux kernel.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago Update AMD CPU feature flags 0x80000001:ECX for Xen Hypervisor
Keir Fraser [Sat, 8 Jan 2011 10:48:46 +0000 (10:48 +0000)]
Update AMD CPU feature flags 0x80000001:ECX for Xen Hypervisor

This patch syncs-up AMD CPU feature flags 0x80000001:ECX with the
latest Linux kernel. Several new features are added. Some existing
features' names are changed as well.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
14 years ago libxc: Update AMD CPU feature flags 0x80000001:ECX for Xen tools
Keir Fraser [Sat, 8 Jan 2011 10:48:09 +0000 (10:48 +0000)]
libxc: Update AMD CPU feature flags 0x80000001:ECX for Xen tools

This patch syncs-up AMD CPU feature flags 0x80000001:ECX in libxc with
the latest Linux kernel.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
14 years ago timer: Don't hardcode cpu0 in migrate_timers_from_cpu().
Keir Fraser [Sat, 8 Jan 2011 10:43:01 +0000 (10:43 +0000)]
timer: Don't hardcode cpu0 in migrate_timers_from_cpu().

Although we don't allow cpu0 to be offlined, there's no need to
hardcode that assumption in the timer subsystem.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago timer: Ensure that CPU field of a timer is read safely when lock-free.
Keir Fraser [Sat, 8 Jan 2011 10:09:44 +0000 (10:09 +0000)]
timer: Ensure that CPU field of a timer is read safely when lock-free.

Firstly, all updates must use atomic_write16(), and lock-free reads
must use atomic_read16(). Secondly, we ensure ->cpu is the only field
accessed without a lock. This requires us to place a special sentinel
value in that field when a timer is killed, to avoid needing to read
->status outside a locked critical section.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago x86: Fix atomic_write*() macros to correctly inform GCC that memory
Keir Fraser [Sat, 8 Jan 2011 10:05:55 +0000 (10:05 +0000)]
x86: Fix atomic_write*() macros to correctly inform GCC that memory
it knows about is being written to.

The bug is a copy-and-paste error from inline asm that writes to I/O
memory. In that case, as with asm for accessing guest memory,
specifying memory as a read-only parameter is acceptable because the
memory cannot alias with anything that GCC reads directly.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago timer: Fix up timer-state teardown on CPU offline / online-failure.
Keir Fraser [Sat, 8 Jan 2011 09:29:11 +0000 (09:29 +0000)]
timer: Fix up timer-state teardown on CPU offline / online-failure.

The lock-free access to timer->cpu in timer_lock() is problematic, as
the per-cpu data that is then dereferenced could disappear under our
feet. Now that per-cpu data is freed via RCU, we simply add a RCU
read-side critical section to timer_lock(). It is then also
unnecessary to migrate timers on CPU_DYING (we can defer it to the
nicer CPU_DEAD state) and we can also safely migrate timers on
CPU_UP_CANCELED.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago x86: Free per-cpu area for offline cpu via RCU.
Keir Fraser [Sat, 8 Jan 2011 09:14:23 +0000 (09:14 +0000)]
x86: Free per-cpu area for offline cpu via RCU.

This allows other CPUs to reference per-cpu areas with less strict
locking. In particular, timer.c accesses a per-cpu lock with reference
to a per-timer cpu field which it accesses with no synchronisation.

One subtlety is that this prevents us bringing a cpu back online until
the RCU work is completed. In this case we return EBUSY and the
tool stack can report the (unlikely) error, or retry, as it sees fit.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago xl: Implement flexarray_append() and flexarray_vappend()
Gianni Tedesco [Fri, 7 Jan 2011 18:53:50 +0000 (18:53 +0000)]
xl: Implement flexarray_append() and flexarray_vappend()

Makes a lot of code simpler and nicer and saves a fair amount of screen
real-estate

Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago xl: Move device model functions in to a separate file
Gianni Tedesco [Fri, 7 Jan 2011 18:24:54 +0000 (18:24 +0000)]
xl: Move device model functions in to a separate file

No functional changes.

Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com>
14 years ago minios: use constant expression to size arrays
Gianni Tedesco [Fri, 7 Jan 2011 18:01:18 +0000 (18:01 +0000)]
minios: use constant expression to size arrays

Fixes a compile error in gcc-4.5 which is the reason __CONST_RING_SIZE()
was introduced. Let's just use it in minios netfront.

Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago x86 asid: Do not check for per-cpu asid state being already initialised.
Keir Fraser [Fri, 7 Jan 2011 16:59:53 +0000 (16:59 +0000)]
x86 asid: Do not check for per-cpu asid state being already initialised.

It cannot be, since per-cpu data is re-allocated and zeroed across CPU
hotplug. The comment about resetting the per-cpu generation counter is
incorrect, since all vcpus must have been migrated to other cpus while
this cpu was offline, and that resets the per-vcpu generation stamp.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago ASID: Optimize hvm_flush_guest_tlbs
Keir Fraser [Fri, 7 Jan 2011 14:13:15 +0000 (14:13 +0000)]
ASID: Optimize hvm_flush_guest_tlbs

In our testing, we found that function hvm_flush_guest_tlbs() is used
very frequently and it will always force asid recycling and will
result in a whole TLB flush immediately, whether or not there are still
free asids.  Actually, in this case, just increasing core generation
might be enough and the remaining asids can still be used until
next_asid > max_asid.
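
The generation scheme described above can be sketched as a small Python model. It is a simplification under stated assumptions, not the hypervisor's implementation: the class and function names are invented for the example, and only next_asid, max_asid and the notion of a core generation come from the message.

```python
class CoreAsidState:
    """Per-core allocator state (hypothetical model)."""

    def __init__(self, max_asid):
        self.core_generation = 1
        self.next_asid = 1
        self.max_asid = max_asid


class VcpuAsid:
    """Per-vcpu ASID plus the generation it was allocated in."""

    def __init__(self):
        self.generation = 0  # never equals a live core generation
        self.asid = 0


def flush_guest_tlbs(core):
    # Cheap path: invalidate every ASID handed out so far by bumping the
    # generation, without forcing a TLB flush right away.
    core.core_generation += 1


def handle_vmenter(core, vcpu):
    """Give the vcpu a valid ASID; return True if a full flush is needed."""
    if vcpu.generation == core.core_generation:
        return False  # ASID still valid, nothing to do
    need_flush = False
    if core.next_asid > core.max_asid:
        # Out of fresh ASIDs: start a new generation and recycle from 1.
        core.core_generation += 1
        core.next_asid = 1
        need_flush = True
    vcpu.asid = core.next_asid
    core.next_asid += 1
    vcpu.generation = core.core_generation
    return need_flush


core = CoreAsidState(max_asid=3)
vcpu = VcpuAsid()
for _ in range(4):
    flush_guest_tlbs(core)
    full_flush = handle_vmenter(core, vcpu)
    print(vcpu.asid, full_flush)  # only the last iteration needs a full flush
```

In this model a fresh ASID has no stale TLB entries, so a full flush is only paid once the ASID space is exhausted.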

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Reviewed-by: Wei Huang <wei.huang2@amd.com>

Simplify the logic and also fix a very minor bug in
hvm_asid_handle_vmenter(), in the case that hvm_asid_flush_core() sets
data->disabled.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago Update my email address to long-term stable address.
Keir Fraser [Fri, 7 Jan 2011 13:30:04 +0000 (13:30 +0000)]
Update my email address to long-term stable address.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years ago Fix 32-bit build after the latest mem-event series
Tim Deegan [Fri, 7 Jan 2011 11:55:35 +0000 (11:55 +0000)]
Fix 32-bit build after the latest mem-event series

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years ago mem_type HVMOP: replaced enum
Joe Epstein [Fri, 7 Jan 2011 11:54:52 +0000 (11:54 +0000)]
mem_type HVMOP: replaced enum

* Replaced the memory type enum with a 64-bit aligned value

Signed-off-by: Joe Epstein <jepstein98@gmail.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years ago mem_access: added trap injection to libxc
Joe Epstein [Fri, 7 Jan 2011 11:54:50 +0000 (11:54 +0000)]
mem_access: added trap injection to libxc

* Carries forward the trap injection hypercall into libxc

Signed-off-by: Joe Epstein <jepstein98@gmail.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
14 years ago mem_access: added INT3/CRx capture
Joe Epstein [Fri, 7 Jan 2011 11:54:48 +0000 (11:54 +0000)]
mem_access: added INT3/CRx capture

* Allows a memory event listener to register for events on changes to
  CR0, CR3, and CR4, as well as INT3 instructions, as a part of the
  mem_access mechanism.  These events can be either synchronous or
  asynchronous.

* For INT3, the logic works independent of a debugger, and so both can
  be supported.

* The presence and type of listener are stored and accessed through
  HVM params.

* Changed the event mask handling to ensure that the right events are
  captured based on the listeners.

* Added the ability to inject HW/SW traps into a VCPU when it next
  resumes (rather than try to modify the existing IRQ injection
  code paths).  Only one trap to inject can be outstanding at a time.

Signed-off-by: Joe Epstein <jepstein98@gmail.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years ago mem_access: HVMOPs for setting mem access
Joe Epstein [Fri, 7 Jan 2011 11:54:45 +0000 (11:54 +0000)]
mem_access: HVMOPs for setting mem access

* Creates HVMOPs for setting and getting memory access.  The hypercalls
  can set individual pages or the default access for new/refreshed
  pages.

* Added functions to libxc to access these hypercalls.

Signed-off-by: Joe Epstein <jepstein98@gmail.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years ago mem_access: access listener can be required
Joe Epstein [Fri, 7 Jan 2011 11:54:42 +0000 (11:54 +0000)]
mem_access: access listener can be required

* Adds the ability to specify that a domain requires an access listener;
  that is, the VCPU is paused if there is no memory event listener.

Signed-off-by: Joe Epstein <jepstein98@gmail.com>
Acked-by: Keir Fraser <keir@xen.org>
14 years ago mem_access: mem event additions for access
Joe Epstein [Fri, 7 Jan 2011 11:54:40 +0000 (11:54 +0000)]
mem_access: mem event additions for access

* Adds an ACCESS memory event type, with RESUME as the action.

* Refactors the bits in the memory event to store whether the memory event
  was a read, write, or execute (for access memory events only).  I used
  bits sparingly to keep the structure somewhat the same size.

* Modified VMX to report the needed information in its nested page fault.
  SVM is not implemented in this patch series.

Signed-off-by: Joe Epstein <jepstein98@gmail.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years ago mem_access: introduce P2M mem_access types
Joe Epstein [Fri, 7 Jan 2011 11:54:36 +0000 (11:54 +0000)]
mem_access: introduce P2M mem_access types

* Introduces access types for each page, giving independent read, write, and
  execute permissions for each page.  The access type can only restrict what
  the page type already allows: for example, a p2m_type_ro page with an access
  of p2m_access_rw would have read-only permissions in total, as p2m_type_ro
  removes write access and p2m_access_rw removes execute access.

* Implements the access flag storage for EPT, moving some bits from P2M type,
  which had 10 bits of storage, to the four bits for access.

* Access flags are stored according to a loose consistency contract, where
  pages can be reset to the default access permissions at any time.  Right
  now, that happens on page type changes, where one would want to reevaluate
  whether permissions make sense for that page as they are anyway.
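
The way page type and access type combine, as described in the first bullet, can be sketched as a permission intersection. The Python below is only an illustration: the Perm flags, the lookup tables, and the p2m_type_rw and p2m_access_rx entries are assumptions added for the example, while p2m_type_ro and p2m_access_rw are the names used above.

```python
from enum import IntFlag


class Perm(IntFlag):
    NONE = 0
    R = 1
    W = 2
    X = 4


# What each page type allows on its own (illustrative table).
TYPE_PERMS = {
    "p2m_type_rw": Perm.R | Perm.W | Perm.X,
    "p2m_type_ro": Perm.R | Perm.X,
}

# What each access setting allows on its own (illustrative table).
ACCESS_PERMS = {
    "p2m_access_rw": Perm.R | Perm.W,
    "p2m_access_rx": Perm.R | Perm.X,
}


def effective_perms(page_type, access):
    """A permission holds only if both the type and the access allow it."""
    return TYPE_PERMS[page_type] & ACCESS_PERMS[access]


print(effective_perms("p2m_type_ro", "p2m_access_rw") == Perm.R)
```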

Signed-off-by: Joe Epstein <jepstein98@gmail.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
14 years ago kexec: correct _domain offset info in elf-notes
Keir Fraser [Thu, 6 Jan 2011 19:02:36 +0000 (19:02 +0000)]
kexec: correct _domain offset info in elf-notes

The hypervisor writes some data structure information into the ELF note
section of the vmcore to enable interpretation of the Xen structures
by kexec/kdump.

The recorded offset of _domain in page_info was simply wrong on
non-ia64 systems.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
14 years ago libxl: Lists qdisk device in libxl_device_disk_list
Anthony PERARD [Thu, 6 Jan 2011 18:04:48 +0000 (18:04 +0000)]
libxl: Lists qdisk device in libxl_device_disk_list

As libxl switches to qdisk when blktap isn't available, this patch makes
libxl_device_disk_list also list qdisk devices, so that
libxl_build_device_model_args_new will be able to add qdisk devices to
the command line options of Qemu.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago libxl: Factorize function libxl_device_disk_list
Anthony PERARD [Thu, 6 Jan 2011 18:03:11 +0000 (18:03 +0000)]
libxl: Factorize function libxl_device_disk_list

This patch adds function libxl_append_disk_list_of_type to get the disk
parameters for one backend type.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago ocaml: evtchn+xc bindings: use libxenctrl and libxenguest
Ian Campbell [Thu, 6 Jan 2011 17:37:00 +0000 (17:37 +0000)]
ocaml: evtchn+xc bindings: use libxenctrl and libxenguest

Now that tools/libxc is licensed under LGPL I don't think there is any need for
an LGPL reimplementation under tools/ocaml.

For the most part the conversion to the up-to-date libxc API (xc_lib.c
essentially implemented the same interface as an older libxc) was pretty
automatic. There are some functions which appear to no longer exist in libxc
which I therefore simply removed the bindings for and a small number of
interfaces which had changed.

Many of the functions bound by the stubs have no in-tree users (which I think
is fine for a language binding) so I have no way to confirm correctness other
than by eye. I was however able to confirm that oxenstored still worked and to
build an XCP toolstack which could successfully start a PV guest.

Uses the new XC_OPENFLAG_NON_REENTRANT option to avoid potential conflicts
between pthreads and the ocaml runtime.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Vincent Hanquez <Vincent.Hanquez@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago ocaml: rename Evtchn.bind_virq as Evtchn.bind_dom_exc_virq
Ian Campbell [Thu, 6 Jan 2011 17:34:46 +0000 (17:34 +0000)]
ocaml: rename Evtchn.bind_virq as Evtchn.bind_dom_exc_virq

Rename Evtchn.bind_virq as Evtchn.bind_dom_exc_virq
to reflect its actual behaviour.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago ocaml: add dependency to module metadata
Ian Campbell [Thu, 6 Jan 2011 17:33:39 +0000 (17:33 +0000)]
ocaml: add dependency to module metadata

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago ocaml: resynchronise uuid library with xen-api-libs.hg
Ian Campbell [Thu, 6 Jan 2011 17:33:00 +0000 (17:33 +0000)]
ocaml: resynchronise uuid library with xen-api-libs.hg

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago libxl: Correct paths in libxl__fill_dom0_memory_info function
Eamon Walsh [Thu, 6 Jan 2011 17:28:13 +0000 (17:28 +0000)]
libxl: Correct paths in libxl__fill_dom0_memory_info function

Signed-off-by: Eamon Walsh <ewalsh@tycho.nsa.gov>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago Merge
Ian Jackson [Thu, 6 Jan 2011 17:26:53 +0000 (17:26 +0000)]
Merge

14 years ago libxc: portability fixes for NetBSD
Christoph Egger [Thu, 6 Jan 2011 17:26:26 +0000 (17:26 +0000)]
libxc: portability fixes for NetBSD

Attached patch makes libxc build again on NetBSD after the recent rework.

[ Modified by iwj:

I changed the name of the new make variable from LIBDL to DLOPEN_LIBS.
The latter conforms to the naming scheme for similar variables found
in config/*.mk - PTHREAD_LIBS et al.

Also I moved the setting of the variable to -dl from Linux to StdGNU
(which makes it apply more widely) and also added it to SunOS.mk
(based on pure guesswork). ]

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago mem_sharing: fix race condition of nominate and unshare
Tim Deegan [Thu, 6 Jan 2011 16:58:48 +0000 (16:58 +0000)]
mem_sharing: fix race condition of nominate and unshare

(1) When updating/checking p2m type for mem_sharing, we must hold shr_lock
(2) For nominate operation, if the page is already nominated, return the
    handle from page_info->shr_handle
(3) For unshare operation, it is possible that multiple users unshare a
    page via hvm_hap_nested_page_fault() at the same time. If the page
    is already un-shared by someone else, simply return success.
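
The three rules above amount to making both operations idempotent under a single lock. The Python sketch below models that idea only; the class, its fields, and the handle allocation are hypothetical, and shr_lock is the only name borrowed from the message.

```python
import itertools
import threading


class ToySharing:
    """Toy model: every state check and update happens under shr_lock."""

    def __init__(self):
        self.shr_lock = threading.Lock()
        self.handles = {}  # gfn -> handle for pages currently nominated
        self._next_handle = itertools.count(1)

    def nominate(self, gfn):
        with self.shr_lock:
            if gfn in self.handles:
                # Already nominated: hand back the existing handle.
                return self.handles[gfn]
            handle = next(self._next_handle)
            self.handles[gfn] = handle
            return handle

    def unshare(self, gfn):
        with self.shr_lock:
            # If someone else already unshared the page, this is a no-op
            # and still counts as success.
            self.handles.pop(gfn, None)
            return True


sharing = ToySharing()
first = sharing.nominate(0x1000)
print(sharing.nominate(0x1000) == first)                 # same handle both times
print(sharing.unshare(0x1000), sharing.unshare(0x1000))  # both calls succeed
```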

Signed-off-by: Jui-Hao Chiang <juihaochiang@gmail.com>
Signed-off-by: Han-Lin Li <Han-Lin.Li@itri.org.tw>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years ago libxl: Specify the target ram size to Qemu (new) when calling it
Anthony PERARD [Thu, 6 Jan 2011 14:27:33 +0000 (14:27 +0000)]
libxl: Specify the target ram size to Qemu (new) when calling it

This patch adds target_ram in device_model_info structure, to be used in
libxl_build_device_model_args_new. Qemu upstream needs to know about it.

It also introduces libxl__sizekb_to_mb to convert size from KB to MB by
rounding up the result.
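
Rounding up when converting KB to MB is ordinary ceiling division. A minimal Python sketch follows; the function name mirrors the one in the message, but the body is an assumption about what such a helper does, not the libxl source.

```python
def sizekb_to_mb(size_kb: int) -> int:
    """Convert KB to MB, rounding any partial megabyte up."""
    return (size_kb + 1023) // 1024


print(sizekb_to_mb(1024), sizekb_to_mb(1025))
```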

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago libxl: Implement libxl_basename()
Christoph Egger [Thu, 6 Jan 2011 14:25:10 +0000 (14:25 +0000)]
libxl: Implement libxl_basename()

This patch implements libxl_basename() as a portable replacement
for GNU vs. POSIX basename.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago QEMU_TAG update
Ian Jackson [Wed, 5 Jan 2011 23:54:15 +0000 (23:54 +0000)]
QEMU_TAG update

14 years ago libxl: fix free of uninitialised "disks" variable
Yang Z Zhang [Wed, 5 Jan 2011 23:37:32 +0000 (23:37 +0000)]
libxl: fix free of uninitialised "disks" variable

Reported-by: Wei Huang <wei.huang2@amd.com>
Reported-by: Christoph Egger <Christoph.Egger@amd.com>
Author: Yang Z Zhang <yang.z.zhang@intel.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago tools/xend: drbd: fix things by reverting 20158
Ian Jackson [Wed, 5 Jan 2011 23:31:24 +0000 (23:31 +0000)]
tools/xend: drbd: fix things by reverting 20158

drbd's block-drbd script handles all of the details that c/s 20158
introduces within xend :-(.  This c/s should be reverted as it causes
a regression.  Jim Fehlig tested drbd without 20158 and it works fine.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Jim Fehlig <jfehlig@novell.com>
14 years ago xl: don't segfault parsing disk configs, support NULL physpath and ioemu:
Gianni Tedesco [Wed, 5 Jan 2011 23:13:07 +0000 (23:13 +0000)]
xl: don't segfault parsing disk configs, support NULL physpath and ioemu:

Switch to a state machine parser since it's easier to handle all these
exotic cases without segfaulting. NULL physpaths are now allowed and a
dodgy hack is introduced to skip over the "ioemu:" prefix for a
virtpath. Also fixes a leak of buf2.
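
To show why a state machine copes well with these cases, here is a small Python sketch of one. It assumes a "backend:physpath,virtpath,mode" layout with an optional backend and an optional physpath; that layout, the state names, and the returned dictionary are assumptions for the example and do not reproduce the xl parser.

```python
def parse_disk(spec):
    """Parse 'backend:physpath,virtpath,mode' one character at a time."""
    disk = {"backend": None, "physpath": None, "virtpath": None, "mode": None}
    state, tok = "START", ""
    for ch in spec + ",":  # the trailing comma flushes the last field
        if state == "START" and ch == ":":
            disk["backend"], state, tok = tok, "PHYSPATH", ""
        elif ch != ",":
            tok += ch
        elif state in ("START", "PHYSPATH"):
            # An empty token means no physpath was given.
            disk["physpath"], state, tok = (tok or None), "VIRTPATH", ""
        elif state == "VIRTPATH":
            if tok.startswith("ioemu:"):
                tok = tok[len("ioemu:"):]  # skip the legacy prefix
            disk["virtpath"], state, tok = tok, "MODE", ""
        elif state == "MODE":
            disk["mode"], state, tok = tok, "DONE", ""
        else:
            raise ValueError("too many fields in %r" % spec)
    if state != "DONE":
        raise ValueError("incomplete disk spec %r" % spec)
    return disk


print(parse_disk("phy:/dev/vg/guest,ioemu:hda,w"))
print(parse_disk(",hdc,r"))
```

Malformed input ends in an explicit error state instead of a read through a missing field, which is the property the message is after.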

Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years ago relax vCPU pinned checks
Keir Fraser [Wed, 5 Jan 2011 09:57:15 +0000 (09:57 +0000)]
relax vCPU pinned checks

Both writing of certain MSRs and VCPUOP_get_physid make sense also for
dynamically (perhaps temporarily) pinned vcpus.

Likely a couple of other MSR writes (MSR_K8_HWCR, MSR_AMD64_NB_CFG,
MSR_FAM10H_MMIO_CONF_BASE) would make sense to be restricted by an
is_pinned() check too, possibly also some MSR reads.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years ago x86 hvm: Add a missing line to record the type passed into register_io_handler()
Keir Fraser [Wed, 5 Jan 2011 09:52:54 +0000 (09:52 +0000)]
x86 hvm: Add a missing line to record the type passed into register_io_handler()

Add a missing line to record the type passed into
register_io_handler()

Without this line, the BUFFERED_IO handler was never called.

Signed-off-by: Wei Gang <gang.wei@intel.com>
14 years ago x86: Allow dom0 to write MSR IA32_ENERGY_PERF_BIAS
Keir Fraser [Wed, 5 Jan 2011 09:52:18 +0000 (09:52 +0000)]
x86: Allow dom0 to write MSR IA32_ENERGY_PERF_BIAS

Allow dom0 to write MSR IA32_ENERGY_PERF_BIAS

There is a new hardware feature which lets system software set the
Energy Performance Preference. This is an opaque knob in the form of
the IA32_ENERGY_PERF_BIAS MSR, which holds a 4-bit Energy Performance
Preference Hint.

The support for this feature is indicated by CPUID.06H.ECX.bit3. Refer
to Intel Architectures Software Developer's Manual for more info.

Let dom0 tools control it.

Signed-off-by: Wei Gang <gang.wei@intel.com>
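
As a rough illustration of the feature check and the 4-bit hint described
in the commit above, the following sketch tests CPUID.06H.ECX bit 3 and
updates the hint field of an MSR value. The function names are
hypothetical and are not Xen or dom0 tool APIs; the hint is assumed to
occupy the low four bits of the MSR.

```python
# Illustrative sketch only. Function names are hypothetical.
EPB_CPUID_ECX_BIT = 3   # CPUID.06H.ECX.bit3, as named in the commit message
EPB_HINT_MASK = 0xF     # 4-bit hint, assumed to sit in the low bits of the MSR


def has_energy_perf_bias(cpuid_06h_ecx: int) -> bool:
    """Return True if the ECX value of CPUID leaf 06H advertises the feature."""
    return bool((cpuid_06h_ecx >> EPB_CPUID_ECX_BIT) & 1)


def set_hint(msr_value: int, hint: int) -> int:
    """Return msr_value with its hint field replaced by `hint` (0..15)."""
    if not 0 <= hint <= EPB_HINT_MASK:
        raise ValueError("hint must fit in 4 bits")
    return (msr_value & ~EPB_HINT_MASK) | hint
```

A caller would first check has_energy_perf_bias() and only then attempt
the MSR write with the value produced by set_hint().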
14 years ago[VTD] added WLAN device ID on Fujitsu's platform in quirks.c
Keir Fraser [Wed, 5 Jan 2011 09:50:21 +0000 (09:50 +0000)]
[VTD] added WLAN device ID on Fujitsu's platform in quirks.c

Added WLAN device ID 0x422C that was found on Fujitsu's Calpella
system to WLAN quirk.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
14 years agox86 amd: Revert 6382:b74c15e4dd4f (AMD flush filter configuration)
Keir Fraser [Wed, 5 Jan 2011 09:48:43 +0000 (09:48 +0000)]
x86 amd: Revert 6382:b74c15e4dd4f (AMD flush filter configuration)

Flush filter is not reliably supported by any processor, we already
have code to unconditionally disable the filter, so we don't need the
command-line config option. Remove it.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agoEnable 1GB HAP support by default.
Tim Deegan [Wed, 5 Jan 2011 09:41:28 +0000 (09:41 +0000)]
Enable 1GB HAP support by default.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
14 years agotools/gdbsx: Update gdbsx README
Mukesh Rathor [Tue, 4 Jan 2011 15:40:00 +0000 (15:40 +0000)]
tools/gdbsx: Update gdbsx README

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agotools/hotplug/Linux: only apply dummy MAC address to virtual devices.
Ian Campbell [Tue, 4 Jan 2011 15:26:02 +0000 (15:26 +0000)]
tools/hotplug/Linux: only apply dummy MAC address to virtual devices.

Avoid applying to the bridge and physical network device.

This should un-break dom0 networking in the old xend-creates-bridge
setup (problem introduced in 22493:937488219719).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agox86/mm: Add p2m_lock in set_shared_p2m_entry
Tim Deegan [Tue, 4 Jan 2011 11:32:20 +0000 (11:32 +0000)]
x86/mm: Add p2m_lock in set_shared_p2m_entry

This avoids the immediate problem (calling set_p2m_entry() without the
lock held) but leaves the underlying problem (no consistent locking
order between page-sharing and p2m code) for later.

Signed-off-by: Jui-Hao Chiang <juihaochiang@gmail.com>
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agox86 hvm: Missing chunk from TSC-deadline support patch.
Keir Fraser [Mon, 27 Dec 2010 08:00:09 +0000 (08:00 +0000)]
x86 hvm: Missing chunk from TSC-deadline support patch.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agocredit2: Fix x86_32 build.
Keir Fraser [Fri, 24 Dec 2010 10:56:29 +0000 (10:56 +0000)]
credit2: Fix x86_32 build.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agoXen MCE test: all test cases
Keir Fraser [Fri, 24 Dec 2010 10:23:08 +0000 (10:23 +0000)]
Xen MCE test: all test cases

Implement the test cases. Each case calls the common functions and
then the MCE injection tool. Also add a README for the Xen MCE test
suite, covering the framework and test instructions.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
14 years agoXen MCE test: common functions to be used for test cases
Keir Fraser [Fri, 24 Dec 2010 10:22:23 +0000 (10:22 +0000)]
Xen MCE test: common functions to be used for test cases

Implement common shell functions and variable definitions to be used
by the test cases. The verification functions cover the domain0
user-space tool mcelog, Xen dmesg and the guest kernel log.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
14 years agoXen MCE test: utilities to inject fake MCE for X86
Keir Fraser [Fri, 24 Dec 2010 10:21:27 +0000 (10:21 +0000)]
Xen MCE test: utilities to inject fake MCE for X86

A software MCE injection tool, based on the Xen MCE injection
mechanism. It fakes an MCE error and injects it at an assigned
Domain Physical Address.  The Makefile makes sure the tool can be
built on Xen.  A README explains the usage of this tool.

Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@intel.com>
Signed-off-by: Xudong Hao <xudong.hao@intel.com>
14 years agolibxc: Use .opic to build xenctrl_osdep_ENOSYS.so
Keir Fraser [Fri, 24 Dec 2010 10:17:49 +0000 (10:17 +0000)]
libxc: Use .opic to build xenctrl_osdep_ENOSYS.so

Resolves build error:
    /usr/bin/ld: xenctrl_osdep_ENOSYS.o: relocation R_X86_64_32
    against `a local symbol' can not be used when making a shared
    object; recompile with -fPIC
    xenctrl_osdep_ENOSYS.o: could not read symbols: Bad value
    collect2: ld returned 1 exit status

Clean up object files correctly too.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
14 years agoACPI: __init-annotate APEI code
Keir Fraser [Fri, 24 Dec 2010 10:14:58 +0000 (10:14 +0000)]
ACPI: __init-annotate APEI code

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agox86: a little bit of genapic cleanup
Keir Fraser [Fri, 24 Dec 2010 10:14:36 +0000 (10:14 +0000)]
x86: a little bit of genapic cleanup

Eliminate redundancy among the individual handler functions, and mark
init-only functions as such.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agoVT-d: fix and improve print_vtd_entries()
Keir Fraser [Fri, 24 Dec 2010 10:14:01 +0000 (10:14 +0000)]
VT-d: fix and improve print_vtd_entries()

Fix leaking of mapped domain pages (root_entry and ctxt_entry when
falling out of the level traversing loop). Do this by re-arranging
things slightly so that a mapping is retained only as long as it
really is needed.

Fix the failure to use map_domain_page() in the level traversing loop
of the function.

Add a missing return statement in one of the error paths.

Also I wonder whether not being able to call print_vtd_entries() from
iommu_page_fault_do_one() in ix86 is still correct, now that
map_domain_page() is IRQ safe.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agore-add calls accidentally deleted from run_all_nonirq_keyhandlers()
Keir Fraser [Fri, 24 Dec 2010 10:12:58 +0000 (10:12 +0000)]
re-add calls accidentally deleted from run_all_nonirq_keyhandlers()

c/s 22538:a3a29e67aa7e, having got applied in a form different from
the one submitted, resulted in the calls to
console_{start,end}_log_everything() getting removed without
replacement. Add them back since, other than run_all_keyhandlers(),
this doesn't run with log-everything already in effect.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agoUse bool_t for various boolean variables
Keir Fraser [Fri, 24 Dec 2010 10:10:45 +0000 (10:10 +0000)]
Use bool_t for various boolean variables

... decreasing cache footprint. As a prerequisite this requires making
cmdline_parse() a little more flexible.

Also remove a few variables altogether, and adjust sections
annotations for several others.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86: link-time .data section adjustments
Keir Fraser [Fri, 24 Dec 2010 08:47:59 +0000 (08:47 +0000)]
x86: link-time .data section adjustments

Fold compiler generated sections (mostly due to -fPIC on x86-64) into
the general .data and .data.read_mostly sections.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agox86-64: use PC-relative exception table entries
Keir Fraser [Fri, 24 Dec 2010 08:47:23 +0000 (08:47 +0000)]
x86-64: use PC-relative exception table entries

... thus allowing the entries to be half their current size. Rather
than adjusting all instances to the new layout, abstract the
construction of the table entries via a macro (paralleling a similar
one in recent Linux).

Also change the name of the section (to allow easier detection of
missed cases) and merge the final resulting output sections into
.data.read_mostly.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
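
To see why PC-relative entries halve the table size in the commit above,
the sketch below encodes each (faulting address, fixup address) pair as
two signed 32-bit offsets relative to the field that stores them, instead
of two 64-bit absolute addresses. The helper names and the exact field
layout are assumptions made for illustration and are not taken from the
Xen sources.

```python
import struct

ENTRY_SIZE = 8  # two signed 32-bit offsets per entry (assumed layout)


def encode_absolute(entries):
    """Old-style layout: two 64-bit absolute addresses per entry."""
    return b"".join(struct.pack("<QQ", addr, cont) for addr, cont in entries)


def encode_relative(table_base, entries):
    """Store each address as an offset from the field that holds it."""
    out = bytearray()
    for i, (addr, cont) in enumerate(entries):
        field = table_base + ENTRY_SIZE * i
        # struct.pack raises struct.error if an offset does not fit in 32 bits.
        out += struct.pack("<ii", addr - field, cont - (field + 4))
    return bytes(out)


def decode_relative(table_base, blob):
    """Recover the absolute addresses from the relative encoding."""
    entries = []
    for off in range(0, len(blob), ENTRY_SIZE):
        rel_addr, rel_cont = struct.unpack_from("<ii", blob, off)
        field = table_base + off
        entries.append((field + rel_addr, field + 4 + rel_cont))
    return entries
```

The encoding only works while every target lies within 2GB of the table,
which is why it suits code and data that are linked into one image.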
14 years agomake sort() generally available
Keir Fraser [Fri, 24 Dec 2010 08:46:46 +0000 (08:46 +0000)]
make sort() generally available

Rather than having this general library function only on ia64, move it
into common code, to be used by x86 exception table sorting too.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agoblkif: add placeholder for packet extension to block interface
Keir Fraser [Fri, 24 Dec 2010 08:42:52 +0000 (08:42 +0000)]
blkif: add placeholder for packet extension to block interface

While the corresponding implementation has been in our trees for quite
a while, it is in a state that doesn't make it suitable for submission,
and since the original author has left the company, someone still has
to be found to complete this work. Yet to prevent problems with other
interface extensions we'd like to keep the slot in the number space
reserved for the purpose it has been serving here.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agox86 xsave: supports xsave (CPUID:0xD) enumeration for all sub-leaves.
Keir Fraser [Fri, 24 Dec 2010 08:39:42 +0000 (08:39 +0000)]
x86 xsave: supports xsave (CPUID:0xD) enumeration for all sub-leaves.

Specifically, it fixes the following issues:

1. The sub-leaves of CPUID:0x0000000D aren't contiguous. Hypervisor
shouldn't use register values to stop the enumeration. This patch
moves checking on XSAVE sub-leaves out of if-else statement. It also
bumps up sub-leaves to 63.
2. It creates a common function for xsave.
3. The main leaf 0 of CPUID:0x0000000D in current Xen is broken,
especially ECX and EBX registers. This patch cleans it up.
4. It adds support to detect the EBX value of CPUID:0x0000000D main leaf
0 on-the-fly.

Signed-off-by: Wei Huang2 <wei.huang2@amd.com>
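
Point 1 of the commit above can be illustrated with a small sketch: since
the sub-leaves of CPUID leaf 0xD are not contiguous, an enumeration that
stops at the first empty sub-leaf misses later ones. The cpuid callable
and the register values in the usage note are placeholders invented for
this example, not output from real hardware or Xen code.

```python
XSAVE_LEAF = 0xD
MAX_SUBLEAF = 63  # upper bound mentioned in the commit message


def enumerate_stop_at_gap(cpuid):
    """Fragile variant: treats the first empty sub-leaf as the end."""
    found = []
    for sub in range(2, MAX_SUBLEAF + 1):
        eax, ebx, ecx, edx = cpuid(XSAVE_LEAF, sub)
        if eax == 0:
            break
        found.append(sub)
    return found


def enumerate_all(cpuid):
    """Robust variant: walks every sub-leaf and merely skips empty ones."""
    found = []
    # Sub-leaves 0 and 1 describe the main leaf and extended features,
    # so per-component enumeration starts at 2.
    for sub in range(2, MAX_SUBLEAF + 1):
        eax, ebx, ecx, edx = cpuid(XSAVE_LEAF, sub)
        if eax == 0:
            continue  # a gap, not the end of the list
        found.append(sub)
    return found
```

With a fake cpuid that reports components only at sub-leaves 2 and 62,
enumerate_stop_at_gap() returns just sub-leaf 2 while enumerate_all()
returns both.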
14 years agox86 xsave: Enable xsave_feature[62] (AMD Lightweight Profiling)
Keir Fraser [Fri, 24 Dec 2010 08:38:22 +0000 (08:38 +0000)]
x86 xsave: Enable xsave_feature[62] (AMD Lightweight Profiling)

The spec of LWP is available at
http://developer.amd.com/cpu/lwp/Pages/default.aspx.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
14 years agox86 xsave: Fix 64bit xsave_feature support for set_xcr0().
Keir Fraser [Fri, 24 Dec 2010 08:37:34 +0000 (08:37 +0000)]
x86 xsave: Fix 64bit xsave_feature support for set_xcr0().

Signed-off-by: Wei Huang <wei.huang2@amd.com>
14 years agocredit2: On debug keypress print load average as a fraction
Keir Fraser [Fri, 24 Dec 2010 08:32:43 +0000 (08:32 +0000)]
credit2: On debug keypress print load average as a fraction

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
14 years agocredit2: Different unbalance tolerance for underloaded and overloaded queues
Keir Fraser [Fri, 24 Dec 2010 08:32:20 +0000 (08:32 +0000)]
credit2: Different unbalance tolerance for underloaded and overloaded queues

Allow the "unbalance tolerance" -- the amount of difference between
two runqueues that will be allowed before rebalancing -- to differ
depending on how busy the runqueue is.  If it's less than 100%,
default to a difference of 1.0; if it's more than 100%, default to a
tolerance of 0.125.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
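
The tolerance rule in the commit above can be sketched as follows. Load
is expressed as a fraction where 1.0 means 100%. The function names are
hypothetical, and treating a load of exactly 100% as overloaded is an
assumption, since the commit message does not say which side it falls on.

```python
UNDERLOAD_TOLERANCE = 1.0    # default when the runqueue is below 100% load
OVERLOAD_TOLERANCE = 0.125   # default when the runqueue is above 100% load


def unbalance_tolerance(load: float) -> float:
    """Pick the allowed load difference based on how busy the runqueue is."""
    # Assumption: exactly 100% is handled like the overloaded case.
    return UNDERLOAD_TOLERANCE if load < 1.0 else OVERLOAD_TOLERANCE


def should_rebalance(my_load: float, other_load: float) -> bool:
    """Rebalance only when the difference exceeds the tolerance."""
    return abs(my_load - other_load) > unbalance_tolerance(my_load)
```

The effect is that a lightly loaded runqueue tolerates a large imbalance,
while an overloaded one reacts to fairly small differences.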
14 years agocredit2: Introduce a loadavg-based load balancer
Keir Fraser [Fri, 24 Dec 2010 08:31:54 +0000 (08:31 +0000)]
credit2: Introduce a loadavg-based load balancer

This is a first cut at load balancing.  I'm first working on getting
the behavior I want correct; once I know what kind of behavior works
well, I'll work on making it efficient.

The general idea is when balancing runqueues, look for the runqueue
whose loadavg is the most different from ours (higher or lower).
Then, look for a transaction which will bring the loads closest
together: either pushing a vcpu, pulling a vcpu, or swapping them.
Use the per-vcpu load to calculate the expected load after the
exchange.

The current algorithm looks at every combination, which is O(N^2).
That's not going to be suitable for workloads with large numbers of
vcpus (such as highly consolidated VDI deployments).  I'll make a more
efficient algorithm once I've experimented and determined what I think
is the best load-balancing behavior.

At the moment, balance from a runqueue every time the credit resets.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
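
The search described in the commit above might look roughly like this
sketch: pick the runqueue whose load differs most from ours, then try
every push, pull and swap and keep the one that brings the two loads
closest. Names and data structures are hypothetical, and a runqueue's
load is simplified to the sum of its per-vcpu loads, which is an
assumption rather than something the commit states.

```python
def pick_other_queue(my_load, other_loads):
    """Return the runqueue whose load is most different from ours."""
    return max(other_loads, key=lambda q: abs(other_loads[q] - my_load))


def best_transaction(mine, theirs):
    """Try all pushes, pulls and swaps: O(N^2) in the number of vcpus.

    mine/theirs map vcpu name -> per-vcpu load.
    Returns (kind, pushed_vcpu, pulled_vcpu).
    """
    my_load, their_load = sum(mine.values()), sum(theirs.values())
    best_diff, best = abs(my_load - their_load), ("none", None, None)
    for push in [None] + list(mine):
        for pull in [None] + list(theirs):
            if push is None and pull is None:
                continue
            gained = theirs[pull] if pull is not None else 0.0
            lost = mine[push] if push is not None else 0.0
            delta = gained - lost  # expected change to our load
            diff = abs((my_load + delta) - (their_load - delta))
            if diff < best_diff:
                if push is not None and pull is not None:
                    kind = "swap"
                else:
                    kind = "push" if push is not None else "pull"
                best_diff, best = diff, (kind, push, pull)
    return best
```

The nested loops make the quadratic cost mentioned in the commit message
visible: every candidate on one side is paired with every candidate on
the other.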
14 years agocredit2: Use loadavg to pick cpus, instead of instantaneous load
Keir Fraser [Fri, 24 Dec 2010 08:31:24 +0000 (08:31 +0000)]
credit2: Use loadavg to pick cpus, instead of instantaneous load

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
14 years agocredit2: Migrate request infrastructure
Keir Fraser [Fri, 24 Dec 2010 08:31:04 +0000 (08:31 +0000)]
credit2: Migrate request infrastructure

Put in infrastructure to allow a vcpu to request to migrate to a
specific runqueue.  This will allow a load balancer to choose running
VMs to migrate, and know they will go where expected when the VM is
descheduled.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
14 years agocredit2: Track expected load
Keir Fraser [Fri, 24 Dec 2010 08:30:42 +0000 (08:30 +0000)]
credit2: Track expected load

As vcpus are migrated, track how we expect the load to change.  This
helps smooth migrations when the balancing doesn't take immediate
effect on the load average.  In theory, if vcpu activity remains
constant, then the measured avgload should converge to the balanced
avgload.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
14 years agocredit2: Track average load contributed by a vcpu
Keir Fraser [Fri, 24 Dec 2010 08:30:15 +0000 (08:30 +0000)]
credit2: Track average load contributed by a vcpu

Track the amount of load contributed by a particular vcpu, to help
us make informed decisions about what will happen if we make a move.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
14 years agocredit2: Calculate load average
Keir Fraser [Fri, 24 Dec 2010 08:29:53 +0000 (08:29 +0000)]
credit2: Calculate load average

Calculate a per-runqueue decaying load average.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
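
One way to picture a decaying load average like the one in the commit
above is a windowed blend of the old average and the current load. This
is a minimal sketch under assumed names and an assumed window-based
formula; the commit message does not specify the exact calculation.

```python
def update_load_average(avg, current_load, elapsed, window):
    """Blend the previous average with the load seen for `elapsed` time.

    Older history keeps (window - elapsed) / window of its weight on each
    update, so its influence decays as updates accumulate.
    """
    if elapsed >= window:
        # The whole window was spent at the current load.
        return float(current_load)
    return (avg * (window - elapsed) + current_load * elapsed) / window
```

For instance, starting from an average of 0.0 and seeing a load of 2 for
half the window yields 1.0.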