Ian Jackson [Fri, 25 Jun 2010 14:43:50 +0000 (15:43 +0100)]
libxc: Fix ia64 build for interface change
This patch fixes ia64 by the following method:
- rename xc_handle to xch
- rename guest_xc to xch
- add xc_interface *xch to arguments of some functions
- replace xc_dom_printf with macros
- Add *xch argument to corresponding x86 functions [iwj]
Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 24 Jun 2010 11:45:32 +0000 (12:45 +0100)]
ocaml: remove bogus /dev/xen/ev[en]tchn
Oxenstored should not try to create the evtchn device, as it:
* creates the wrong name (/dev/xen/eventchn rather than evtchn)
* uses a hard-coded minor number, even though this dynamically
depends on what other misc devices are in the kernel
Remove all this code and just rely on the system to create the device.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Ian Jackson [Wed, 23 Jun 2010 16:05:31 +0000 (17:05 +0100)]
libxl: support reset file on sysfs
Recent kernels have a reset file on sysfs per PCI device, to allow PCI
device reset from userspace.
This patch adds support to libxl for resetting PCI devices using the
reset file on sysfs, in case the do_flr file is not present.
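The fallback described above can be sketched as follows. This is only an illustration, not the libxl code: the function names are hypothetical, and both sysfs paths (the pciback do_flr file and the per-device reset file) are assumptions about the usual layout. The sysfs root is a parameter so the logic can be tried against a scratch directory.

```python
import os


def pick_reset_method(sysfs_root, bdf):
    """Return (method, path) for resetting the PCI device `bdf`.

    Prefers the do_flr file and falls back to the per-device reset
    file. Both paths are assumptions made for this sketch.
    """
    do_flr = os.path.join(sysfs_root, "bus/pci/drivers/pciback/do_flr")
    reset = os.path.join(sysfs_root, "bus/pci/devices", bdf, "reset")
    if os.path.exists(do_flr):
        return "do_flr", do_flr
    if os.path.exists(reset):
        return "reset", reset
    return None, None


def reset_device(sysfs_root, bdf):
    """Trigger a reset; return False when neither file is available."""
    method, path = pick_reset_method(sysfs_root, bdf)
    if method is None:
        return False
    with open(path, "w") as f:
        # Assumed conventions: do_flr takes the device address,
        # the per-device reset file takes "1".
        f.write(bdf if method == "do_flr" else "1")
    return True
```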
Ian Jackson [Wed, 23 Jun 2010 16:04:26 +0000 (17:04 +0100)]
libxl: make libxl_wait_for_device_model clearer
At the moment libxl_wait_for_device_model waits on a xenstore watch
before checking the current value of the xenstore node, which might
already contain the value the function is looking for.
This patch changes libxl_wait_for_device_model so that it checks the
value of the xenstore node first, then waits for the watch.
Xenstore watches automatically fire once when you install them for
this exact purpose, so the previous code is not wrong, but this
version is clearer.
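A minimal sketch of the "check first, then wait" ordering, assuming a generic watch mechanism modelled as a queue of events. It does not use the xenstore API; read_node, watch_events and wait_for_value are hypothetical names for this illustration.

```python
import queue


def wait_for_value(read_node, watch_events, wanted, timeout=1.0):
    """Return True once read_node() yields `wanted`, False on timeout.

    read_node:    callable returning the node's current value
    watch_events: queue.Queue that receives an item whenever the
                  watched node changes (stand-in for a watch)
    """
    # Check the current value before blocking on the watch at all.
    if read_node() == wanted:
        return True
    while True:
        try:
            watch_events.get(timeout=timeout)
        except queue.Empty:
            return False  # no change arrived in time
        # Re-read after every watch event.
        if read_node() == wanted:
            return True
```

With this ordering the function returns immediately when the node already holds the wanted value, without depending on an initial watch event.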
Ian Jackson [Tue, 22 Jun 2010 15:37:53 +0000 (16:37 +0100)]
python/xc: [PATCH 3/3] add flask capabilities in python xc bindings
The flask library is small, and putting everything in libxenctrl makes
relying on flask functionality in libxl easier.
libflask is left for compatibility purposes, but should be considered
deprecated and be removed in the near future. All flask_ symbols are now
xc_flask_ symbols in libxenctrl.
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Ian Jackson [Tue, 22 Jun 2010 15:36:04 +0000 (16:36 +0100)]
libxc: [PATCH 1/3] merge libflask into libxenctrl
The flask library is small, and putting everything in libxenctrl makes
relying on flask functionality in libxl easier.
libflask is left for compatibility purposes, but should be considered
deprecated and be removed in the near future. All flask_ symbols are now
xc_flask_ symbols in libxenctrl.
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Fix the xencommons init script chkconfig configuration. The
priority was missing, and chkconfig complained about an invalid
chkconfig script. With both start and stop priorities added to the
chkconfig line, the script is successfully added when using
`chkconfig --add xencommons`, and management using the chkconfig
utility becomes possible.
Tested on RHEL-5 Server with Xen-4.1-unstable installed, running on
PVops kernel 2.6.32.15 and it was working fine.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Ian Jackson [Tue, 22 Jun 2010 15:07:00 +0000 (16:07 +0100)]
Check "mac" address sooner in device_create function, before doing device_add.
In the XendDomainInfo.py device_create function, when the device type is
"vif", there is a block that checks the validity of the "mac"
address. However, device_add has already been done before this check. If
the mac address is invalid, the check raises VmError and
exits directly without cleaning up, for example without removing the
device item from the config info. As a result the incorrect mac address is
saved into the VM config file and the VM fails to restart. Checking the
"mac" validity before doing device_add avoids the problem.
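The reordering can be illustrated with a small sketch. It is not the xend code: the list standing in for the VM config, the MAC pattern, and the simplified device_create signature are all assumptions for this example; only the VmError name and the validate-before-add ordering come from the description above.

```python
import re

# Assumed format for this sketch: six colon-separated hex octets.
MAC_RE = re.compile(r"([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}")


class VmError(Exception):
    """Stand-in for xend's VmError."""


def device_create(config, dev_type, dev_config):
    """Validate first, then record the device in `config` (a list)."""
    if dev_type == "vif":
        mac = dev_config.get("mac")
        if mac is not None and not MAC_RE.fullmatch(mac):
            # Raised before anything was added, so there is
            # nothing to clean up afterwards.
            raise VmError("invalid mac address: %r" % mac)
    config.append((dev_type, dev_config))  # the "device_add" step
    return len(config) - 1
```

Because the check runs before the append, a rejected device never reaches the stored configuration.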
This patch implements a few missing options in xl so that it can be
used as a replacement for xm in xendomains:
- dryrun and quiet, long options to xl create;
- l, option to xl list.
printf_info is now used to print the configuration of the running VMs so
the output has been reformatted to be similar to the output of xm list -l.
There is still one command used in xendomains that is not implemented in
xl and not covered by this patch: xm shutdown. However a patch has been
sent to the list in the past and we are expecting a new version of it
soon.
Keir Fraser [Mon, 21 Jun 2010 18:19:25 +0000 (19:19 +0100)]
Enable tmem functionality for PV on HVM guests. Guest kernel
must still be tmem-enabled to use this functionality (e.g.
won't work for Windows), but upstream Linux tmem (aka
cleancache and frontswap) patches apply cleanly on top
of PV on HVM patches.
Also, fix up some ASSERTS and code used only when bad guest
mfns are passed to tmem. Previous code could crash Xen
if a buggy/malicious guest passes bad gmfns.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Mon, 21 Jun 2010 18:18:27 +0000 (19:18 +0100)]
x86 hvm: implement HVMOP_pagetable_dying
This patch implements HVMOP_pagetable_dying: a hypercall for
guests to notify Xen that a pagetable is about to be destroyed so that
Xen can use it as a hint to unshadow the pagetable soon and unhook the
top-level user-mode shadow entries right away.
Gianluca Guida is the original author of this patch.
Keir Fraser [Mon, 21 Jun 2010 08:59:10 +0000 (09:59 +0100)]
vmx: Fix bug in VMX VPMU fixed function PMC offset
This is a minor fix to the calculation of the bit-width of fixed function
perfmon counters in Intel processors. Bits 5-12 of the edx register
should be extracted as (edx & 0x1fe0) >> 5 instead of using the mask 0x1f70.
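As a quick check of the two masks, the sketch below builds the mask for bits 5-12 and compares both extractions. The edx value is made up for illustration (chosen so that bit 7 is set) and is not taken from real hardware.

```python
def field_mask(low, high):
    """Mask with bits low..high (inclusive) set."""
    width = high - low + 1
    return ((1 << width) - 1) << low


# Bits 5-12 give exactly the corrected mask.
assert field_mask(5, 12) == 0x1FE0

# Hypothetical edx: bit-width field = 36, low five bits = 3.
edx = (36 << 5) | 3

correct = (edx & 0x1FE0) >> 5   # keeps all of bits 5-12
buggy = (edx & 0x1F70) >> 5     # drops bit 7 (and lets bit 4 through)

print(correct, buggy)  # prints "36 32"
```

The old mask 0x1f70 clears bit 7, so any width with that bit set comes out too small.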
Keir Fraser [Mon, 21 Jun 2010 08:58:17 +0000 (09:58 +0100)]
xend: fix "xm list hangs"
If a command holds domains_lock, "xm list" hangs waiting for
the lock. For example, when a script creates many VMs (such as 20), the
"xm list" command can hang for a long time (10 mins). I think domains_lock
here only protects update(). So we shouldn't do the update before the
"list" command really gets this lock, but xm does need to show the domains'
information quickly. With this patch, if the command cannot get
domains_lock after 20 attempts, "xm list" shows the
information of the VMs without update().
Signed-off-by: James Song (Wei) <jsong@novell.com>
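The bounded-retry idea can be sketched like this. It is an illustration only, not the xend implementation: list_domains, update, snapshot and the delay between attempts are hypothetical; only the limit of 20 attempts and the fallback to listing without update() come from the description above.

```python
import threading
import time


def list_domains(domains_lock, update, snapshot, attempts=20, delay=0.05):
    """Return (domain_info, refreshed).

    Tries a non-blocking acquire up to `attempts` times. If the lock
    is obtained, the information is refreshed via update() first;
    otherwise the possibly stale snapshot is returned.
    """
    for _ in range(attempts):
        if domains_lock.acquire(blocking=False):
            try:
                update()
                return snapshot(), True
            finally:
                domains_lock.release()
        time.sleep(delay)  # assumed back-off between attempts
    # Lock never became free: show what we have without update().
    return snapshot(), False
```

The second element of the result tells the caller whether the listing was refreshed or may be stale.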
Keir Fraser [Fri, 18 Jun 2010 13:08:57 +0000 (14:08 +0100)]
ia64: Fix xc_save error reporting
This patch fixes error reporting on ia64, which has special
handling compared with the i386/x86_64 platforms. It is pretty
straightforward: fail on the "cannot map mfn page" message instead
of continuing, since the memory is not being correctly mapped by the
xc_map_foreign_range() function.
From: Michal Novotny <minovotn@redhat.com> Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 17 Jun 2010 07:52:50 +0000 (08:52 +0100)]
cpuidle: redirect some hpet lock users to a new cpumask_lock
The hpet channel lock was also used to prevent handle_hpet_broadcast
from accessing another cpu's timer_deadline_start/end after that cpu was
already woken up.
This purpose can be served by a standalone lock, removing much
spinning on the hpet channel lock.
Keir Fraser [Thu, 17 Jun 2010 07:52:29 +0000 (08:52 +0100)]
cpuidle: remove hpet access in hpet_broadcast_exit
hpet_broadcast_exit calls reprogram_hpet to stop a possible hpet intr if
the last cpu woken from a deep C-state was woken by an unexpected intr
instead of the hpet broadcast handler. This call can be removed without
bringing many useless intrs, and it opens a chance for further
optimization. It is a tradeoff between grace & optimization.
BTW, move the cpumask set out of the critical section in
hpet_broadcast_enter to shorten it.
Keir Fraser [Thu, 17 Jun 2010 07:51:25 +0000 (08:51 +0100)]
cpuidle: use stime to count c-state residency in NONSTOP_TSC case
stime is based on tsc, with far less access cost than the PM timer. So for
processors w/ NONSTOP_TSC, use stime instead of the PM timer. This could
reduce idle overheads and save power.
Keir Fraser [Thu, 17 Jun 2010 06:22:06 +0000 (07:22 +0100)]
x86: IRQ affinity should track vCPU affinity
With IRQs getting bound to the CPU the binding vCPU currently runs on
there can result quite a bit of extra cross CPU traffic as soon as
that vCPU moves to a different pCPU. Likewise, when a domain re-binds
an event channel associated with a pIRQ, that IRQ's affinity should
also be adjusted.
The open issue is how to break ties for interrupts shared by multiple
domains - currently, the last request (at any point in time) is being
honored.
Keir Fraser [Tue, 15 Jun 2010 12:21:34 +0000 (13:21 +0100)]
x86/mce: assorted fixes
- correct various range checks (avoids bogus warnings on domains
modifying virtualized MSRs)
- correct consistency check (so that APs get checked instead of the
BP [against uninitialized data])
- reduce verbosity (capabilities printed only once, but then all of
the relevant values)
Keir Fraser [Tue, 15 Jun 2010 12:21:03 +0000 (13:21 +0100)]
x86: return value of domain_pirq_to_irq() is signed
That value can, for forcibly unbound PIRQs, validly be negative, and
for the respective check to catch those cases (and prevent using these
negative values as array index), the respective variables must be of
signed type.
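Why the variable type matters can be shown with fixed-width integers. This is a sketch using Python's ctypes to mimic C integer types; the value -1 is only an example of a negative return value, not the specific value used by domain_pirq_to_irq().

```python
import ctypes

# Example negative result, as for a forcibly unbound PIRQ.
returned = -1

as_unsigned = ctypes.c_uint(returned).value  # wraps to a large positive
as_signed = ctypes.c_int(returned).value     # stays negative

# A "negative means invalid" check only works on the signed copy.
print(as_unsigned < 0)  # prints "False": the check is silently skipped
print(as_signed < 0)    # prints "True": the check catches the case
```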
Keir Fraser [Tue, 15 Jun 2010 12:20:43 +0000 (13:20 +0100)]
x86: put_superpage() must also work for !opt_allow_superpage
This is because the P2M table, when placed at a kernel specified
location, gets populated with large pages, which the domain must have
a way to unmap/recycle.
Additionally when allowing Dom0 to use superpages, they ought to be
tracked accordingly in the superpage frame table.
Keir Fraser [Tue, 15 Jun 2010 12:19:33 +0000 (13:19 +0100)]
x86: allow LZO compressed bzImage to be used as Dom0 kernel
... since recently Linux added this as another kernel compression
method, and we already have LZO compression in the tree (from tmem),
so that only glue logic is needed.
Keir Fraser [Tue, 15 Jun 2010 12:18:55 +0000 (13:18 +0100)]
x86: fix pv cpuid masking
Invert initial values of the variables parsed into from the command
line, so that completely clearing out one or more of the four bit
fields is possible.
Further, consolidate the command line parameter specifications into
a single place.
Finally, as per "Intel Virtualization Technology FlexMigration
Application Note" (http://www.intel.com/Assets/PDF/manual/323850.pdf),
also handle family 6 model 0x1f.
What remains open is the question whether pv_cpuid() shouldn't also
consume these masks.
Keir Fraser [Tue, 15 Jun 2010 12:18:09 +0000 (13:18 +0100)]
Don't save Xen heap pages during domain save
As discussed in the thread starting at
http://lists.xensource.com/archives/html/xen-devel/2010-05/msg01383.html,
don't save Xen heap pages in order to avoid overallocation when the
domain gets restored, as those pages would get (temporarily) backed
with normal RAM pages by the restore code.
This requires making DOMCTL_getpageframeinfo{2,3} usable for HVM
guests, meaning that the input to these must be treated as GMFNs.
Keir Fraser [Tue, 15 Jun 2010 10:36:27 +0000 (11:36 +0100)]
xend: Do not mess with bridge if admin has set one up already
Previously, the default "network-script",
/etc/xen/scripts/network-bridge, would attempt to do its horrid work
even if you had already set everything up in /etc/network/interfaces.
Setting up your bridge in /etc/network/interfaces is:
* easy
* required for libxl since libxl never does it for you
* not a fragile piece of lunacy
* properly documented
* the way everyone would expect it to work
In this small patch we make it so that the default config for xend
doesn't mess about on startup if you already have a bridge, and
doesn't mess about on shutdown unless your first-named bridge (eth0 or
xenbr0, normally) doesn't also have a physical interface named
p<whatever> (peth0 or pxenbr0) enslaved to it. The latter test is not
ideal but will hopefully do from now until the time xend finally dies.
We also fix the "documentation" - ie, the comments in the default
xend-config.sxp - to correspond to reality.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Tue, 15 Jun 2010 10:34:13 +0000 (11:34 +0100)]
mce: Enhance the vmce injection check logic
Currently we do not inject a vMCE if the guest has a different mca control
register setup.
This is not enough; we need to consider more. If the guest has a different
family/model, we should not inject into the guest, because the MCA error
code includes model-specific information. If the guest has not enabled MCE
(i.e. CR4.X86_CR4_MCE is clear), we should not inject a vMCE.
One thing to note: in the memory error handler, we don't kill the
guest if vMCE is not ready; instead, we simply ignore the
vMCE. On native hardware, the system will reboot if MCE in CR4 is not
enabled. We need to contain guest access to the broken memory through
either a software or hardware method.
Keir Fraser [Thu, 10 Jun 2010 21:39:52 +0000 (22:39 +0100)]
tmem: Fix domain lifecycle synchronisation.
Obtaining a domain reference count is neither necessary nor
sufficient. Instead we simply check whether a domain is already dying
when it first becomes a client of tmem. If it is not then we will
correctly clean up later via tmem_destroy() called from domain_kill().
Keir Fraser [Thu, 10 Jun 2010 21:12:36 +0000 (22:12 +0100)]
Tmem: fix domain refcount leak by returning to simpler model
which claims a ref once when the tmem client is first associated
with the domain, and puts it once when the tmem client is
destroyed.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
This disables superpage restore support, but should gain us acceptable
performance when restoring a domain using a pv_ops dom0 kernel. This
is because single-page allocations will be batched rather than issued
singly.
Keir Fraser [Thu, 10 Jun 2010 07:19:11 +0000 (08:19 +0100)]
x86 mce: Clean-up the mc_panic handler.
Firstly, mc_panic should only run on one CPU, to avoid mixed printk
output.
Secondly, call the urgent handler in the MCE panic path so that the
error-code-specific handler gets involved.
Keir Fraser [Thu, 10 Jun 2010 07:18:46 +0000 (08:18 +0100)]
x86 mce: Clean Intel's MCE handler code
Add an intel_mce_type check according to Intel's SDM.
Reduce intel_memerr_dhandler()'s indentation to make the code easier to
read. And add a page_off action when we offline the page, so that dom0
knows about the action taken by the Xen hypervisor.
Add a default delayed mce handler, which will crash on an unknown SRAR
error or fatal error; otherwise, the system continues.
Keir Fraser [Thu, 10 Jun 2010 07:18:11 +0000 (08:18 +0100)]
x86 mce: Make mce_action action be usable for both delayed handler and
urgent handler
Originally mce_action was called only for the delayed handler. Change it
to be used for both the delayed handler and the urgent handler. Wrap it
with mce_delayed_action for the delayed handler.
Change the return value to be clearer.
Change the mca handler from mca_code to a function to be more
flexible. And change the interface to mce_handler to be mca_binfo to
pass more information.
Keir Fraser [Thu, 10 Jun 2010 07:17:38 +0000 (08:17 +0100)]
x86 mce: Clean-up intel mcheck_init
Clean up intel_mcheck_init, and also change the MCA capability check. We
will always use the BSP's MCA capability as the global value. If there is
some difference between BSP and AP, we print a warning.