IA32_TSC_ADJUST MSR is maintained separately for each logical
processor. A logical processor maintains and uses the IA32_TSC_ADJUST
MSR as follows:
1). On RESET, the value of the IA32_TSC_ADJUST MSR is 0;
2). If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR adds
(or subtracts) value X from the TSC, the logical processor also
adds (or subtracts) value X from the IA32_TSC_ADJUST MSR;
3). If an execution of WRMSR to the IA32_TSC_ADJUST MSR adds (or
subtracts) value X from that MSR, the logical processor also adds
(or subtracts) value X from the TSC.
This patch provides tsc adjust support for hvm guest, with it guest OS
would be happy when sync tsc.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Christoph Egger [Wed, 26 Sep 2012 10:07:42 +0000 (12:07 +0200)]
x86/vMCE: Add AMD support
Add vMCE support for AMD. Add vmce namespace to Intel specific vMCE MSR
functions. Move vMCE prototypes from mce.h to vmce.h.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
- fix inverted return values from vmce_amd_{rd,wr}msr()
- remove bogus printk()-s from those functions
Signed-off-by: Jan Beulich <jbeulich@suse.com> Committed-by: Jan Beulich <jbeulich@suse.com>
This patch provides vMCE save/restore when migration.
1. MCG_CAP is well-defined. However, considering future cap extension,
we keep save/restore logic that Jan implement at c/s 24887;
2. MCi_CTL2 initialized by guestos when booting, so need save/restore
otherwise guest would surprise;
3. Other MSRs do not need save/restore since they are either error-
related and pointless to save/restore, or, unified among all vMCE
platform;
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
- fix handling of partial data in XEN_DOMCTL_set_ext_vcpucontext
- fix adjustment of xen_domctl_ext_vcpucontext
Signed-off-by: Jan Beulich <jbeulich@suse.com> Committed-by: Jan Beulich <jbeulich@suse.com>
In our test for win8 guest mce, we find a bug that no matter what
SRAO/SRAR error xen inject to win8 guest, it always reboot.
The root cause is, current Xen vMCE logic inject vMCE# only to vcpu0,
this is not correct for Intel MCE (Under Intel arch, h/w generate MCE#
to all CPUs).
This patch fixes vMCE injection bug, injecting vMCE# to all vcpus on
Intel platforms.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
- increase flexibility be making new second argument of inject_vmce() a
VCPU ID rather than just a boolean
Acked-by: Christoph Egger <Christoph.Egger@amd.com> (on just this change)
- fix condition evaluation order in inject_vmce()
Signed-off-by: Jan Beulich <jbeulich@suse.com> Committed-by: Jan Beulich <jbeulich@suse.com>
This patch provides virtual MCE support to guest. It emulates a simple
and clean MCE MSRs interface to guest by faking caps to guest if needed
and masking caps if unnecessary:
1. Providing a well-defined MCG_CAP to guest, filter out un-necessary
caps and provide only guest needed caps;
2. Disabling MCG_CTL to avoid model specific;
3. Sticking all 1's to MCi_CTL to guest to avoid model specific;
4. Enabling CMCI cap but never really inject to guest to prevent
polling periodically;
5. Masking MSCOD field of MCi_STATUS to avoid model specific;
6. Keeping natural semantics by per-vcpu instead of per-domain
variables;
7. Using bank1 and reserving bank0 to work around 'bank0 quirk' of some
very old processors;
8. Cleaning some vMCE# injection logic which shared by Intel and AMD
but useless under new vMCE implement;
9. Keeping compatilbe w/ old xen version which has been backported to
SLES11 SP2, so that old vMCE would not blocked when migrate to new
vMCE;
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
- make printing consistent (and non-exploitable)
- fix return values of intel_mce_{rd,wr}msr() for out of range banks
- miscellaneous cleanup
Signed-off-by: Jan Beulich <jbeulich@suse.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Daniel De Graaf [Wed, 26 Sep 2012 09:56:07 +0000 (11:56 +0200)]
x86: check remote MMIO remap permissions
When a domain is mapping pages from a different pg_owner domain, the
iomem_access checks are currently only applied to the pg_owner domain,
potentially allowing a domain with a more restrictive iomem_access
policy to have the pages mapped into its page tables. To catch this,
also check the owner of the page tables. The current domain does not
need to be checked because the ability to manipulate a domain's page
tables implies full access to the target domain, so checking that
domain's permission is sufficient.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 26 Sep 2012 09:53:38 +0000 (11:53 +0200)]
x86: slightly improve stack trace on debug builds
As was rather obvious from crashes recently happening in stage testing,
the debug hypervisor, in that special case, has a drawback compared to
the non-debug one: When a call through a bad pointer happens, there's
no frame, and the top level (and frequently most important for
analysis) stack entry would get skipped:
Since the bad pointer is being printed anyway (as part of the register
state), replace it with the top of stack value in such a case.
With the introduction of is_active_kernel_text(), use it also at the
(few) other suitable places (I intentionally didn't replace the use in
xen/arch/arm/mm.c - while it would be functionally correct, the
dependency on system_state wouldn't be from an abstract perspective).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 26 Sep 2012 09:52:03 +0000 (11:52 +0200)]
x86: clean up interrupt stub generation
Apart from moving some code that is only used here from the header file
to the actual source one, this also
- moves interrupt[] into .init.data,
- prevents generating (unused) stubs for vectors below
FIRST_DYNAMIC_VECTOR, and
- shortens and sanitizes the names of the stubs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 26 Sep 2012 09:49:56 +0000 (11:49 +0200)]
x86: use compiler visible "add" instead of inline assembly "or" in get_cpu_info()
This follows the same idea as the previous patch, just that the effect
is much more visible here: With a half-way [dr]ecent gcc this reduced
.text size by over 12k for me.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 26 Sep 2012 09:48:21 +0000 (11:48 +0200)]
x86: enhance rsp-relative calculations
The use of "or" in GET_CPUINFO_FIELD so far wasn't ideal, as it doesn't
lend itself to folding this operation with a possibly subsequent one
(e.g. the well known mov+add=lea conversion). Split out the sub-
operations, and shorten assembly code slightly with this.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Ian Jackson [Tue, 25 Sep 2012 17:31:20 +0000 (18:31 +0100)]
docs: network diagrams for the wiki
We provide two new diagrams
docs/figs/network-{bridge,basic}.fig
which are converted to pngs by the Makefiles and intended for
consumption by http://wiki.xen.org/wiki/Xen_Networking.
This is perhaps not the ideal location for this source code but we
don't have a better one.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 25 Sep 2012 12:40:00 +0000 (13:40 +0100)]
tools: bump SONAMEs for changes during 4.2 development cycle.
We mostly did this as we went along, only a couple of minor number
bumps were missed http://marc.info/?l=xen-devel&m=134366054929255&w=2:
- Bumped libxl from 1.0.0 -> 1.0.1
- Bumped libxenstore from 3.0.1 -> 3.0.2
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Tue, 25 Sep 2012 10:03:51 +0000 (11:03 +0100)]
pygrub: always append --args
If a bootloader entry in menu.lst has no additional kernel command line
options listed and the domU.cfg has 'bootargs="--args=something"' the
additional arguments from the config file are not passed to the kernel.
The reason for that incorrect behaviour is that run_grub appends arg
only if the parsed config file has arguments listed.
Fix this by appending args from image section and the config file separatly.
To avoid adding to a NoneType initialize grubcfg['args'] to an empty string.
This does not change behaviour but simplifies the code which appends the
string.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ben Guthro [Tue, 25 Sep 2012 06:38:14 +0000 (08:38 +0200)]
x86/S3: add cache flush on secondary CPUs before going to sleep
Secondary CPUs, between doing their final memory writes (particularly
updating cpu_initialized) and getting a subsequent INIT, may not write
back all modified data. The INIT itself then causes those modifications
to be lost, so in the cpu_initialized case the CPU would find itself
already initialized, (intentionally) entering an infinite loop instead
of actually coming online.
Signed-off-by: Ben Guthro <ben@guthro.net>
Make acpi_dead_idle() call default_dead_idle() rather than duplicating
the logic there.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 25 Sep 2012 06:36:33 +0000 (08:36 +0200)]
x86: fix MWAIT-based idle driver for CPUs without ARAT
lapic_timer_{on,off} need to get initialized in this case. This in turn
requires getting HPET broadcast setup to be carried out earlier (and
hence preventing double initialization there).
Jan Beulich [Fri, 21 Sep 2012 15:02:46 +0000 (17:02 +0200)]
x86: enable VIA CPU support
Newer VIA CPUs have both 64-bit and VMX support. Enable them to be
recognized for these purposes, at once stripping off any 32-bit CPU
only bits from the respective CPU support file, and adding 64-bit ones
found in recent Linux.
This particularly implies untying the VMX == Intel assumption in a few
places.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Thu, 20 Sep 2012 07:21:53 +0000 (09:21 +0200)]
x86: tighten checks in XEN_DOMCTL_memory_mapping handler
Properly checking the MFN implies knowing the physical address width
supported by the platform, so to obtain this consistently the
respective code gets moved out of the MTRR subdir.
Btw., the model specific workaround in that code is likely unnecessary
- I believe those CPU models don't support 64-bit mode. But I wasn't
able to formally verify this, so I preferred to retain that code for
now.
But domctl code here also was lacking other error checks (as was,
looking at it again from that angle) the XEN_DOMCTL_ioport_mapping one.
Besides adding the missing checks, printing is also added for the case
where revoking access permissions didn't work (as that may have
implications for the host operator, e.g. wanting to not pass through
affected devices to another guest until the one previously using them
did actually die).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Thu, 20 Sep 2012 07:20:30 +0000 (09:20 +0200)]
x86/IO-APIC: streamline level ack/end handling
Rather than evaluating "ioapic_ack_new" on each invocation, and
considering that the two methods really have almost no code in common,
split the handlers.
While at it, also move ioapic_ack_{new,forced} into .init.data
(eliminating the single non-__init reference to the former).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 19 Sep 2012 07:27:55 +0000 (09:27 +0200)]
x86: properly check XEN_DOMCTL_ioport_mapping arguments for invalid range
In particular, the case of "np" being a very large value wasn't handled
correctly. The range start checks also were off by one (except that in
practice, when "np" is properly range checked, this would still have
been caught by the range end checks).
Also, is a GFN wrap in XEN_DOMCTL_memory_mapping really okay?
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 19 Sep 2012 07:26:26 +0000 (09:26 +0200)]
x86/ACPI: fix error indication from acpi_parse_madt_lapic_entries()
If the legacy APIC invocation of acpi_table_parse_madt() succeeds but
the x2APIC counterpart fails, this is regarded as failure by the
function, yet its return value would indicate success.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Mon, 17 Sep 2012 20:12:21 +0000 (21:12 +0100)]
xsm/flask: add domain relabel support
This adds the ability to change a domain's XSM label after creation.
The new label will be used for all future access checks; however,
existing event channels and memory mappings will remain valid even if
their creation would be denied by the new label.
With appropriate security policy and hooks in the domain builder, this
can be used to create domains that the domain builder does not have
access to after building. It can also be used to allow a domain to
drop privileges - for example, prior to launching a user-supplied
kernel loaded by a pv-grub stubdom.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Mon, 17 Sep 2012 20:10:39 +0000 (21:10 +0100)]
xsm/flask: remove unneeded create_sid field
This field was only used to populate the ssid of dom0, which can be
handled explicitly in the domain creation hook. This also removes the
unnecessary permission check on the creation of dom0.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Mon, 17 Sep 2012 20:10:07 +0000 (21:10 +0100)]
xsm/flask: remove inherited class attributes
The ability to declare common permission blocks shared across multiple
classes is not currently used in Xen. Currently, support for this
feature is broken in the header generation scripts, and it is not
expected that this feature will be used in the future, so remove the
dead code.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Committed-by: Keir Fraser <keir@xen.org>
Jiongxi Li [Mon, 17 Sep 2012 20:06:02 +0000 (21:06 +0100)]
xen: add virtual x2apic support for apicv
basically to benefit from apicv, we need clear MSR bitmap for
corresponding x2apic MSRs:
0x800 - 0x8ff: no read intercept for apicv register virtualization
TPR,EOI,SELF-IPI: no write intercept for virtual interrupt
delivery
Signed-off-by: Jiongxi Li <jiongxi.li@intel.com> Committed-by: Keir Fraser <keir@xen.org>
Jiongxi Li [Mon, 17 Sep 2012 20:05:11 +0000 (21:05 +0100)]
xen: enable Virtual-interrupt delivery
Virtual interrupt delivery avoids Xen to inject vAPIC interrupts
manually, which is fully taken care of by the hardware. This needs
some special awareness into existing interrupr injection path:
For pending interrupt from vLAPIC, instead of direct injection, we may
need update architecture specific indicators before resuming to guest.
Before returning to guest, RVI should be updated if any pending IRRs
EOI exit bitmap controls whether an EOI write should cause VM-Exit. If
set, a trap-like induced EOI VM-Exit is triggered. The approach here
is to manipulate EOI exit bitmap based on value of TMR. Level
triggered irq requires a hook in vLAPIC EOI write, so that vIOAPIC EOI
is triggered and emulated
Signed-off-by: Gang Wei <gang.wei@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Jiongxi Li <jiongxi.li@intel.com> Committed-by: Keir Fraser <keir@xen.org>
Christoph Egger [Mon, 17 Sep 2012 16:57:24 +0000 (17:57 +0100)]
MCE: use new common mce handler on AMD CPUs
Factor common machine check handler out of intel specific code
and move it into common files.
Replace old common mce handler with new one and use it on AMD CPUs.
No functional changes on Intel side.
While here fix some whitespace nits and comments.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com> Committed-by: Keir Fraser <keir@xen.org>
This is a patch repairing a regression in code previously functional
in 4.1.x. It appears that, during some refactoring work, calls to
hvm_memory_event_cr3 and hvm_memory_event_cr4 were lost.
These functions were originally called in mov_to_cr() of vmx.c, but
the commit http://xenbits.xen.org/hg/xen-unstable.hg/rev/1276926e3795
abstracted the original code into generic functions up a level in
hvm.c, dropping these calls in the process.
Signed-off-by: Steven Maresca <steve@zentific.com> Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Acked-by: Tim Deegan <tim@xen.org> Committed-by: Keir Fraser <keir@xen.org>
Fix libxenstore memory leak when USE_PTHREAD is not defined
Redefine usage of pthread_cleanup_push and _pop, to explicitly call free for
heap objects in error paths.
By the way, set a suitable errno value for an error path that had none.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 17 Sep 2012 10:17:02 +0000 (11:17 +0100)]
xl: Remove global domid and enable -Wshadow
Lots of functions loop over a list of domain and others take a domid as
a parameter, shadowing the global one and leading to all sorts of
confusion.
Therefore remove the global domid and explicitly pass it around as
necessary.
Adds a domid to the parameters for many functions and switches many
others from taking a char * domain specifier to taking a domid, pushing
the domid lookup to the toplevel.
Replaces some open-coded domain_qualifier_to_domid error checking with
find_domain.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- annotate find_domain() with warn_unused_result and fix the
handful of errors. ] Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 17 Sep 2012 10:17:01 +0000 (11:17 +0100)]
xl: prepare to enable Wshadow
Takes care of everything other than the global domid clashes.
Avoid galobal functions
- stime(2)
- time(2)
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 17 Sep 2012 10:17:00 +0000 (11:17 +0100)]
libxl: Enable -Wshadow.
It was convenient to invent $(CFLAGS_LIBXL) to do this.
Various renamings to avoid shadowing standard functions:
- index(3)
- listen(2)
- link(2)
- abort(3)
- abs(3)
Reduced the scope of some variables to avoid conflicts.
Change to libxc is due to the nested hypercall buf macros in
set_xen_guest_handle (used in libxl) using the same local private vars.
Build tested only.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Mon, 17 Sep 2012 10:16:59 +0000 (11:16 +0100)]
xl: free libxl context, logger and lockfile using atexit handler
xl frequently just calls exit(3), especially on error. Try to clean
up some of our global state to make tools like valgrind more useful.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Mon, 17 Sep 2012 08:09:59 +0000 (10:09 +0200)]
xenpm: make argument parsing and error handling more consistent
Specifically, what values are or aren't accepted as CPU identifier, and
how the values get interpreted should be consistent across sub-commands
(intended behavior now: non-negative values are okay, and along with
omitting the argument, specifying "all" will also be accepted).
For error handling, error messages should get consistently issued to
stderr, and the tool should now (hopefully) produce an exit code of
zero only in the (partial) success case (there may still be a small
number of questionable cases).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 14 Sep 2012 12:17:26 +0000 (14:17 +0200)]
amd iommu: use base platform MSI implementation
Given that here, other than for VT-d, the MSI interface gets surfaced
through a normal PCI device, the code should use as much as possible of
the "normal" MSI support code.
Further, the code can (and should) follow the "normal" MSI code in
distinguishing the maskable and non-maskable cases at the IRQ
controller level rather than checking the respective flag in the
individual actors.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Wang <wei.wang2@amd.com> Acked-by: Keir Fraser <keir@xen.org>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 14 Sep 2012 09:02:52 +0000 (10:02 +0100)]
libxl: Fix missing dependency in api check rule
Without this, the api check cpp run might happen before the various
autogenerated files which are #include by libxl.h are ready.
We need to remove the api-ok file from AUTOINCS to avoid a circular
dependency. Instead, we list it explicitly as a dependency of the
object files. The result is that the api check is the last thing to
be done before make considers the preparation done and can start work
on compiling .c files into .o's.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Matt Wilson [Fri, 14 Sep 2012 09:02:51 +0000 (10:02 +0100)]
docs: flesh out xl.cfg documentation, correct typos, reorganize
Some highlights:
* Correct some markup errors:
Around line 663:
'=item' outside of any '=over'
Around line 671:
You forgot a '=back' before '=head3'
* Add documentation for msitranslate, power_mgnt, acpi_s3, aspi_s4,
gfx_passthru, nomigrate, etc.
* Reorganize items in "unclassified" sections like cpuid,
gfx_passthru to where they belong
* Correct link L<> references so they can be resolved within the
document
* Remove placeholders for deprecated options device_model and vif2
* Remove placeholder for "sched" and "node", as these are options for
cpupool configuration. Perhaps cpupool configuration deserves
a section in this document.
* Rename "global" options to "general"
* Add section headers to group general VM options.
Signed-off-by: Matt Wilson <msw@amazon.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Jason McCarver [Fri, 14 Sep 2012 09:02:51 +0000 (10:02 +0100)]
xentop.c: Change curses painting behavior to avoid flicker
Currently, xentop calls clear() before drawing the screen and calling
refresh(). This causes the entire screen to be repainted from scratch
on each call to refresh(). It is inefficient and causes visible flicker
when using xentop.
This patch fixes this by calling erase() instead of clear() which overwrites
the current screen with blanks instead. The screen is then drawn as usual
in the top() function and refresh() is called. This method allows curses
to only repaint the characters that have changed since the last call
to refresh(), thus avoiding the flicker and sending fewer characters to
the terminal.
In the event the screen becomes corrupted, this patch accepts a CTRL-L
keystroke from the user which will call clear() and force a repaint of
the entire screen.
Signed-off-by: Jason McCarver <slam@parasite.cc> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Fri, 14 Sep 2012 09:02:50 +0000 (10:02 +0100)]
xl: do not leak cpupool names.
Valgrind reports:
==3076== 7 bytes in 1 blocks are definitely lost in loss record 1 of 1
==3076== at 0x402458C: malloc (vg_replace_malloc.c:270)
==3076== by 0x406F86D: libxl_cpupoolid_to_name (libxl_utils.c:102)
==3076== by 0x8058742: parse_config_data (xl_cmdimpl.c:639)
==3076== by 0x805BD56: create_domain (xl_cmdimpl.c:1838)
==3076== by 0x805DAED: main_create (xl_cmdimpl.c:3903)
==3076== by 0x804D39D: main (xl.c:285)
And indeed there are several places where xl uses
libxl_cpupoolid_to_name as a boolean to test if the pool name is
valid and leaks the name if it is. Introduce an is_valid helper and
use that instead.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Juergen Gross<juergen.gross@ts.fujitsu.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
xl: error if vif backend!=0 is used with run_hotplug_scripts
Print an error and exit if backend!=0 is used in conjunction with
run_hotplug_scripts. Currently libxl can only execute hotplug scripts
from the toolstack domain (the same domain xl is running from).
Added a description and workaround of this issue on
xl-network-configuration.
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
libxl: fix usage of backend parameter and run_hotplug_scripts
vif interfaces allows the user to specify the domain that should run
the backend (also known as driver domain) using the 'backend'
parameter. This is not compatible with run_hotplug_scripts=1, since
libxl can only run the hotplug scripts from the Domain 0.
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
CentOS 5.x forked e2fs ext4 support into a different package called
e4fs, and so headers and library names changed from ext2fs to ext4fs.
Check if ext4fs/ext2fs.h and -lext4fs work, and use that instead of
ext2fs to build libfsimage. This patch assumes that if the ext4fs
library is present it should always be used instead of ext2fs.
This patch includes a rework of the ext2fs check, a new ext4fs check
and a minor modification in libfsimage to use the correct library.
Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Eat the ENOSYS and turn it into 0. Log and propagate other errors.
I don't have a 32 bit system handy, so tested on x86_64 with a libxc
hacked to return -ENOSYS and -EINVAL.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Committed-by: Ian Campbell <ian.campbell@citrix.com>
Tim Deegan [Thu, 13 Sep 2012 15:41:33 +0000 (16:41 +0100)]
x86/mm: remove the linear mapping of the p2m tables.
Mapping the p2m into the monitor tables was an important optimization
on 32-bit builds, where it avoided mapping and unmapping p2m pages
during a walk. On 64-bit it makes no difference -- see
http://old-list-archives.xen.org/archives/html/xen-devel/2010-04/msg00981.html
Get rid of it, and use the explicit walk for all lookups.
Signed-off-by: Tim Deegan <tim@xen.org> Committed-by: Tim Deegan <tim@xen.org>
Not quite all, but a great deal was to specifically allow ia64 support
to be retrofitted to x86 platform code. Since we no longer support
ia64 we can happily remove the ifdefs. Any new platform which wanted
to share this code would likely need a different set of ifdefs in any
case, making it a brand new porting effort.
Andrew Cooper [Wed, 12 Sep 2012 18:31:16 +0000 (19:31 +0100)]
x86/passthrough: Fix corruption caused by race conditions between
device allocation and deallocation to a domain.
A toolstack, when dealing with a domain using PCIPassthrough, could
reasonably be expected to issue DOMCTL_deassign_device hypercalls to
remove all passed through devices before issuing a
DOMCTL_destroydomain hypercall to kill the domain. In the case where
a toolstack is perhaps less sensible in this regard, the hypervisor
should not fall over.
In domain_kill(), pci_release_devices() searches the alldevs_list list
looking for PCI devices still assigned to the domain. If the
toolstack has correctly deassigned all devices before killing the
domain, this loop does nothing.
However, if there are still devices attached to the domain, the loop
will call pci_cleanup_msi() without unbinding the pirq from the
domain. This eventually calls destroy_irq() which xfree()'s the
action.
However, as the irq_desc->action pointer is abused in an unsafe
matter, without unbinding first (which at least correctly cleans up),
the action is actually an irq_guest_action_t* rather than an
irqaction*, meaning that the cpu_eoi_map is leaked, and eoi_timer is
free()'d while still being on a pcpu's inactive_timer list. As a
result, when this free()'d memory gets reused, the inactive_timer list
becomes corrupt, and list_*** operations will corrupt hypervisor
memory.
If the above were not bad enough, the loop in pci_release_devices()
still leaves references to the irq it destroyed in
domain->arch.pirq_irq and irq_pirq, meaning that a later loop,
free_domain_pirqs(), which happens as a result of
complete_domain_destroy() will unbind and destroy all irqs which were
still bound to the domain, resulting in a double destroy of any irq
which was still bound to the domain at the point at which the
DOMCTL_destroydomain hypercall happened.
Because of the allocation of irqs from find_unassigned_irq(), the
lowest free irq number is going to be handed back from create_irq().
There is a further race condition between the original (incorrect)
call to destroy_irq() from pci_release_devices(), and the later call
to free_domain_pirqs() (which happens in a softirq context at some
point after the domain has officially died) during which the same irq
number (which is still referenced in a stale way in
domain->arch.pirq_irq and irq_pirq) has been allocated to a new domain
via a PHYSDEVOP_map_pirq hypercall (Say perhaps in the case of
rebooting a domain).
In this case, the cleanup for the dead domain will free the recently
bound irq under the feet of the new domain. Furthermore, after the
irq has been incorrectly destroyed, the same domain with another
PHYSDEVOP_map_pirq hypercall can be allocated the same irq number as
before, leading to an error along the lines of:
../physdev.c:188: dom54: -1:-1 already mapped to 74
In this case, the pirq_irq and irq_pirq mappings get updated to the
new PCI device from the latter PHYSDEVOP_map_pirq hypercall, and the
IOMMU interrupt remapping registers get updated, leading to IOMMU
Primary Pending Fault due to source-id verification failure for
incoming interrupts from the passed through device.
The easy fix is to simply deassign the device in pci_release_devices()
and leave all the real cleanup to the free_domain_pirqs() which
correctly unbinds and destroys the irq without leaving stale
references around.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Committed-by: Keir Fraser <keir@xen.org>
Ian Campbell [Wed, 12 Sep 2012 16:55:27 +0000 (17:55 +0100)]
tools: drop ia64 support
Removed support from libxc and mini-os.
This also took me under xen/include/public via various symlinks.
Dropped tools/debugger/xenitp entirely, it was described upon commit
as:
"Xenitp is a low-level debugger for ia64" and doesn't appear to be
linked into the build anywhere.
Jan Beulich [Wed, 12 Sep 2012 08:21:21 +0000 (10:21 +0200)]
gnttab: cleanup of number-of-active-frames calculations
max_nr_active_grant_frames() is merly is special case of
num_act_frames_from_sha_frames(), so there's no need to have a special
case implementation for it.
Further, some of the related definitions (including the "struct
active_grant_entry" definition itself) can (and hence should) really be
private to grant_table.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 12 Sep 2012 08:20:18 +0000 (10:20 +0200)]
x86: use only a single branch for upcall-pending exit path checks
This utilizes the fact that the two bytes of interest are adjacent to
one another and that the resulting 16-bit values of interest are within
a contiguous range of numbers.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 12 Sep 2012 08:19:34 +0000 (10:19 +0200)]
x86-64/EFI: allow chaining of config files
Namely when making use the CONFIG_XEN_COMPAT_* options in the legacy
Linux kernels, newer kernels may not be compatible with older
hypervisors, so trying to boot such a combination makes little sense.
Booting older kernels on newer hypervisors, however, has to always
work.
With the way xen.efi looks for its configuration file, allowing
individual configuration files to refer only to compatible kernels,
and referring from an older- to a newer-hypervisor one (the kernels
of which will, as said, necessarily be compatible with the older
hypervisor) allows to greatly reduce redundancy at least in
development environments where one frequently wants multiple
hypervisors and kernles to be installed in parallel.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 11 Sep 2012 13:55:33 +0000 (15:55 +0200)]
ns16550: PCI initialization adjustments
Besides single-port serial cards, also accept multi-port ones and such
providing mixed functionality (e.g. also having a parallel port).
Reading PCI_INTERRUPT_PIN before ACPI gets enabled generally produces
an incorrect IRQ (below 16, whereas after enabling ACPI it frequently
would end up at a higher one), so this is useful (almost) only when a
system already boots in ACPI mode.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 11 Sep 2012 13:51:52 +0000 (15:51 +0200)]
serial: avoid fully initializing unused consoles
Defer calling the drivers' post-IRQ initialization functions (generally
doing allocation of transmit buffers) until it is known that the
respective console is actually going to be used.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 11 Sep 2012 13:47:16 +0000 (15:47 +0200)]
console: prepare for non-COMn port support
Widen SERHND_IDX (and use it where needed), introduce a flush low level
driver method, and remove unnecessary peeking of the common code at the
(driver specific) serial port identification string in the "console="
command line option value.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 11 Sep 2012 13:45:20 +0000 (15:45 +0200)]
x86: allow early use of fixmaps
As a prerequisite for adding an EHCI debug port based console
implementation, set up the page tables needed for (a sub-portion of)
the fixmaps together with other boot time page table construction.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Dan Magenheimer [Tue, 11 Sep 2012 12:19:03 +0000 (14:19 +0200)]
tmem: fixup 2010 cleanup patch that breaks tmem save/restore
20918:a3fa6d444b25 "Fix domain reference leaks" (in Feb 2010, by Jan)
does some cleanup in addition to the leak fixes. Unfortunately, that
cleanup inadvertently resulted in an incorrect fallthrough in a switch
statement which breaks tmem save/restore.
That broken patch was apparently applied to 4.0-testing and 4.1-testing
so those are broken as well.
What is the process now for requesting back-patches to 4.0 and 4.1?
(Side note: This does not by itself entirely fix save/restore in 4.2.)
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Committed-by: Jan Beulich <jbeulich@suse.com>