view docs/src/user.tex @ 15867:aaae02dbe269

x86: Handle 'self-IPI' on legacy UP systems with no APIC.
Signed-off-by: Keir Fraser <keir@xensource.com>
author kfraser@localhost.localdomain
date Mon Sep 10 17:49:58 2007 +0100 (2007-09-10)
parents 07688f8f5394
children a00cc97b392a
1 \documentclass[11pt,twoside,final,openright]{report}
2 \usepackage{a4,graphicx,html,parskip,setspace,times,xspace,url}
3 \setstretch{1.15}
5 \renewcommand{\ttdefault}{pcr}
7 \def\Xend{{Xend}\xspace}
8 \def\xend{{xend}\xspace}
10 \latexhtml{\renewcommand{\path}[1]{{\small {\tt #1}}}}{\renewcommand{\path}[1]{{\tt #1}}}
13 \begin{document}
16 \pagestyle{empty}
17 \begin{center}
18 \vspace*{\fill}
19 \includegraphics{figs/xenlogo.eps}
20 \vfill
21 \vfill
22 \vfill
23 \begin{tabular}{l}
24 {\Huge \bf Users' Manual} \\[4mm]
25 {\huge Xen v3.0} \\[80mm]
26 \end{tabular}
27 \end{center}
29 {\bf DISCLAIMER: This documentation is always under active development
30 and as such there may be mistakes and omissions --- watch out for
31 these and please report any you find to the developers' mailing list,
32 xen-devel@lists.xensource.com. The latest version is always available
33 on-line. Contributions of material, suggestions and corrections are
34 welcome.}
36 \vfill
37 \clearpage
41 \pagestyle{empty}
43 \vspace*{\fill}
45 Xen is Copyright \copyright\ 2002--2005, University of Cambridge, UK, XenSource
46 Inc., IBM Corp., Hewlett-Packard Co., Intel Corp., AMD Inc., and others. All
47 rights reserved.
49 Xen is an open-source project. Most portions of Xen are licensed for copying
50 under the terms of the GNU General Public License, version 2. Other portions
51 are licensed under the terms of the GNU Lesser General Public License, the
52 Zope Public License 2.0, or under ``BSD-style'' licenses. Please refer to the
53 COPYING file for details.
55 Xen includes software by Christopher Clark. This software is covered by the
56 following licence:
58 \begin{quote}
59 Copyright (c) 2002, Christopher Clark. All rights reserved.
61 Redistribution and use in source and binary forms, with or without
62 modification, are permitted provided that the following conditions are met:
64 \begin{itemize}
65 \item Redistributions of source code must retain the above copyright notice,
66 this list of conditions and the following disclaimer.
68 \item Redistributions in binary form must reproduce the above copyright
69 notice, this list of conditions and the following disclaimer in the
70 documentation and/or other materials provided with the distribution.
72 \item Neither the name of the original author; nor the names of any
73 contributors may be used to endorse or promote products derived from this
74 software without specific prior written permission.
75 \end{itemize}
87 \end{quote}
89 \cleardoublepage
93 \pagestyle{plain}
94 \pagenumbering{roman}
95 { \parskip 0pt plus 1pt
96 \tableofcontents }
97 \cleardoublepage
101 \pagenumbering{arabic}
102 \raggedbottom
103 \widowpenalty=10000
104 \clubpenalty=10000
105 \parindent=0pt
106 \parskip=5pt
107 \renewcommand{\topfraction}{.8}
108 \renewcommand{\bottomfraction}{.8}
109 \renewcommand{\textfraction}{.2}
110 \renewcommand{\floatpagefraction}{.8}
111 \setstretch{1.1}
114 %% Chapter Introduction moved to introduction.tex
115 \chapter{Introduction}
118 Xen is an open-source \emph{para-virtualizing} virtual machine monitor
119 (VMM), or ``hypervisor'', for the x86 processor architecture. Xen can
120 securely execute multiple virtual machines on a single physical system
121 with close-to-native performance. Xen facilitates enterprise-grade
122 functionality, including:
124 \begin{itemize}
125 \item Virtual machines with performance close to native hardware.
126 \item Live migration of running virtual machines between physical hosts.
127 \item Up to 32 virtual CPUs per guest virtual machine, with VCPU hotplug.
128 \item x86/32, x86/32 with PAE, and x86/64 platform support.
129 \item Intel Virtualization Technology (VT-x) for unmodified guest operating systems (including Microsoft Windows).
130 \item Excellent hardware support (supports almost all Linux device
131 drivers).
132 \end{itemize}
135 \section{Usage Scenarios}
137 Usage scenarios for Xen include:
139 \begin{description}
140 \item [Server Consolidation.] Move multiple servers onto a single
141 physical host with performance and fault isolation provided at the
142 virtual machine boundaries.
143 \item [Hardware Independence.] Allow legacy applications and operating
144 systems to exploit new hardware.
145 \item [Multiple OS configurations.] Run multiple operating systems
146 simultaneously, for development or testing purposes.
147 \item [Kernel Development.] Test and debug kernel modifications in a
148 sand-boxed virtual machine --- no need for a separate test machine.
149 \item [Cluster Computing.] Management at VM granularity provides more
150 flexibility than separately managing each physical host, but better
151 control and isolation than single-system image solutions,
152 particularly by using live migration for load balancing.
153 \item [Hardware support for custom OSes.] Allow development of new
154 OSes while benefiting from the wide-ranging hardware support of
155 existing OSes such as Linux.
156 \end{description}
159 \section{Operating System Support}
161 Para-virtualization permits very high performance virtualization, even
162 on architectures like x86 that are traditionally very hard to
163 virtualize.
165 This approach requires operating systems to be \emph{ported} to run on
166 Xen. Porting an OS to run on Xen is similar to supporting a new
167 hardware platform; however, the process is simplified because the
168 para-virtual machine architecture is very similar to the underlying
169 native hardware. Even though operating system kernels must explicitly
170 support Xen, a key feature is that user space applications and
171 libraries \emph{do not} require modification.
173 With hardware CPU virtualization as provided by Intel VT and AMD
174 SVM (code-named Pacifica) technology, Xen can run an unmodified guest
175 OS kernel. No porting of the OS is required, although some
176 additional driver support is necessary within Xen itself. Unlike
177 traditional full-virtualization hypervisors, which suffer a tremendous
178 performance overhead, Xen combined with VT or Pacifica hardware
179 offers superb performance
180 for para-virtualized guest operating systems and full support for
181 unmodified guests running natively on the processor. Full support for
182 VT and Pacifica chipsets will appear in early 2006.
184 Para-virtualized Xen support is available for a growing number of
185 operating systems: currently, mature Linux support is available and
186 included in the standard distribution. Other OS ports---including
187 NetBSD, FreeBSD and Solaris x86 v10---are nearing completion.
190 \section{Hardware Support}
192 Xen currently runs on the x86 architecture, requiring a ``P6'' or
193 newer processor (e.g.\ Pentium Pro, Celeron, Pentium~II, Pentium~III,
194 Pentium~IV, Xeon, AMD~Athlon, AMD~Duron). Multiprocessor machines are
195 supported, and there is support for HyperThreading (SMT). In
196 addition, ports to IA64 and Power architectures are in progress.
198 The default 32-bit Xen supports up to 4~GB of memory. However, Xen 3.0
199 adds support for Intel's Physical Address Extension (PAE), which
200 enables x86/32 machines to address up to 64~GB of physical memory. Xen
201 3.0 also supports x86/64 platforms such as Intel EM64T and AMD Opteron,
202 which can currently address up to 1~TB of physical memory.
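As a quick check of these limits, the arithmetic can be written out as
follows. This is a worked illustration only: it assumes binary units
(1~GB taken as $2^{30}$ bytes), and the address widths shown are simply
the powers of two that match the figures quoted above.
\begin{eqnarray*}
2^{32}\ \mathrm{bytes} &=& 4\ \mathrm{GB} \quad \mbox{(32-bit physical addresses)} \\
2^{36}\ \mathrm{bytes} &=& 64\ \mathrm{GB} \quad \mbox{(36-bit physical addresses, as with PAE)} \\
2^{40}\ \mathrm{bytes} &=& 1\ \mathrm{TB} \quad \mbox{(40-bit physical addresses)}
\end{eqnarray*}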
204 Xen offloads most of the hardware support issues to the guest OS
205 running in the \emph{Domain~0} management virtual machine. Xen itself
206 contains only the code required to detect and start secondary
207 processors, set up interrupt routing, and perform PCI bus
208 enumeration. Device drivers run within a privileged guest OS rather
209 than within Xen itself. This approach provides compatibility with the
210 majority of device hardware supported by Linux. The default XenLinux
211 build contains support for most server-class network and disk
212 hardware, but you can add support for other hardware by configuring
213 your XenLinux kernel in the normal way.
216 \section{Structure of a Xen-Based System}
218 A Xen system has multiple layers, the lowest and most privileged of
219 which is Xen itself.
221 Xen may host multiple \emph{guest} operating systems, each of which is
222 executed within a secure virtual machine (in Xen terminology, a
223 \emph{domain}). Domains are scheduled by Xen to make effective use of the
224 available physical CPUs. Each guest OS manages its own applications.
225 This management includes the responsibility of scheduling each
226 application within the time allotted to the VM by Xen.
228 The first domain, \emph{domain~0}, is created automatically when the
229 system boots and has special management privileges. Domain~0 builds
230 other domains and manages their virtual devices. It also performs
231 administrative tasks such as suspending, resuming and migrating other
232 virtual machines.
234 Within domain~0, a process called \emph{xend} runs to manage the system.
235 \Xend\ is responsible for managing virtual machines and providing access
236 to their consoles. Commands are issued to \xend\ over an HTTP interface,
237 via a command-line tool.
240 \section{History}
242 Xen was originally developed by the Systems Research Group at the
243 University of Cambridge Computer Laboratory as part of the XenoServers
244 project, funded by the UK-EPSRC\@.
246 XenoServers aim to provide a ``public infrastructure for global
247 distributed computing''. Xen plays a key part in that, allowing one to
248 efficiently partition a single machine to enable multiple independent
249 clients to run their operating systems and applications in an
250 environment. This environment provides protection, resource isolation
251 and accounting. The project web page contains further information along
252 with pointers to papers and technical reports:
253 \path{http://www.cl.cam.ac.uk/xeno}
255 Xen has grown into a fully-fledged project in its own right, enabling us
256 to investigate interesting research issues regarding the best techniques
257 for virtualizing resources such as the CPU, memory, disk and network.
258 Project contributors now include XenSource, Intel, IBM, HP, AMD, Novell
259 and Red Hat.
261 Xen was first described in a paper presented at SOSP in
262 2003\footnote{\tt
263 http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf}, and the first
264 public release (1.0) was made that October. Since then, Xen has
265 significantly matured and is now used in production scenarios on many
266 sites.
268 \section{What's New}
270 Xen 3.0.0 offers:
272 \begin{itemize}
273 \item Support for up to 32-way SMP guest operating systems
274 \item Intel Physical Address Extension (PAE) to support 32-bit
275 servers with more than 4GB physical memory
276 \item x86/64 support (Intel EM64T, AMD Opteron)
277 \item Intel VT-x support to enable the running of unmodified guest
278 operating systems (Windows XP/2003, Legacy Linux)
279 \item Enhanced control tools
280 \item Improved ACPI support
281 \item AGP/DRM graphics
282 \end{itemize}
285 Xen 3.0 features greatly enhanced hardware support, configuration
286 flexibility, usability and a larger complement of supported operating
287 systems. This latest release takes Xen a step closer to being the
288 definitive open source solution for virtualization.
292 \part{Installation}
294 %% Chapter Basic Installation
295 \chapter{Basic Installation}
297 The Xen distribution includes three main components: Xen itself, ports
298 of Linux and NetBSD to run on Xen, and the userspace tools required to
299 manage a Xen-based system. This chapter describes how to install the
300 Xen~3.0 distribution from source. Alternatively, there may be pre-built
301 packages available as part of your operating system distribution.
304 \section{Prerequisites}
305 \label{sec:prerequisites}
307 The following is a full list of prerequisites. Items marked `$\dag$' are
308 required by the \xend\ control tools, and hence required if you want to
309 run more than one virtual machine; items marked `$*$' are only required
310 if you wish to build from source.
311 \begin{itemize}
312 \item A working Linux distribution using the GRUB bootloader and running
313 on a P6-class or newer CPU\@.
314 \item [$\dag$] The \path{iproute2} package.
315 \item [$\dag$] The Linux bridge-utils\footnote{Available from {\tt
316 http://bridge.sourceforge.net}} (e.g., \path{/sbin/brctl})
317 \item [$\dag$] The Linux hotplug system\footnote{Available from {\tt
318 http://linux-hotplug.sourceforge.net/}} (e.g.,
319 \path{/sbin/hotplug} and related scripts). On newer distributions,
320 this is included alongside the Linux udev system\footnote{See {\tt
321 http://www.kernel.org/pub/linux/utils/kernel/hotplug/udev.html/}}.
322 \item [$*$] Build tools (gcc v3.2.x or v3.3.x, binutils, GNU make).
323 \item [$*$] Development installation of zlib (e.g.,\ zlib-dev).
324 \item [$*$] Development installation of Python v2.2 or later (e.g.,\
325 python-dev).
326 \item [$*$] \LaTeX\ and transfig are required to build the
327 documentation.
328 \end{itemize}
330 Once you have satisfied these prerequisites, you can install either
331 a binary or source distribution of Xen.
333 \section{Installing from Binary Tarball}
335 Pre-built tarballs are available for download from the XenSource downloads
336 page:
337 \begin{quote} {\tt http://www.xensource.com/downloads/}
338 \end{quote}
340 Once you've downloaded the tarball, simply unpack and install:
341 \begin{verbatim}
342 # tar zxvf xen-3.0-install.tgz
343 # cd xen-3.0-install
344 # sh ./install.sh
345 \end{verbatim}
347 Once you've installed the binaries you need to configure your system as
348 described in Section~\ref{s:configure}.
350 \section{Installing from RPMs}
351 Pre-built RPMs are available for download from the XenSource downloads
352 page:
353 \begin{quote} {\tt http://www.xensource.com/downloads/}
354 \end{quote}
356 Once you've downloaded the RPMs, you typically install them via the
357 RPM commands:
359 \verb|# rpm -iv rpmname|
361 See the instructions and the Release Notes for each RPM set referenced at:
362 \begin{quote}
363 {\tt http://www.xensource.com/downloads/}.
364 \end{quote}
366 \section{Installing from Source}
368 This section describes how to obtain, build and install Xen from source.
370 \subsection{Obtaining the Source}
372 The Xen source tree is available as either a compressed source tarball
373 or as a clone of our master Mercurial repository.
375 \begin{description}
376 \item[Obtaining the Source Tarball]\mbox{} \\
377 Stable versions and daily snapshots of the Xen source tree are
378 available from the Xen download page:
379 \begin{quote} {\tt http://www.xensource.com/downloads/}
380 \end{quote}
381 \item[Obtaining the source via Mercurial]\mbox{} \\
382 The source tree may also be obtained via the public Mercurial
383 repository at:
384 \begin{quote}{\tt http://xenbits.xensource.com}
385 \end{quote} See the instructions and the Getting Started Guide
386 referenced at:
387 \begin{quote}
388 {\tt http://www.xensource.com/downloads/}
389 \end{quote}
390 \end{description}
392 % \section{The distribution}
393 %
394 % The Xen source code repository is structured as follows:
395 %
396 % \begin{description}
397 % \item[\path{tools/}] Xen node controller daemon (Xend), command line
398 % tools, control libraries
399 % \item[\path{xen/}] The Xen VMM.
400 % \item[\path{buildconfigs/}] Build configuration files
401 % \item[\path{linux-*-xen-sparse/}] Xen support for Linux.
402 % \item[\path{patches/}] Experimental patches for Linux.
403 % \item[\path{docs/}] Various documentation files for users and
404 % developers.
405 % \item[\path{extras/}] Bonus extras.
406 % \end{description}
408 \subsection{Building from Source}
410 The top-level Xen Makefile includes a target ``world'' that will do the
411 following:
413 \begin{itemize}
414 \item Build Xen.
415 \item Build the control tools, including \xend.
416 \item Download (if necessary) and unpack the Linux 2.6 source code, and
417 patch it for use with Xen.
418 \item Build a Linux kernel to use in domain~0 and a smaller unprivileged
419 kernel, which can be used for unprivileged virtual machines.
420 \end{itemize}
422 After the build has completed you should have a top-level directory
423 called \path{dist/} in which all resulting targets will be placed. Of
424 particular interest are the two XenLinux kernel images, one with a
425 ``-xen0'' extension which contains hardware device drivers and drivers
426 for Xen's virtual devices, and one with a ``-xenU'' extension that
427 just contains the virtual ones. These are found in
428 \path{dist/install/boot/} along with the image for Xen itself and the
429 configuration files used during the build.
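A minimal sketch of the build described above, assuming you are in the
top-level directory of the unpacked source tree. The second command is
only an illustration of where to look for the results; the exact file
names it shows will depend on the versions being built.
\begin{quote}
\begin{verbatim}
# make world
# ls dist/install/boot/
\end{verbatim}
\end{quote}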
431 %The NetBSD port can be built using:
432 %\begin{quote}
433 %\begin{verbatim}
434 %# make netbsd20
435 %\end{verbatim}\end{quote}
436 %NetBSD port is built using a snapshot of the netbsd-2-0 cvs branch.
437 %The snapshot is downloaded as part of the build process if it is not
438 %yet present in the \path{NETBSD\_SRC\_PATH} search path. The build
439 %process also downloads a toolchain which includes all of the tools
440 %necessary to build the NetBSD kernel under Linux.
442 To customize the set of kernels built you need to edit the top-level
443 Makefile. Look for the line:
444 \begin{quote}
445 \begin{verbatim}
446 KERNELS ?= linux-2.6-xen0 linux-2.6-xenU
447 \end{verbatim}
448 \end{quote}
450 You can edit this line to include any set of operating system kernels
451 which have configurations in the top-level \path{buildconfigs/}
452 directory.
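Because the Makefile assigns this variable with \verb!?=!, GNU make
only applies that default when no value has been supplied, so the
same selection can usually be made on the command line without editing
the file. The following is a sketch, assuming you want to build only
the domain~0 kernel:
\begin{quote}
\begin{verbatim}
# make KERNELS="linux-2.6-xen0" world
\end{verbatim}
\end{quote}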
454 %% Inspect the Makefile if you want to see what goes on during a
455 %% build. Building Xen and the tools is straightforward, but XenLinux
456 %% is more complicated. The makefile needs a `pristine' Linux kernel
457 %% tree to which it will then add the Xen architecture files. You can
458 %% tell the makefile the location of the appropriate Linux compressed
459 %% tar file by
460 %% setting the LINUX\_SRC environment variable, e.g. \\
461 %% \verb!# LINUX_SRC=/tmp/linux-2.6.11.tar.bz2 make world! \\ or by
462 %% placing the tar file somewhere in the search path of {\tt
463 %% LINUX\_SRC\_PATH} which defaults to `{\tt .:..}'. If the
464 %% makefile can't find a suitable kernel tar file it attempts to
465 %% download it from kernel.org (this won't work if you're behind a
466 %% firewall).
468 %% After untaring the pristine kernel tree, the makefile uses the {\tt
469 %% mkbuildtree} script to add the Xen patches to the kernel.
471 %% \framebox{\parbox{5in}{
472 %% {\bf Distro specific:} \\
473 %% {\it Gentoo} --- if not using udev (most installations,
474 %% currently), you'll need to enable devfs and devfs mount at boot
475 %% time in the xen0 config. }}
477 \subsection{Custom Kernels}
479 % If you have an SMP machine you may wish to give the {\tt '-j4'}
480 % argument to make to get a parallel build.
482 If you wish to build a customized XenLinux kernel (e.g.\ to support
483 additional devices or enable distribution-required features), you can
484 use the standard Linux configuration mechanisms, specifying that the
485 architecture being built for is \path{xen}, e.g.:
486 \begin{quote}
487 \begin{verbatim}
488 # cd linux-2.6.12-xen0
489 # make ARCH=xen xconfig
490 # cd ..
491 # make
492 \end{verbatim}
493 \end{quote}
495 You can also copy an existing Linux configuration (\path{.config}) into
496 e.g.\ \path{linux-2.6.12-xen0} and execute:
497 \begin{quote}
498 \begin{verbatim}
499 # make ARCH=xen oldconfig
500 \end{verbatim}
501 \end{quote}
503 You may be prompted with some Xen-specific options. We advise accepting
504 the defaults for these options.
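For example, to start from the configuration of the kernel you are
currently running, you might do something like the following. This is
a sketch under stated assumptions: many, but not all, distributions
keep the running kernel's configuration under \path{/boot/}, and the
directory name simply follows the example above.
\begin{quote}
\begin{verbatim}
# cp /boot/config-$(uname -r) linux-2.6.12-xen0/.config
# cd linux-2.6.12-xen0
# make ARCH=xen oldconfig
\end{verbatim}
\end{quote}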
506 Note that the only difference between the two types of Linux kernels
507 that are built is the configuration file used for each. The ``U''
508 suffixed (unprivileged) versions don't contain any of the physical
509 hardware device drivers, leading to a 30\% reduction in size; hence you
510 may prefer these for your non-privileged domains. The ``0'' suffixed
511 privileged versions can be used to boot the system, as well as in driver
512 domains and unprivileged domains.
514 \subsection{Installing Generated Binaries}
516 The files produced by the build process are stored under the
517 \path{dist/install/} directory. To install them in their default
518 locations, do:
519 \begin{quote}
520 \begin{verbatim}
521 # make install
522 \end{verbatim}
523 \end{quote}
525 Alternatively, users with special installation requirements may wish to
526 install them manually by copying the files to their appropriate
527 destinations.
529 %% Files in \path{install/boot/} include:
530 %% \begin{itemize}
531 %% \item \path{install/boot/xen-3.0.gz} Link to the Xen 'kernel'
532 %% \item \path{install/boot/vmlinuz-2.6-xen0} Link to domain 0
533 %% XenLinux kernel
534 %% \item \path{install/boot/vmlinuz-2.6-xenU} Link to unprivileged
535 %% XenLinux kernel
536 %% \end{itemize}
538 The \path{dist/install/boot} directory will also contain the config
539 files used for building the XenLinux kernels, and also versions of Xen
540 and XenLinux kernels that contain debug symbols (such as
541 \path{xen-syms-3.0.0} and \path{vmlinux-syms-}), which are
542 essential for interpreting crash dumps. Retain these files as the
543 developers may wish to see them if you post on the mailing list.
546 \section{Configuration}
547 \label{s:configure}
549 Once you have built and installed the Xen distribution, it is simple to
550 prepare the machine for booting and running Xen.
552 \subsection{GRUB Configuration}
554 An entry should be added to \path{grub.conf} (often found under
555 \path{/boot/} or \path{/boot/grub/}) to allow Xen / XenLinux to boot.
556 This file is sometimes called \path{menu.lst}, depending on your
557 distribution. The entry should look something like the following:
559 %% KMSelf Thu Dec 1 19:06:13 PST 2005 262144 is useful for RHEL/RH and
560 %% related Dom0s.
561 {\small
562 \begin{verbatim}
563 title Xen 3.0 / XenLinux 2.6
564 kernel /boot/xen-3.0.gz dom0_mem=262144
565 module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0
566 \end{verbatim}
567 }
569 The kernel line tells GRUB where to find Xen itself and what boot
570 parameters should be passed to it (in this case, setting the domain~0
571 memory allocation in kilobytes).
572 For more details on the various Xen boot parameters see
573 Section~\ref{s:xboot}.
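As a worked example of the unit, the value used in the entry above
corresponds to 256~MB, and the value 131072 used in the serial console
example later in this chapter corresponds to 128~MB (taking 1~MB as
1024~KB):
\begin{eqnarray*}
262144\ \mathrm{KB} \div 1024 &=& 256\ \mathrm{MB} \\
131072\ \mathrm{KB} \div 1024 &=& 128\ \mathrm{MB}
\end{eqnarray*}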
575 The module line of the configuration describes the location of the
576 XenLinux kernel that Xen should start and the parameters that should be
577 passed to it. These are standard Linux parameters, identifying the root
578 device and specifying it be initially mounted read only and instructing
579 that console output be sent to the screen. Some distributions such as
580 SuSE do not require the \path{ro} parameter.
582 %% \framebox{\parbox{5in}{
583 %% {\bf Distro specific:} \\
584 %% {\it SuSE} --- Omit the {\tt ro} option from the XenLinux
585 %% kernel command line, since the partition won't be remounted rw
586 %% during boot. }}
588 To use an initrd, add another \path{module} line to the configuration,
589 like: {\small
590 \begin{verbatim}
591 module /boot/my_initrd.gz
592 \end{verbatim}
593 }
595 %% KMSelf Thu Dec 1 19:05:30 PST 2005 Other configs as an appendix?
597 When installing a new kernel, it is recommended that you do not delete
598 existing menu options from \path{menu.lst}, as you may wish to boot your
599 old Linux kernel in future, particularly if you have problems.
601 \subsection{Serial Console (optional)}
603 Serial console access allows you to manage, monitor, and interact with
604 your system over a serial console. This can allow access from another
605 nearby system via a null-modem (``LapLink'') cable or remotely via a serial
606 concentrator.
608 Your system's BIOS, bootloader (GRUB), Xen, Linux, and login access must
609 each be individually configured for serial console access. It is
610 \emph{not} strictly necessary to have each component fully functional,
611 but it can be quite useful.
613 For general information on serial console configuration under Linux,
614 refer to the ``Remote Serial Console HOWTO'' at The Linux Documentation
615 Project: \url{http://www.tldp.org}
617 \subsubsection{Serial Console BIOS configuration}
619 Enabling system serial console output neither enables nor disables
620 serial capabilities in GRUB, Xen, or Linux, but may make remote
621 management of your system more convenient by displaying POST and other
622 boot messages over the serial port and allowing remote BIOS configuration.
624 Refer to your hardware vendor's documentation for capabilities and
625 procedures to enable BIOS serial redirection.
628 \subsubsection{Serial Console GRUB configuration}
630 Enabling GRUB serial console output neither enables nor disables Xen or
631 Linux serial capabilities, but may make remote management of your system
632 more convenient by displaying GRUB prompts, menus, and actions over
633 the serial port and allowing remote GRUB management.
635 Adding the following two lines to your GRUB configuration file,
636 typically either \path{/boot/grub/menu.lst} or \path{/boot/grub/grub.conf}
637 depending on your distro, will enable GRUB serial output.
639 \begin{quote}
640 {\small \begin{verbatim}
641 serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1
642 terminal --timeout=10 serial console
643 \end{verbatim}}
644 \end{quote}
646 Note that when both the serial port and the local monitor and keyboard
647 are enabled, the text ``\emph{Press any key to continue}'' will appear
648 at both. Pressing a key on one device will cause GRUB to display to
649 that device. The other device will see no output. If no key is
650 pressed before the timeout period expires, the system will boot to the
651 default GRUB boot entry.
653 Please refer to the GRUB documentation for further information.
656 \subsubsection{Serial Console Xen configuration}
658 Enabling Xen serial console output neither enables nor disables Linux
659 kernel output or logging in to Linux over serial port. It does however
660 allow you to monitor and log the Xen boot process via serial console and
661 can be very useful in debugging.
663 %% kernel /boot/xen-2.0.gz dom0_mem=131072 console=com1,vga com1=115200,8n1
664 %% module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro
666 In order to configure Xen serial console output, it is necessary to
667 add a boot option to your GRUB config; e.g.\ replace the previous
668 example kernel line with:
669 \begin{quote} {\small \begin{verbatim}
670 kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1
671 \end{verbatim}}
672 \end{quote}
674 This configures Xen to output on COM1 at 115,200 baud, 8 data bits, no
675 parity and 1 stop bit. Modify these parameters for your environment.
676 See Section~\ref{s:xboot} for an explanation of all boot parameters.
678 One can also configure XenLinux to share the serial console; to achieve
679 this append ``\path{console=ttyS0}'' to your module line.
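Putting the two settings together, a complete GRUB entry might look
like the following sketch. The title is arbitrary, and the kernel path
and root device are copied from the earlier examples in this chapter;
treat them as placeholders for your own values.
\begin{quote}
{\small \begin{verbatim}
title Xen 3.0 / XenLinux 2.6 (serial console)
  kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1
  module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=ttyS0
\end{verbatim}}
\end{quote}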
682 \subsubsection{Serial Console Linux configuration}
684 Enabling Linux serial console output at boot neither enables nor
685 disables logging in to Linux over serial port. It does however allow
686 you to monitor and log the Linux boot process via serial console and can be
687 very useful in debugging.
689 To enable Linux output at boot time, add the parameter
690 \path{console=ttyS0} (or ttyS1, ttyS2, etc.) to your kernel GRUB line.
691 Under Xen, this might be:
692 \begin{quote}
693 {\footnotesize \begin{verbatim}
694 module /vmlinuz-2.6-xen0 ro root=/dev/VolGroup00/LogVol00 \
695 console=ttyS0,115200
696 \end{verbatim}}
697 \end{quote}
698 to enable output over ttyS0 at 115200 baud.
702 \subsubsection{Serial Console Login configuration}
704 Logging in to Linux via serial console, under Xen or otherwise, requires
705 specifying a login prompt be started on the serial port. To permit root
706 logins over serial console, the serial port must be added to
707 \path{/etc/securetty}.
709 \newpage
710 To automatically start a login prompt over the serial port,
711 add the line: \begin{quote} {\small {\tt c:2345:respawn:/sbin/mingetty
712 ttyS0}} \end{quote} to \path{/etc/inittab}. Run \path{init q} to force
713 a reload of your inittab and start getty.
715 To enable root logins, add \path{ttyS0} to \path{/etc/securetty} if not
716 already present.
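The two steps above can be sketched as follows, assuming the first
serial port (\path{ttyS0}) and that you have already added the getty
line to \path{/etc/inittab}. The first command appends the entry only
if no line consisting of exactly \path{ttyS0} is already present.
\begin{quote}
{\small \begin{verbatim}
# grep -qx ttyS0 /etc/securetty || echo ttyS0 >> /etc/securetty
# init q
\end{verbatim}}
\end{quote}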
718 Your distribution may use an alternate getty; options include getty,
719 mgetty and agetty. Consult your distribution's documentation
720 for further information.
723 \subsection{TLS Libraries}
725 Users of the XenLinux 2.6 kernel should disable Thread Local Storage
726 (TLS) (e.g.\ by doing a \path{mv /lib/tls /lib/tls.disabled}) before
727 attempting to boot a XenLinux kernel\footnote{If you boot without first
728 disabling TLS, you will get a warning message during the boot process.
729 In this case, simply perform the rename after the machine is up and
730 then run \path{/sbin/ldconfig} to make it take effect.}. You can
731 always reenable TLS by restoring the directory to its original location
732 (i.e.\ \path{mv /lib/tls.disabled /lib/tls}).
734 The reason for this is that the current TLS implementation uses
735 segmentation in a way that is not permissible under Xen. If TLS is not
736 disabled, an emulation mode is used within Xen which reduces performance
737 substantially. To ensure full performance you should install a
738 `Xen-friendly' (nosegneg) version of the library.
741 \section{Booting Xen}
743 It should now be possible to restart the system and use Xen. Reboot and
744 choose the new Xen option when the GRUB screen appears.
746 What follows should look much like a conventional Linux boot. The first
747 portion of the output comes from Xen itself, supplying low level
748 information about itself and the underlying hardware. The last portion
749 of the output comes from XenLinux.
751 You may see some error messages during the XenLinux boot. These are not
752 necessarily anything to worry about---they may result from kernel
753 configuration differences between your XenLinux kernel and the one you
754 usually use.
756 When the boot completes, you should be able to log into your system as
757 usual. If you are unable to log in, you should still be able to reboot
758 with your normal Linux kernel by selecting it at the GRUB prompt.
761 % Booting Xen
762 \chapter{Booting a Xen System}
764 Booting the system into Xen will bring you up into the privileged
765 management domain, Domain0. At that point you are ready to create
766 guest domains and ``boot'' them using the \texttt{xm create} command.
768 \section{Booting Domain0}
770 After installation and configuration are complete, reboot the system
771 and choose the new Xen option when the GRUB screen appears.
773 What follows should look much like a conventional Linux boot. The
774 first portion of the output comes from Xen itself, supplying low level
775 information about itself and the underlying hardware. The last
776 portion of the output comes from XenLinux.
778 %% KMSelf Wed Nov 30 18:09:37 PST 2005: We should specify what these are.
780 When the boot completes, you should be able to log into your system as
781 usual. If you are unable to log in, you should still be able to
782 reboot with your normal Linux kernel by selecting it at the GRUB prompt.
784 The first step in creating a new domain is to prepare a root
785 filesystem for it to boot. Typically, this might be stored in a normal
786 partition, an LVM or other volume manager partition, a disk file or on
787 an NFS server. A simple way to do this is to boot from your
788 standard OS install CD and install the distribution into another
789 partition on your hard drive.
791 To start the \xend\ control daemon, type
792 \begin{quote}
793 \verb!# xend start!
794 \end{quote}
796 If you wish the daemon to start automatically, see the instructions in
797 Section~\ref{s:xend}. Once the daemon is running, you can use the
798 \path{xm} tool to monitor and maintain the domains running on your
799 system. This chapter provides only a brief tutorial. We provide full
800 details of the \path{xm} tool in the next chapter.
802 % \section{From the web interface}
803 %
804 % Boot the Xen machine and start Xensv (see Chapter~\ref{cha:xensv}
805 % for more details) using the command: \\
806 % \verb_# xensv start_ \\
807 % This will also start Xend (see Chapter~\ref{cha:xend} for more
808 % information).
809 %
810 % The domain management interface will then be available at {\tt
811 % http://your\_machine:8080/}. This provides a user friendly wizard
812 % for starting domains and functions for managing running domains.
813 %
814 % \section{From the command line}
815 \section{Booting Guest Domains}
817 \subsection{Creating a Domain Configuration File}
819 Before you can start an additional domain, you must create a
820 configuration file. We provide two example files which you can use as
821 a starting point:
822 \begin{itemize}
823 \item \path{/etc/xen/xmexample1} is a simple template configuration
824 file for describing a single VM\@.
\item \path{/etc/xen/xmexample2} is a template description that
826 is intended to be reused for multiple virtual machines. Setting the
827 value of the \path{vmid} variable on the \path{xm} command line
828 fills in parts of this template.
829 \end{itemize}
831 There are also a number of other examples which you may find useful.
832 Copy one of these files and edit it as appropriate. Typical values
833 you may wish to edit include:
835 \begin{quote}
836 \begin{description}
\item[kernel] Set this to the path of the kernel you compiled for use
  with Xen (e.g.\ \path{kernel = "/boot/vmlinuz-2.6-xenU"})
\item[memory] Set this to the size of the domain's memory in megabytes
  (e.g.\ \path{memory = 64})
\item[disk] Set the first entry in this list to calculate the offset
  of the domain's root partition, based on the domain ID\@. Set the
  second to the location of \path{/usr} if you are sharing it between
  domains (e.g.\ \path{disk = ['phy:your\_hard\_drive\%d,sda1,w' \%
    (base\_partition\_number + vmid),
    'phy:your\_usr\_partition,sda6,r' ]})
\item[dhcp] Uncomment the dhcp variable, so that the domain will
  receive its IP address from a DHCP server (e.g.\ \path{dhcp="dhcp"})
849 \end{description}
850 \end{quote}
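
The Python expression in the \path{disk} example can be tried on its own.
The following is only a sketch: \path{vmid} would normally be supplied on
the \path{xm} command line, and both the value of
\path{base\_partition\_number} and the device name \path{sda} are
assumptions made for illustration.

\begin{verbatim}
# Hypothetical values, chosen only to show how the expression expands.
vmid = 1                     # normally passed as "vmid=1" to xm create
base_partition_number = 4    # assumed offset of the first guest partition

# The root partition is selected per domain; /usr is shared read-only.
disk = ['phy:sda%d,sda1,w' % (base_partition_number + vmid),
        'phy:sda6,sda6,r']
\end{verbatim}

With these values the first entry expands to \path{phy:sda5,sda1,w}.
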
852 You may also want to edit the {\bf vif} variable in order to choose
853 the MAC address of the virtual ethernet interface yourself. For
854 example:
856 \begin{quote}
857 \verb_vif = ['mac=00:16:3E:F6:BB:B3']_
858 \end{quote}
859 If you do not set this variable, \xend\ will automatically generate a
860 random MAC address from the range 00:16:3E:xx:xx:xx, assigned by IEEE to
861 XenSource as an OUI (organizationally unique identifier). XenSource
862 Inc. gives permission for anyone to use addresses randomly allocated
863 from this range for use by their Xen domains.
865 For a list of IEEE OUI assignments, see
866 \url{http://standards.ieee.org/regauth/oui/oui.txt}
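
If you would rather choose addresses yourself while staying inside the
XenSource range, a short script can generate one. This is a minimal
sketch and not the code that \xend\ itself uses; the function name
\path{random\_xen\_mac} is hypothetical.

\begin{verbatim}
import random

def random_xen_mac(rng=random):
    """Return a random MAC address of the form 00:16:3E:xx:xx:xx."""
    # The first three octets are the XenSource OUI; the rest are random.
    octets = [0x00, 0x16, 0x3E] + [rng.randint(0x00, 0xFF) for _ in range(3)]
    return ":".join("%02X" % octet for octet in octets)

# The result can be used directly in a domain configuration file.
vif = ['mac=%s' % random_xen_mac()]
\end{verbatim}

Random allocation does not guarantee uniqueness, so check for clashes if
many domains share one network segment.
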
869 \subsection{Booting the Guest Domain}
871 The \path{xm} tool provides a variety of commands for managing
872 domains. Use the \path{create} command to start new domains. Assuming
873 you've created a configuration file \path{myvmconf} based around
874 \path{/etc/xen/xmexample2}, to start a domain with virtual machine
875 ID~1 you should type:
877 \begin{quote}
878 \begin{verbatim}
879 # xm create -c myvmconf vmid=1
880 \end{verbatim}
881 \end{quote}
The \path{-c} switch causes \path{xm} to attach to the domain's
console after creation. The \path{vmid=1} argument sets the \path{vmid}
variable used in the \path{myvmconf} file.
887 You should see the console boot messages from the new domain appearing
888 in the terminal in which you typed the command, culminating in a login
889 prompt.
892 \section{Starting / Stopping Domains Automatically}
It is possible to have certain domains start automatically at boot
time and to have dom0 wait for all running domains to shut down before
it shuts down the system.

To specify that a domain should start at boot time, place its configuration
file (or a link to it) under \path{/etc/xen/auto/}.
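
Creating the link can be scripted. The sketch below assumes the
directory named above; the helper name \path{enable\_autostart} is
hypothetical, and the script must run with sufficient privileges to
write to \path{/etc/xen/auto/}.

\begin{verbatim}
import os

def enable_autostart(config_path, auto_dir='/etc/xen/auto'):
    """Link a domain configuration file into the auto-start directory."""
    link = os.path.join(auto_dir, os.path.basename(config_path))
    # Leave an existing link alone so the call can be repeated safely.
    if not os.path.islink(link):
        os.symlink(os.path.abspath(config_path), link)
    return link

# Example (not executed here): enable_autostart('/etc/xen/myvmconf')
\end{verbatim}
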
901 A Sys-V style init script for Red Hat and LSB-compliant systems is
902 provided and will be automatically copied to \path{/etc/init.d/}
903 during install. You can then enable it in the appropriate way for
904 your distribution.
906 For instance, on Red Hat:
908 \begin{quote}
909 \verb_# chkconfig --add xendomains_
910 \end{quote}
912 By default, this will start the boot-time domains in runlevels 3, 4
913 and 5.
You can also use the \path{service} command to run this script
manually, e.g.:
918 \begin{quote}
919 \verb_# service xendomains start_
921 Starts all the domains with config files under /etc/xen/auto/.
922 \end{quote}
924 \begin{quote}
925 \verb_# service xendomains stop_
927 Shuts down all running Xen domains.
928 \end{quote}
932 \part{Configuration and Management}
934 %% Chapter Domain Management Tools and Daemons
935 \chapter{Domain Management Tools}
937 This chapter summarizes the management software and tools available.
940 \section{\Xend\ }
941 \label{s:xend}
944 The \Xend\ node control daemon performs system management functions
945 related to virtual machines. It forms a central point of control of
946 virtualized resources, and must be running in order to start and manage
947 virtual machines. \Xend\ must be run as root because it needs access to
948 privileged system management functions.
An initialization script named \texttt{/etc/init.d/xend} is provided to
start \Xend\ at boot time. Use the tool appropriate for your Linux
distribution (e.g.\ \texttt{chkconfig}) to specify the runlevels at which
this script should be executed, or manually create symbolic links in the
correct runlevel directories.
956 \Xend\ can be started on the command line as well, and supports the
957 following set of parameters:
959 \begin{tabular}{ll}
960 \verb!# xend start! & start \xend, if not already running \\
961 \verb!# xend stop! & stop \xend\ if already running \\
962 \verb!# xend restart! & restart \xend\ if running, otherwise start it \\
963 % \verb!# xend trace_start! & start \xend, with very detailed debug logging \\
964 \verb!# xend status! & indicates \xend\ status by its return code
965 \end{tabular}
967 A SysV init script called {\tt xend} is provided to start \xend\ at
968 boot time. {\tt make install} installs this script in
969 \path{/etc/init.d}. To enable it, you have to make symbolic links in
970 the appropriate runlevel directories or use the {\tt chkconfig} tool,
971 where available. Once \xend\ is running, administration can be done
972 using the \texttt{xm} tool.
974 \subsection{Logging}
976 As \xend\ runs, events will be logged to \path{/var/log/xen/xend.log} and
977 (less frequently) to \path{/var/log/xen/xend-debug.log}. These, along with
978 the standard syslog files, are useful when troubleshooting problems.
980 \subsection{Configuring \Xend\ }
982 \Xend\ is written in Python. At startup, it reads its configuration
983 information from the file \path{/etc/xen/xend-config.sxp}. The Xen
984 installation places an example \texttt{xend-config.sxp} file in the
985 \texttt{/etc/xen} subdirectory which should work for most installations.
987 See the example configuration file \texttt{xend-debug.sxp} and the
988 section 5 man page \texttt{xend-config.sxp} for a full list of
989 parameters and more detailed information. Some of the most important
990 parameters are discussed below.
992 An HTTP interface and a Unix domain socket API are available to
993 communicate with \Xend. This allows remote users to pass commands to the
994 daemon. By default, \Xend does not start an HTTP server. It does start a
995 Unix domain socket management server, as the low level utility
996 \texttt{xm} requires it. For support of cross-machine migration, \Xend\
997 can start a relocation server. This support is not enabled by default
998 for security reasons.
1000 Note: the example \texttt{xend} configuration file modifies the defaults and
1001 starts up \Xend\ as an HTTP server as well as a relocation server.
1003 From the file:
1005 \begin{verbatim}
1006 #(xend-http-server no)
1007 (xend-http-server yes)
1008 #(xend-unix-server yes)
1009 #(xend-relocation-server no)
1010 (xend-relocation-server yes)
1011 \end{verbatim}
1013 Comment or uncomment lines in that file to disable or enable features
1014 that you require.
In the example configuration file, connections from remote hosts are disabled:
1018 \begin{verbatim}
1019 # Address xend should listen on for HTTP connections, if xend-http-server is
1020 # set.
1021 # Specifying 'localhost' prevents remote connections.
1022 # Specifying the empty string '' (the default) allows all connections.
1023 #(xend-address '')
1024 (xend-address localhost)
1025 \end{verbatim}
1027 It is recommended that if migration support is not needed, the
1028 \texttt{xend-relocation-server} parameter value be changed to
1029 ``\texttt{no}'' or commented out.
1031 \section{Xm}
1032 \label{s:xm}
1034 The xm tool is the primary tool for managing Xen from the console. The
1035 general format of an xm command line is:
1037 \begin{verbatim}
1038 # xm command [switches] [arguments] [variables]
1039 \end{verbatim}
1041 The available \emph{switches} and \emph{arguments} are dependent on the
1042 \emph{command} chosen. The \emph{variables} may be set using
1043 declarations of the form {\tt variable=value} and command line
1044 declarations override any of the values in the configuration file being
1045 used, including the standard variables described above and any custom
1046 variables (for instance, the \path{xmdefconfig} file uses a {\tt vmid}
1047 variable).
1049 For online help for the commands available, type:
1051 \begin{quote}
1052 \begin{verbatim}
1053 # xm help
1054 \end{verbatim}
1055 \end{quote}
1057 This will list the most commonly used commands. The full list can be obtained
1058 using \verb_xm help --long_. You can also type \path{xm help $<$command$>$}
1059 for more information on a given command.
1061 \subsection{Basic Management Commands}
1063 One useful command is \verb_# xm list_ which lists all domains running in rows
1064 of the following format:
1065 \begin{center} {\tt name domid memory vcpus state cputime}
1066 \end{center}
1068 The meaning of each field is as follows:
1069 \begin{quote}
1070 \begin{description}
1071 \item[name] The descriptive name of the virtual machine.
1072 \item[domid] The number of the domain ID this virtual machine is
1073 running in.
1074 \item[memory] Memory size in megabytes.
1075 \item[vcpus] The number of virtual CPUs this domain has.
1076 \item[state] Domain state consists of 5 fields:
1077 \begin{description}
1078 \item[r] running
1079 \item[b] blocked
1080 \item[p] paused
1081 \item[s] shutdown
1082 \item[c] crashed
1083 \end{description}
1084 \item[cputime] How much CPU time (in seconds) the domain has used so
1085 far.
1086 \end{description}
1087 \end{quote}
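
Scripts that consume this output can split each row on whitespace. The
following is a sketch only: the sample row is invented for illustration,
the helper name \path{parse\_xm\_list\_row} is hypothetical, and the
parser assumes that domain names contain no spaces.

\begin{verbatim}
STATE_FLAGS = {'r': 'running', 'b': 'blocked', 'p': 'paused',
               's': 'shutdown', 'c': 'crashed'}

def parse_xm_list_row(row):
    """Split one row of 'xm list' output into named fields."""
    name, domid, memory, vcpus, state, cputime = row.split()
    return {
        'name': name,
        'domid': int(domid),
        'memory_mb': int(memory),
        'vcpus': int(vcpus),
        # Keep only the flags that are set; '-' marks an unset flag.
        'state': [STATE_FLAGS[ch] for ch in state if ch in STATE_FLAGS],
        'cputime_s': float(cputime),
    }

# Invented sample row, not captured from a real system.
info = parse_xm_list_row('myVM 1 64 1 -b--- 12.5')
\end{verbatim}
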
1089 The \path{xm list} command also supports a long output format when the
1090 \path{-l} switch is used. This outputs the full details of the
1091 running domains in \xend's SXP configuration format.
1093 If you want to know how long your domains have been running for, then
1094 you can use the \verb_# xm uptime_ command.
1097 You can get access to the console of a particular domain using
1098 the \verb_# xm console_ command (e.g.\ \verb_# xm console myVM_).
1100 \subsection{Domain Scheduling Management Commands}
1102 The credit CPU scheduler automatically load balances guest VCPUs
1103 across all available physical CPUs on an SMP host. The user need
1104 not manually pin VCPUs to load balance the system. However, she
1105 can restrict which CPUs a particular VCPU may run on using
1106 the \path{xm vcpu-pin} command.
1108 Each guest domain is assigned a \path{weight} and a \path{cap}.
1110 A domain with a weight of 512 will get twice as much CPU as a
1111 domain with a weight of 256 on a contended host. Legal weights
1112 range from 1 to 65535 and the default is 256.
1114 The cap optionally fixes the maximum amount of CPU a guest will
1115 be able to consume, even if the host system has idle CPU cycles.
The cap is expressed as a percentage of one physical CPU: 100 is
1 physical CPU, 50 is half a CPU, 400 is 4 CPUs, etc.\ The
default, 0, means there is no upper cap.
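
The arithmetic behind weights and caps can be illustrated with a small
calculation. This is an idealized sketch, assuming a fully contended
host and ignoring caps and VCPU counts when computing shares; it does
not reproduce the scheduler's actual accounting. The function and
domain names are hypothetical.

\begin{verbatim}
def contended_shares(weights):
    """Fraction of CPU each domain receives in proportion to its weight."""
    total = float(sum(weights.values()))
    return dict((name, weight / total) for name, weight in weights.items())

def cap_to_cpus(cap):
    """Convert a cap in percent of one CPU to a number of physical CPUs."""
    if cap == 0:
        return None          # 0 means no upper cap
    return cap / 100.0

# A weight of 512 against 256 yields a two-to-one split.
shares = contended_shares({'domA': 512, 'domB': 256})
\end{verbatim}

Here \path{domA} receives two thirds of the contended CPU time and
\path{domB} one third, while \path{cap\_to\_cpus(400)} corresponds to
four physical CPUs.
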
1120 When you are running with the credit scheduler, you can check and
1121 modify your domains' weights and caps using the \path{xm sched-credit}
1122 command:
1124 \begin{tabular}{ll}
1125 \verb!xm sched-credit -d <domain>! & lists weight and cap \\
1126 \verb!xm sched-credit -d <domain> -w <weight>! & sets the weight \\
1127 \verb!xm sched-credit -d <domain> -c <cap>! & sets the cap
1128 \end{tabular}
1132 %% Chapter Domain Configuration
1133 \chapter{Domain Configuration}
1134 \label{cha:config}
This chapter describes the syntax of the domain configuration files
and explains how to further specify networking, driver domain
and general scheduling behavior.
1141 \section{Configuration Files}
1142 \label{s:cfiles}
1144 Xen configuration files contain the following standard variables.
1145 Unless otherwise stated, configuration items should be enclosed in
1146 quotes: see the configuration scripts in \path{/etc/xen/}
1147 for concrete examples.
1149 \begin{description}
1150 \item[kernel] Path to the kernel image.
1151 \item[ramdisk] Path to a ramdisk image (optional).
1152 % \item[builder] The name of the domain build function (e.g.
1153 % {\tt'linux'} or {\tt'netbsd'}.
1154 \item[memory] Memory size in megabytes.
1155 \item[vcpus] The number of virtual CPUs.
1156 \item[console] Port to export the domain console on (default 9600 +
1157 domain ID).
1158 \item[vif] Network interface configuration. This may simply contain
1159 an empty string for each desired interface, or may override various
1160 settings, e.g.\
1161 \begin{verbatim}
1162 vif = [ 'mac=00:16:3E:00:00:11, bridge=xen-br0',
1163 'bridge=xen-br1' ]
1164 \end{verbatim}
1165 to assign a MAC address and bridge to the first interface and assign
1166 a different bridge to the second interface, leaving \xend\ to choose
1167 the MAC address. The settings that may be overridden in this way are
1168 type, mac, bridge, ip, script, backend, and vifname.
1169 \item[disk] List of block devices to export to the domain e.g.
1170 \verb_disk = [ 'phy:hda1,sda1,r' ]_
1171 exports physical device \path{/dev/hda1} to the domain as
1172 \path{/dev/sda1} with read-only access. Exporting a disk read-write
1173 which is currently mounted is dangerous -- if you are \emph{certain}
1174 you wish to do this, you can specify \path{w!} as the mode.
1175 \item[dhcp] Set to {\tt `dhcp'} if you want to use DHCP to configure
1176 networking.
1177 \item[netmask] Manually configured IP netmask.
1178 \item[gateway] Manually configured IP gateway.
1179 \item[hostname] Set the hostname for the virtual machine.
1180 \item[root] Specify the root device parameter on the kernel command
1181 line.
1182 \item[nfs\_server] IP address for the NFS server (if any).
1183 \item[nfs\_root] Path of the root filesystem on the NFS server (if
1184 any).
\item[extra] Extra string to append to the kernel command line (if
  any).
1187 \end{description}
1189 Additional fields are documented in the example configuration files
1190 (e.g. to configure virtual TPM functionality).
1192 For additional flexibility, it is also possible to include Python
1193 scripting commands in configuration files. An example of this is the
1194 \path{xmexample2} file, which uses Python code to handle the
1195 \path{vmid} variable.
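
As an illustration of Python scripting in a configuration file, several
settings can be derived from one variable. This sketch is not a copy of
\path{xmexample2}: \path{vmid} is assigned directly here, whereas it
would normally arrive from the \path{xm} command line, and the naming
scheme is an assumption.

\begin{verbatim}
vmid = 3    # assumed value; normally given as "vmid=3" to xm create

# Derive per-domain settings from the single vmid variable.
name = "VM%d" % vmid
hostname = "vm%d" % vmid
memory = 64
vif = ['mac=00:16:3E:00:00:%02X' % vmid]
\end{verbatim}
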
1198 %\part{Advanced Topics}
1201 \section{Network Configuration}
1203 For many users, the default installation should work ``out of the
1204 box''. More complicated network setups, for instance with multiple
1205 Ethernet interfaces and/or existing bridging setups will require some
1206 special configuration.
1208 The purpose of this section is to describe the mechanisms provided by
1209 \xend\ to allow a flexible configuration for Xen's virtual networking.
1211 \subsection{Xen virtual network topology}
1213 Each domain network interface is connected to a virtual network
1214 interface in dom0 by a point to point link (effectively a ``virtual
1215 crossover cable''). These devices are named {\tt
1216 vif$<$domid$>$.$<$vifid$>$} (e.g.\ {\tt vif1.0} for the first
1217 interface in domain~1, {\tt vif3.1} for the second interface in
1218 domain~3).
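
The naming rule is simple enough to express in one line. In this sketch
the helper name \path{vif\_name} is hypothetical; note that interface
indices count from zero.

\begin{verbatim}
def vif_name(domid, vifid):
    """Name of the dom0 device backing interface vifid of domain domid."""
    return 'vif%d.%d' % (domid, vifid)

first_in_dom1 = vif_name(1, 0)     # first interface in domain 1
second_in_dom3 = vif_name(3, 1)    # second interface in domain 3
\end{verbatim}
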
1220 Traffic on these virtual interfaces is handled in domain~0 using
1221 standard Linux mechanisms for bridging, routing, rate limiting, etc.
1222 Xend calls on two shell scripts to perform initial configuration of
1223 the network and configuration of new virtual interfaces. By default,
1224 these scripts configure a single bridge for all the virtual
1225 interfaces. Arbitrary routing / bridging configurations can be
1226 configured by customizing the scripts, as described in the following
1227 section.
1229 \subsection{Xen networking scripts}
1231 Xen's virtual networking is configured by two shell scripts (by
1232 default \path{network-bridge} and \path{vif-bridge}). These are called
1233 automatically by \xend\ when certain events occur, with arguments to
1234 the scripts providing further contextual information. These scripts
1235 are found by default in \path{/etc/xen/scripts}. The names and
1236 locations of the scripts can be configured in
1237 \path{/etc/xen/xend-config.sxp}.
1239 \begin{description}
1240 \item[network-bridge:] This script is called whenever \xend\ is started or
1241 stopped to respectively initialize or tear down the Xen virtual
1242 network. In the default configuration initialization creates the
1243 bridge `xen-br0' and moves eth0 onto that bridge, modifying the
1244 routing accordingly. When \xend\ exits, it deletes the Xen bridge
1245 and removes eth0, restoring the normal IP and routing configuration.
1247 %% In configurations where the bridge already exists, this script
1248 %% could be replaced with a link to \path{/bin/true} (for instance).
1250 \item[vif-bridge:] This script is called for every domain virtual
1251 interface and can configure firewalling rules and add the vif to the
1252 appropriate bridge. By default, this adds and removes VIFs on the
1253 default Xen bridge.
1254 \end{description}
1256 Other example scripts are available (\path{network-route} and
1257 \path{vif-route}, \path{network-nat} and \path{vif-nat}).
For more complex network setups (e.g.\ where routing is required or
where Xen must integrate with existing bridges) these scripts may be replaced with
customized variants for your site's preferred configuration.
1262 \section{Driver Domain Configuration}
1263 \label{s:ddconf}
1265 \subsection{PCI}
1266 \label{ss:pcidd}
1268 Individual PCI devices can be assigned to a given domain (a PCI driver domain)
1269 to allow that domain direct access to the PCI hardware.
1271 While PCI Driver Domains can increase the stability and security of a system
1272 by addressing a number of security concerns, there are some security issues
1273 that remain that you can read about in Section~\ref{s:ddsecurity}.
1275 \subsubsection{Compile-Time Setup}
1276 To use this functionality, ensure
1277 that the PCI Backend is compiled in to a privileged domain (e.g. domain 0)
1278 and that the domains which will be assigned PCI devices have the PCI Frontend
1279 compiled in. In XenLinux, the PCI Backend is available under the Xen
1280 configuration section while the PCI Frontend is under the
architecture-specific ``Bus Options'' section. You may compile both the backend
and the frontend into the same kernel; they will not affect each other.
1284 \subsubsection{PCI Backend Configuration - Binding at Boot}
The PCI devices you wish to assign to unprivileged domains must be ``hidden''
1286 from your backend domain (usually domain 0) so that it does not load a driver
1287 for them. Use the \path{pciback.hide} kernel parameter which is specified on
1288 the kernel command-line and is configurable through GRUB (see
1289 Section~\ref{s:configure}). Note that devices are not really hidden from the
1290 backend domain. The PCI Backend appears to the Linux kernel as a regular PCI
1291 device driver. The PCI Backend ensures that no other device driver loads
1292 for the devices by binding itself as the device driver for those devices.
1293 PCI devices are identified by hexadecimal slot/function numbers (on Linux,
1294 use \path{lspci} to determine slot/function numbers of your devices) and
1295 can be specified with or without the PCI domain: \\
1296 \centerline{ {\tt ({\em bus}:{\em slot}.{\em func})} example {\tt (02:1d.3)}} \\
1297 \centerline{ {\tt ({\em domain}:{\em bus}:{\em slot}.{\em func})} example {\tt (0000:02:1d.3)}} \\
1299 An example kernel command-line which hides two PCI devices might be: \\
1300 \centerline{ {\tt root=/dev/sda4 ro console=tty0 pciback.hide=(02:01.f)(0000:04:1d.0) } } \\
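
When several devices must be hidden, the parameter can be assembled by
a script. The following sketch checks each address against the two
formats described above and concatenates them; the function name
\path{pciback\_hide} is hypothetical, and the check is purely syntactic
(it does not confirm that the device exists).

\begin{verbatim}
import re

# Optional four-digit PCI domain, then bus:slot.func in hexadecimal.
_PCI_ADDR = re.compile(
    r'^(?:[0-9a-fA-F]{4}:)?[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]$')

def pciback_hide(devices):
    """Build a pciback.hide kernel parameter from a list of PCI addresses."""
    for dev in devices:
        if not _PCI_ADDR.match(dev):
            raise ValueError("not a PCI address: %r" % dev)
    return "pciback.hide=" + "".join("(%s)" % dev for dev in devices)

param = pciback_hide(['02:01.f', '0000:04:1d.0'])
\end{verbatim}

The resulting string matches the \path{pciback.hide} portion of the
example command line above.
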
1302 \subsubsection{PCI Backend Configuration - Late Binding}
1303 PCI devices can also be bound to the PCI Backend after boot through the manual
1304 binding/unbinding facilities provided by the Linux kernel in sysfs (allowing
1305 for a Xen user to give PCI devices to driver domains that were not specified
on the kernel command-line). There are several attributes within the PCI
Backend's sysfs directory (\path{/sys/bus/pci/drivers/pciback}) that can be
used to bind/unbind devices:

\begin{description}
\item[slots] lists all of the PCI slots that the PCI Backend will try to seize
  (or ``hide'' from Domain 0). A PCI slot must appear in this list before it can
  be bound to the PCI Backend through the \path{bind} attribute.
1314 \item[new\_slot] write the name of a slot here (in 0000:00:00.0 format) to
1315 have the PCI Backend seize the device in this slot.
1316 \item[remove\_slot] write the name of a slot here (same format as
1317 \path{new\_slot}) to have the PCI Backend no longer try to seize devices in
1318 this slot. Note that this does not unbind the driver from a device it has
1319 already seized.
1320 \item[bind] write the name of a slot here (in 0000:00:00.0 format) to have
1321 the Linux kernel attempt to bind the device in that slot to the PCI Backend
1322 driver.
\item[unbind] write the name of a slot here (same format as \path{bind}) to have
  the Linux kernel unbind the device from the PCI Backend. DO NOT unbind a
  device while it is currently given to a PCI driver domain!
1326 \end{description}
1328 Some examples:
1330 Bind a device to the PCI Backend which is not bound to any other driver.
1331 \begin{verbatim}
1332 # # Add a new slot to the PCI Backend's list
1333 # echo -n 0000:01:04.d > /sys/bus/pci/drivers/pciback/new_slot
1334 # # Now that the backend is watching for the slot, bind to it
1335 # echo -n 0000:01:04.d > /sys/bus/pci/drivers/pciback/bind
1336 \end{verbatim}
1338 Unbind a device from its driver and bind to the PCI Backend.
1339 \begin{verbatim}
1340 # # Unbind a PCI network card from its network driver
1341 # echo -n 0000:05:02.0 > /sys/bus/pci/drivers/3c905/unbind
1342 # # And now bind it to the PCI Backend
1343 # echo -n 0000:05:02.0 > /sys/bus/pci/drivers/pciback/new_slot
1344 # echo -n 0000:05:02.0 > /sys/bus/pci/drivers/pciback/bind
1345 \end{verbatim}
Note that the \path{-n} option in the examples is important as it prevents
\path{echo} from appending a newline to the slot name.
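
The same sequence can be driven from a script, which avoids the newline
problem because the slot name is written exactly as given. This is a
sketch under the assumption that the sysfs attributes behave as
described above; the helper names are hypothetical, and the code must
run as root in the backend domain.

\begin{verbatim}
import os

PCIBACK_DIR = '/sys/bus/pci/drivers/pciback'

def write_attr(driver_dir, attr, slot):
    """Write a slot name to a driver attribute with no trailing newline."""
    with open(os.path.join(driver_dir, attr), 'w') as attr_file:
        attr_file.write(slot)

def bind_to_pciback(slot, driver_dir=PCIBACK_DIR):
    """Register the slot with the PCI Backend, then bind the device."""
    write_attr(driver_dir, 'new_slot', slot)
    write_attr(driver_dir, 'bind', slot)

# Example (not executed here): bind_to_pciback('0000:01:04.d')
\end{verbatim}

A device that is still bound to another driver must first be released
through that driver's \path{unbind} attribute, as in the second example
above.
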
1350 \subsubsection{PCI Backend Configuration - User-space Quirks}
1351 Quirky devices (such as the Broadcom Tigon 3) may need write access to their
1352 configuration space registers. Xen can be instructed to allow specified PCI
1353 devices write access to specific configuration space registers. The policy may
1354 be found in:
1356 \centerline{ \path{/etc/xen/xend-pci-quirks.sxp} }
1358 The policy file is heavily commented and is intended to provide enough
1359 documentation for developers to extend it.
1361 \subsubsection{PCI Backend Configuration - Permissive Flag}
1362 If the user-space quirks approach doesn't meet your needs you may want to enable
1363 the permissive flag for that device. To do so, first get the PCI domain, bus,
1364 slot, and function information from dom0 via \path{lspci}. Then augment the
1365 user-space policy for permissive devices. The permissive policy can be found
1366 in:
1368 \centerline{ \path{/etc/xen/xend-pci-permissive.sxp} }
1370 Currently, the only way to reset the permissive flag is to unbind the device
1371 from the PCI Backend driver.
1373 \subsubsection{PCI Backend - Checking Status}
There are two important sysfs nodes that provide a mechanism to view specifics on
quirks and permissive devices:
\begin{description}
\item \path{/sys/bus/pci/drivers/pciback/permissive} \\
  Use \path{cat} on this file to view a list of permissive slots.
\item \path{/sys/bus/pci/drivers/pciback/quirks} \\
  Use \path{cat} on this file to view a hierarchical listing of devices bound to the
  PCI backend, their PCI vendor/device ID, and any quirks that are associated with
  that particular slot.
\end{description}

You may notice that every device bound to the PCI backend has 17 standard
``quirks'' regardless of \path{xend-pci-quirks.sxp}. These default entries are
necessary to support interactions between the PCI bus manager and the device bound
to it. Even non-quirky devices should have these standard entries.

In this case, preference was given to accuracy over aesthetics by choosing to
show the standard quirks in the quirks list rather than hide them from the
inquiring user.
1394 \subsubsection{PCI Frontend Configuration}
1395 To configure a domU to receive a PCI device:
1397 \begin{description}
1398 \item[Command-line:]
1399 Use the {\em pci} command-line flag. For multiple devices, use the option
1400 multiple times. \\
1401 \centerline{ {\tt xm create netcard-dd pci=01:00.0 pci=02:03.0 }} \\
1403 \item[Flat Format configuration file:]
1404 Specify all of your PCI devices in a python list named {\em pci}. \\
1405 \centerline{ {\tt pci=['01:00.0','02:03.0'] }} \\
1407 \item[SXP Format configuration file:]
1408 Use a single PCI device section for all of your devices (specify the numbers
1409 in hexadecimal with the preceding '0x'). Note that {\em domain} here refers
1410 to the PCI domain, not a virtual machine within Xen.
{\small
\begin{verbatim}
(device (pci
    (dev (domain 0x0)(bus 0x3)(slot 0x1a)(func 0x1))
    (dev (domain 0x0)(bus 0x1)(slot 0x5)(func 0x0))
))
\end{verbatim}
}
1419 \end{description}
1421 %% There are two possible types of privileges: IO privileges and
1422 %% administration privileges.
1424 \section{Support for virtual Trusted Platform Module (vTPM)}
1425 \label{ss:vtpm}
Paravirtualized domains can be given access to a virtualized version
of a TPM. This enables applications in these domains to use the services
of the TPM device, for example through a TSS stack
\footnote{Trousers TSS stack: http://sourceforge.net/projects/trousers}.
The Xen source repository provides the necessary software components to
enable virtual TPM access. Support is provided through several
different pieces. First, a TPM emulator has been modified to provide TPM
functionality for the virtual TPM subsystem. Second, a virtual TPM Manager
coordinates the virtual TPMs' efforts, manages their creation, and provides
protected key storage using the TPM. Third, a device driver pair providing
a TPM front- and backend is available for XenLinux to deliver TPM commands
from the domain to the virtual TPM manager, which dispatches them to a
software TPM. Because the TPM Manager relies on a hardware TPM for protected key
storage, this subsystem requires a Linux-supported hardware TPM.
For development purposes, a TPM emulator is available for use on non-TPM
enabled platforms.
1444 \subsubsection{Compile-Time Setup}
1445 To enable access to the virtual TPM, the virtual TPM backend driver must
1446 be compiled for a privileged domain (e.g. domain 0). Using the XenLinux
1447 configuration, the necessary driver can be selected in the Xen configuration
1448 section. Unless the driver has been compiled into the kernel, its module
1449 must be activated using the following command:
1451 \begin{verbatim}
1452 modprobe tpmbk
1453 \end{verbatim}
Similarly, the TPM frontend driver must be compiled for the kernel trying
to use TPM functionality. Its driver can be selected in the kernel
configuration section Device Driver / Character Devices / TPM Devices.
In addition, the TPM driver for the built-in TPM must be selected.
If the virtual TPM driver has been compiled as a module, it
must be activated using the following command:
1462 \begin{verbatim}
1463 modprobe tpm_xenu
1464 \end{verbatim}
1466 Furthermore, it is necessary to build the virtual TPM manager and software
1467 TPM by making changes to entries in Xen build configuration files.
1468 The following entry in the file Config.mk in the Xen root source
1469 directory must be made:
1471 \begin{verbatim}
1472 VTPM_TOOLS ?= y
1473 \end{verbatim}
After a build of the Xen tree and a reboot of the machine, the TPM backend
driver must be loaded. Once loaded, the virtual TPM manager daemon
must be started before TPM-enabled guest domains may be launched.
To allow this host to be the destination of a virtual TPM migration, the
virtual TPM migration daemon must also be started.
1481 \begin{verbatim}
1482 vtpm_managerd
1483 \end{verbatim}
1484 \begin{verbatim}
1485 vtpm_migratord
1486 \end{verbatim}
1488 Once the VTPM manager is running, the VTPM can be accessed by loading the
1489 front end driver in a guest domain.
1491 \subsubsection{Development and Testing TPM Emulator}
1492 For development and testing on non-TPM enabled platforms, a TPM emulator
1493 can be used in replacement of a platform TPM. First, the entry in the file
1494 tools/vtpm/Rules.mk must look as follows:
1496 \begin{verbatim}
1498 \end{verbatim}
Second, the entry in the file tools/vtpm\_manager/Rules.mk must be uncommented
as follows:
1503 \begin{verbatim}
1504 # TCS talks to fifo's rather than /dev/tpm. TPM Emulator assumed on fifos
1506 \end{verbatim}
1508 Before starting the virtual TPM Manager, start the emulator by executing
1509 the following in dom0:
1511 \begin{verbatim}
1512 tpm_emulator clear
1513 \end{verbatim}
1515 \subsubsection{vTPM Frontend Configuration}
1516 To provide TPM functionality to a user domain, a line must be added to
1517 the virtual TPM configuration file using the following format:
1519 \begin{verbatim}
1520 vtpm = ['instance=<instance number>, backend=<domain id>']
1521 \end{verbatim}
The {\it instance number} reflects the preferred virtual TPM instance
to associate with the domain. If the selected instance is
already associated with another domain, the system will automatically
select the next available instance. If an instance number is given, it
must be greater than zero. The instance parameter may also be omitted
from the configuration file.

The {\it domain id} is the ID of the domain in which the
virtual TPM backend driver and virtual TPM are running. It should
currently always be set to '0'.
Examples of valid vtpm entries in the configuration file are

\begin{verbatim}
 vtpm = ['instance=1, backend=0']
\end{verbatim}
and
\begin{verbatim}
 vtpm = ['backend=0']
\end{verbatim}
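
Since domain configuration files are evaluated as Python, the two forms
above can also be generated by a small helper. The following is only a
sketch under that assumption: \verb|vtpm_entry| is a hypothetical name
and is not part of the Xen tools. It enforces the rule that an instance
number, when given, must be greater than zero.

\begin{verbatim}
def vtpm_entry(backend=0, instance=None):
    """Return a vtpm list for a domain configuration file."""
    parts = []
    if instance is not None:
        if instance < 1:
            raise ValueError('instance must be greater than zero')
        parts.append('instance=%d' % instance)
    parts.append('backend=%d' % backend)
    return [', '.join(parts)]

# Same value as the first example above.
vtpm = vtpm_entry(instance=1)
\end{verbatim}

Calling \verb|vtpm_entry()| without arguments yields the second form,
in which the instance parameter is omitted.
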
1545 \subsubsection{Using the virtual TPM}
1547 Access to TPM functionality is provided by the virtual TPM frontend driver.
1548 Similar to existing hardware TPM drivers, this driver provides basic TPM
1549 status information through the {\it sysfs} filesystem. In a Xen user domain
1550 the sysfs entries can be found in /sys/devices/xen/vtpm-0.
1552 Commands can be sent to the virtual TPM instance using the character
1553 device /dev/tpm0 (major 10, minor 224).
1555 % Chapter Storage and FileSytem Management
1556 \chapter{Storage and File System Management}
1558 Storage can be made available to virtual machines in a number of
1559 different ways. This chapter covers some possible configurations.
1561 The most straightforward method is to export a physical block device (a
1562 hard drive or partition) from dom0 directly to the guest domain as a
1563 virtual block device (VBD).
1565 Storage may also be exported from a filesystem image or a partitioned
1566 filesystem image as a \emph{file-backed VBD}.
1568 Finally, standard network storage protocols such as NBD, iSCSI, NFS,
1569 etc., can be used to provide storage to virtual machines.
1572 \section{Exporting Physical Devices as VBDs}
1573 \label{s:exporting-physical-devices-as-vbds}
1575 One of the simplest configurations is to directly export individual
1576 partitions from domain~0 to other domains. To achieve this use the
1577 \path{phy:} specifier in your domain configuration file. For example a
1578 line like
1579 \begin{quote}
1580 \verb_disk = ['phy:hda3,sda1,w']_
1581 \end{quote}
1582 specifies that the partition \path{/dev/hda3} in domain~0 should be
1583 exported read-write to the new domain as \path{/dev/sda1}; one could
1584 equally well export it as \path{/dev/hda} or \path{/dev/sdb5} should
1585 one wish.
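
The structure of such a disk entry can be made explicit with a few
lines of Python. The helper below is an illustrative sketch only:
\verb|parse_disk| is a hypothetical name, not a function of the Xen
tools, and it simply splits the string in the way described above
rather than reproducing \xend's own parsing.

\begin{verbatim}
def parse_disk(entry):
    """Split a disk entry into (type, backend device,
    frontend device, mode)."""
    backend, frontend, mode = [p.strip() for p in entry.split(',')]
    # Everything before the first colon names the backend type.
    kind, _, device = backend.partition(':')
    return kind, device, frontend, mode

# 'w' requests read-write access, as in the example above.
print(parse_disk('phy:hda3,sda1,w'))
\end{verbatim}

For the entry shown above this yields the type \path{phy}, the
domain~0 device \path{hda3}, the guest device \path{sda1} and the
mode \path{w}.
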
1587 In addition to local disks and partitions, it is possible to export
1588 any device that Linux considers to be ``a disk'' in the same manner.
1589 For example, if you have iSCSI disks or GNBD volumes imported into
1590 domain~0 you can export these to other domains using the \path{phy:}
1591 disk syntax. E.g.:
1592 \begin{quote}
1593 \verb_disk = ['phy:vg/lvm1,sda2,w']_
1594 \end{quote}
1596 \begin{center}
1597 \framebox{\bf Warning: Block device sharing}
1598 \end{center}
1599 \begin{quote}
  Block devices should typically only be shared between domains in a
  read-only fashion; otherwise the Linux kernel's file systems will get
  very confused as the file system structure may change underneath
  them (having the same ext3 partition mounted \path{rw} twice is a
  sure-fire way to cause irreparable damage)! \Xend\ will attempt to
1605 prevent you from doing this by checking that the device is not
1606 mounted read-write in domain~0, and hasn't already been exported
1607 read-write to another domain. If you want read-write sharing,
1608 export the directory to other domains via NFS from domain~0 (or use
1609 a cluster file system such as GFS or ocfs2).
1610 \end{quote}
1613 \section{Using File-backed VBDs}
1615 It is also possible to use a file in Domain~0 as the primary storage
1616 for a virtual machine. As well as being convenient, this also has the
1617 advantage that the virtual block device will be \emph{sparse} ---
1618 space will only really be allocated as parts of the file are used. So
1619 if a virtual machine uses only half of its disk space then the file
1620 really takes up half of the size allocated.
For example, to create a 2GB sparse file-backed virtual block device
(which actually consumes only 1KB of disk):
1624 \begin{quote}
1625 \verb_# dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1_
1626 \end{quote}
1628 Make a file system in the disk file:
1629 \begin{quote}
1630 \verb_# mkfs -t ext3 vm1disk_
1631 \end{quote}
1633 (when the tool asks for confirmation, answer `y')
1635 Populate the file system e.g.\ by copying from the current root:
1636 \begin{quote}
1637 \begin{verbatim}
1638 # mount -o loop vm1disk /mnt
1639 # cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt
1640 # mkdir /mnt/{proc,sys,home,tmp}
1641 \end{verbatim}
1642 \end{quote}
Tailor the file system by editing \path{/etc/fstab},
\path{/etc/hostname}, etc. Don't forget to edit the files in the
mounted file system rather than those in your domain~0 filesystem,
e.g.\ you would edit \path{/mnt/etc/fstab} instead of \path{/etc/fstab}.
For this example, set \path{/dev/sda1} as the root device in fstab.
1650 Now unmount (this is important!):
1651 \begin{quote}
1652 \verb_# umount /mnt_
1653 \end{quote}
1655 In the configuration file set:
1656 \begin{quote}
1657 \verb_disk = ['tap:aio:/full/path/to/vm1disk,sda1,w']_
1658 \end{quote}
1660 As the virtual machine writes to its `disk', the sparse file will be
1661 filled in and consume more space up to the original 2GB.
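
The difference between the apparent size of a sparse file and the space
it actually occupies can be checked with a short Python sketch. It
assumes a Linux file system that supports sparse files and that
\verb|st_blocks| counts 512-byte units; the temporary path is chosen
only for illustration.

\begin{verbatim}
import os
import tempfile

def make_sparse(path, size):
    """Create a file of the given apparent size without
    writing any data blocks."""
    with open(path, 'wb') as f:
        f.truncate(size)

def usage(path):
    """Return (apparent size, bytes actually allocated)."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512

path = os.path.join(tempfile.mkdtemp(), 'vm1disk')
make_sparse(path, 2 * 1024 ** 3)    # 2GB apparent size
apparent, allocated = usage(path)
print('%d bytes apparent, %d bytes allocated' % (apparent, allocated))
\end{verbatim}

Running the same check on a disk image in use by a virtual machine
shows the allocated figure growing towards the apparent size as the
guest writes data.
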
1663 {\em{Note:}} Users that have worked with file-backed VBDs on Xen in previous
1664 versions will be interested to know that this support is now provided through
1665 the blktap driver instead of the loopback driver. This change results in
1666 file-based block devices that are higher-performance, more scalable, and which
1667 provide better safety properties for VBD data. All that is required to update
1668 your existing file-backed VM configurations is to change VBD configuration
1669 lines from:
1670 \begin{quote}
1671 \verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_
1672 \end{quote}
1673 to:
1674 \begin{quote}
1675 \verb_disk = ['tap:aio:/full/path/to/vm1disk,sda1,w']_
1676 \end{quote}
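
If many configuration files need to be updated, the substitution can be
scripted. The following Python sketch shows the idea; \verb|to_blktap|
is a hypothetical helper name, and the function only rewrites the
prefix of each entry as described above.

\begin{verbatim}
def to_blktap(disk):
    """Rewrite loopback 'file:' entries as blktap 'tap:aio:'
    entries, leaving all other entries untouched."""
    converted = []
    for entry in disk:
        if entry.startswith('file:'):
            entry = 'tap:aio:' + entry[len('file:'):]
        converted.append(entry)
    return converted

disk = to_blktap(['file:/full/path/to/vm1disk,sda1,w'])
print(disk)
\end{verbatim}

Entries that use other specifiers, such as \path{phy:}, pass through
unchanged.
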
1679 \subsection{Loopback-mounted file-backed VBDs (deprecated)}
1681 {\em{{\bf{Note:}} Loopback mounted VBDs have now been replaced with
1682 blktap-based support for raw image files, as described above. This
1683 section remains to detail a configuration that was used by older Xen
1684 versions.}}
Raw image file-backed VBDs may also be attached to VMs using the
Linux loopback driver. The only change required to the raw file
instructions above is to specify the configuration entry as:
1689 \begin{quote}
1690 \verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_
1691 \end{quote}
1693 {\bf Note that loopback file-backed VBDs may not be appropriate for backing
1694 I/O-intensive domains.} This approach is known to experience
1695 substantial slowdowns under heavy I/O workloads, due to the I/O
1696 handling by the loopback block device used to support file-backed VBDs
in dom0. Loopback support remains for old Xen installations, but users
are strongly encouraged to use the blktap-based file support (using
``{\tt{tap:aio}}'' as described above).
1701 Additionally, Linux supports a maximum of eight loopback file-backed
1702 VBDs across all domains by default. This limit can be statically
1703 increased by using the \emph{max\_loop} module parameter if
1704 CONFIG\_BLK\_DEV\_LOOP is compiled as a module in the dom0 kernel, or
1705 by using the \emph{max\_loop=n} boot option if CONFIG\_BLK\_DEV\_LOOP
is compiled directly into the dom0 kernel. Again, users are encouraged
to use the blktap-based file support described above, which scales to a
much larger number of active VBDs.
1711 \section{Using LVM-backed VBDs}
1712 \label{s:using-lvm-backed-vbds}
1714 A particularly appealing solution is to use LVM volumes as backing for
1715 domain file-systems since this allows dynamic growing/shrinking of
1716 volumes as well as snapshot and other features.
1718 To initialize a partition to support LVM volumes:
1719 \begin{quote}
1720 \begin{verbatim}
1721 # pvcreate /dev/sda10
1722 \end{verbatim}
1723 \end{quote}
1725 Create a volume group named `vg' on the physical partition:
1726 \begin{quote}
1727 \begin{verbatim}
1728 # vgcreate vg /dev/sda10
1729 \end{verbatim}
1730 \end{quote}
1732 Create a logical volume of size 4GB named `myvmdisk1':
1733 \begin{quote}
1734 \begin{verbatim}
1735 # lvcreate -L4096M -n myvmdisk1 vg
1736 \end{verbatim}
1737 \end{quote}
You should now see that you have a \path{/dev/vg/myvmdisk1} device.
Make a filesystem, mount it and populate it, e.g.:
1741 \begin{quote}
1742 \begin{verbatim}
1743 # mkfs -t ext3 /dev/vg/myvmdisk1
1744 # mount /dev/vg/myvmdisk1 /mnt
1745 # cp -ax / /mnt
1746 # umount /mnt
1747 \end{verbatim}
1748 \end{quote}
1750 Now configure your VM with the following disk configuration:
1751 \begin{quote}
1752 \begin{verbatim}
1753 disk = [ 'phy:vg/myvmdisk1,sda1,w' ]
1754 \end{verbatim}
1755 \end{quote}
1757 LVM enables you to grow the size of logical volumes, but you'll need
1758 to resize the corresponding file system to make use of the new space.
1759 Some file systems (e.g.\ ext3) now support online resize. See the LVM
1760 manuals for more details.
1762 You can also use LVM for creating copy-on-write (CoW) clones of LVM
1763 volumes (known as writable persistent snapshots in LVM terminology).
1764 This facility is new in Linux 2.6.8, so isn't as stable as one might
1765 hope. In particular, using lots of CoW LVM disks consumes a lot of
1766 dom0 memory, and error conditions such as running out of disk space
1767 are not handled well. Hopefully this will improve in future.
1769 To create two copy-on-write clones of the above file system you would
1770 use the following commands:
1772 \begin{quote}
1773 \begin{verbatim}
1774 # lvcreate -s -L1024M -n myclonedisk1 /dev/vg/myvmdisk1
1775 # lvcreate -s -L1024M -n myclonedisk2 /dev/vg/myvmdisk1
1776 \end{verbatim}
1777 \end{quote}
1779 Each of these can grow to have 1GB of differences from the master
1780 volume. You can grow the amount of space for storing the differences
1781 using the lvextend command, e.g.:
1782 \begin{quote}
1783 \begin{verbatim}
# lvextend -L+100M /dev/vg/myclonedisk1
1785 \end{verbatim}
1786 \end{quote}
Don't let the `differences volume' ever fill up, otherwise LVM gets
rather confused. It may be possible to automate the growing process by
using \path{dmsetup wait} to spot the volume getting full and then
issuing an \path{lvextend}.
1793 In principle, it is possible to continue writing to the volume that
1794 has been cloned (the changes will not be visible to the clones), but
1795 we wouldn't recommend this: have the cloned volume as a `pristine'
1796 file system install that isn't mounted directly by any of the virtual
1797 machines.
1800 \section{Using NFS Root}
First, populate a root filesystem in a directory on the server
machine. The server can be a distinct physical machine, or simply run
within a virtual machine on the same node.
1806 Now configure the NFS server to export this filesystem over the
1807 network by adding a line to \path{/etc/exports}, for instance:
1809 \begin{quote}
1810 \begin{small}
1811 \begin{verbatim}
1812 /export/vm1root (rw,sync,no_root_squash)
1813 \end{verbatim}
1814 \end{small}
1815 \end{quote}
1817 Finally, configure the domain to use NFS root. In addition to the
1818 normal variables, you should make sure to set the following values in
1819 the domain's configuration file:
1821 \begin{quote}
1822 \begin{small}
1823 \begin{verbatim}
1824 root = '/dev/nfs'
1825 nfs_server = '' # substitute IP address of server
1826 nfs_root = '/path/to/root' # path to root FS on the server
1827 \end{verbatim}
1828 \end{small}
1829 \end{quote}
1831 The domain will need network access at boot time, so either statically
1832 configure an IP address using the config variables \path{ip},
1833 \path{netmask}, \path{gateway}, \path{hostname}; or enable DHCP
1834 (\path{dhcp='dhcp'}).
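
Putting these settings together, a statically configured NFS-root
domain might use a fragment like the one below. This is a sketch only:
every value is a placeholder chosen for illustration (the 192.0.2.0/24
addresses are reserved for documentation), and the exported directory
is assumed to be the one from the \path{/etc/exports} example.

\begin{verbatim}
# All values below are placeholders; substitute your own.
root       = '/dev/nfs'
nfs_server = '192.0.2.1'        # address of the NFS server
nfs_root   = '/export/vm1root'  # directory exported by the server
ip         = '192.0.2.10'       # static address of the domain
netmask    = '255.255.255.0'
gateway    = '192.0.2.254'
hostname   = 'vm1'
\end{verbatim}
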
1836 Note that the Linux NFS root implementation is known to have stability
1837 problems under high load (this is not a Xen-specific problem), so this
1838 configuration may not be appropriate for critical servers.
1841 \chapter{CPU Management}
1843 %% KMS Something sage about CPU / processor management.
1845 Xen allows a domain's virtual CPU(s) to be associated with one or more
1846 host CPUs. This can be used to allocate real resources among one or
1847 more guests, or to make optimal use of processor resources when
1848 utilizing dual-core, hyperthreading, or other advanced CPU technologies.
1850 Xen enumerates physical CPUs in a `depth first' fashion. For a system
1851 with both hyperthreading and multiple cores, this would be all the
1852 hyperthreads on a given core, then all the cores on a given socket,
and then all sockets. For example, on a two-socket, dual-core,
hyperthreaded Xeon system the CPU order would be:
1857 \begin{center}
1858 \begin{tabular}{l|l|l|l|l|l|l|r}
1859 \multicolumn{4}{c|}{socket0} & \multicolumn{4}{c}{socket1} \\ \hline
1860 \multicolumn{2}{c|}{core0} & \multicolumn{2}{c|}{core1} &
1861 \multicolumn{2}{c|}{core0} & \multicolumn{2}{c}{core1} \\ \hline
1862 ht0 & ht1 & ht0 & ht1 & ht0 & ht1 & ht0 & ht1 \\
1863 \#0 & \#1 & \#2 & \#3 & \#4 & \#5 & \#6 & \#7 \\
1864 \end{tabular}
1865 \end{center}
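
The numbering in the table follows a simple formula, sketched here in
Python. The function names are hypothetical, and the sketch assumes a
uniform topology in which every socket has the same number of cores and
every core the same number of hyperthreads.

\begin{verbatim}
def cpu_number(socket, core, thread, cores, threads):
    """Physical CPU number under depth-first enumeration.

    cores   -- cores per socket
    threads -- hyperthreads per core
    """
    return (socket * cores + core) * threads + thread

def cpu_location(cpu, cores, threads):
    """Inverse mapping: CPU number -> (socket, core, thread)."""
    core_index, thread = divmod(cpu, threads)
    socket, core = divmod(core_index, cores)
    return socket, core, thread

# Two sockets, dual core, hyperthreaded, as in the table.
for cpu in range(8):
    socket, core, ht = cpu_location(cpu, 2, 2)
    print('#%d: socket%d core%d ht%d' % (cpu, socket, core, ht))
\end{verbatim}

For instance, the second hyperthread of core~0 on socket~1 is CPU
number $(1 \times 2 + 0) \times 2 + 1 = 5$, in agreement with the table.
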
Having multiple vcpus belonging to the same domain mapped to the same
physical CPU is very likely to lead to poor performance. It's better to
use `vcpu-set' to hot-unplug one of the vcpus and ensure the others are
pinned on different CPUs.
If you are running I/O-intensive tasks, it's typically better to dedicate
either a hyperthread or a whole core to running domain~0, and hence pin
other domains so that they can't use CPU~0. If your workload is mostly
compute-intensive, you may want to pin vcpus such that all physical CPU
threads are available for guest domains.
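
As a sketch of the first policy, the following Python fragment lists
the physical CPUs left over for guests once domain~0 has been given
CPU~0. It relies on the depth-first numbering, under which the
hyperthreads of the first core are numbered from zero. The helper name
is hypothetical, and using the result as the value of a \verb|cpus|
variable in a domain configuration file is an assumption that should be
checked against your Xen version.

\begin{verbatim}
def guest_cpus(total_cpus, threads_per_core, whole_core=True):
    """CPUs available to guests when domain 0 keeps CPU 0.

    With whole_core set, all hyperthreads of the first core
    are reserved; otherwise only CPU 0 itself is reserved.
    """
    reserved = threads_per_core if whole_core else 1
    return list(range(reserved, total_cpus))

# Eight CPUs with two hyperthreads per core.
cpus = ','.join(str(c) for c in guest_cpus(8, 2))
print(cpus)    # 2,3,4,5,6,7
\end{verbatim}
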
1879 \chapter{Migrating Domains}
1881 \section{Domain Save and Restore}
1883 The administrator of a Xen system may suspend a virtual machine's
1884 current state into a disk file in domain~0, allowing it to be resumed at
1885 a later time.
1887 For example you can suspend a domain called ``VM1'' to disk using the
1888 command:
1889 \begin{verbatim}
1890 # xm save VM1 VM1.chk
1891 \end{verbatim}
1893 This will stop the domain named ``VM1'' and save its current state
1894 into a file called \path{VM1.chk}.
1896 To resume execution of this domain, use the \path{xm restore} command:
1897 \begin{verbatim}
1898 # xm restore VM1.chk
1899 \end{verbatim}
1901 This will restore the state of the domain and resume its execution.
1902 The domain will carry on as before and the console may be reconnected
1903 using the \path{xm console} command, as described earlier.
1905 \section{Migration and Live Migration}
1907 Migration is used to transfer a domain between physical hosts. There
1908 are two varieties: regular and live migration. The former moves a
1909 virtual machine from one host to another by pausing it, copying its
1910 memory contents, and then resuming it on the destination. The latter
1911 performs the same logical functionality but without needing to pause
1912 the domain for the duration. In general when performing live migration
1913 the domain continues its usual activities and---from the user's
1914 perspective---the migration should be imperceptible.
1916 To perform a live migration, both hosts must be running Xen / \xend\ and
1917 the destination host must have sufficient resources (e.g.\ memory
1918 capacity) to accommodate the domain after the move. Furthermore we
1919 currently require both source and destination machines to be on the same
1920 L2 subnet.
1922 Currently, there is no support for providing automatic remote access
1923 to filesystems stored on local disk when a domain is migrated.
Administrators should choose an appropriate storage solution (e.g.\
SAN or NAS) to ensure that domain filesystems are also available
on their destination node. GNBD is a good method for exporting a
1927 volume from one machine to another. iSCSI can do a similar job, but is
1928 more complex to set up.
When a domain migrates, its MAC and IP addresses move with it; thus it is
only possible to migrate VMs within the same layer-2 network and IP
subnet. If the destination node is on a different subnet, the
administrator would need to manually configure a suitable etherip or IP
tunnel in the domain~0 of the remote node.
1936 A domain may be migrated using the \path{xm migrate} command. To live
1937 migrate a domain to another machine, we would use the command:
1939 \begin{verbatim}
1940 # xm migrate --live mydomain destination.ournetwork.com
1941 \end{verbatim}
Without the \path{--live} flag, \xend\ simply stops the domain,
copies the memory image over to the new node, and restarts it there.
Since domains can have large memory allocations this can be quite time
consuming, even on a Gigabit network. With the \path{--live} flag \xend\
attempts to keep the domain running while the migration is in progress,
resulting in typical down times of just 60--300ms.
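
To get a feeling for the cost of a regular migration, the time needed
to copy the memory image can be estimated from the domain's memory size
and the link speed. The Python sketch below gives a lower bound only:
it assumes the link runs at its nominal rate and ignores protocol
overhead and any other work done during migration.

\begin{verbatim}
def copy_time_seconds(memory_mib, link_mbit_per_s):
    """Lower bound on the time to copy a memory image."""
    bits = memory_mib * 1024 * 1024 * 8
    return bits / (link_mbit_per_s * 1e6)

# A domain with 1024MB of memory over a Gigabit link.
print('%.1f s' % copy_time_seconds(1024, 1000))    # 8.6 s
\end{verbatim}

Under these assumptions a regular migration pauses the domain for
several seconds, compared with the fraction of a second quoted above
for live migration.
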
1950 For now it will be necessary to reconnect to the domain's console on the
1951 new machine using the \path{xm console} command. If a migrated domain
1952 has any open network connections then they will be preserved, so SSH
1953 connections do not have this limitation.
1956 %% Chapter Securing Xen
1957 \chapter{Securing Xen}
1959 This chapter describes how to secure a Xen system. It describes a number
1960 of scenarios and provides a corresponding set of best practices. It
1961 begins with a section devoted to understanding the security implications
1962 of a Xen system.
1965 \section{Xen Security Considerations}
1967 When deploying a Xen system, one must be sure to secure the management
1968 domain (Domain-0) as much as possible. If the management domain is
compromised, all other domains are also vulnerable. The following is a
set of best practices for Domain-0:
1972 \begin{enumerate}
\item \textbf{Run the smallest number of necessary services.} The fewer
  things that are present in a management partition, the better.
1975 Remember, a service running as root in the management domain has full
1976 access to all other domains on the system.
1977 \item \textbf{Use a firewall to restrict the traffic to the management
1978 domain.} A firewall with default-reject rules will help prevent
1979 attacks on the management domain.
1980 \item \textbf{Do not allow users to access Domain-0.} The Linux kernel
1981 has been known to have local-user root exploits. If you allow normal
1982 users to access Domain-0 (even as unprivileged users) you run the risk
1983 of a kernel exploit making all of your domains vulnerable.
1984 \end{enumerate}
1986 \section{Driver Domain Security Considerations}
1987 \label{s:ddsecurity}
1989 Driver domains address a range of security problems that exist regarding
1990 the use of device drivers and hardware. On many operating systems in common
1991 use today, device drivers run within the kernel with the same privileges as
1992 the kernel. Few or no mechanisms exist to protect the integrity of the kernel
from a misbehaving (read ``buggy'') or malicious device driver. Driver
1994 domains exist to aid in isolating a device driver within its own virtual
1995 machine where it cannot affect the stability and integrity of other
1996 domains. If a driver crashes, the driver domain can be restarted rather than
1997 have the entire machine crash (and restart) with it. Drivers written by
1998 unknown or untrusted third-parties can be confined to an isolated space.
1999 Driver domains thus address a number of security and stability issues with
2000 device drivers.
2002 However, due to limitations in current hardware, a number of security
2003 concerns remain that need to be considered when setting up driver domains (it
2004 should be noted that the following list is not intended to be exhaustive).
2006 \begin{enumerate}
2007 \item \textbf{Without an IOMMU, a hardware device can DMA to memory regions
2008 outside of its controlling domain.} Architectures which do not have an
2009 IOMMU (e.g. most x86-based platforms) to restrict DMA usage by hardware
2010 are vulnerable. A hardware device which can perform arbitrary memory reads
2011 and writes can read/write outside of the memory of its controlling domain.
2012 A malicious or misbehaving domain could use a hardware device it controls
2013 to send data overwriting memory in another domain or to read arbitrary
2014 regions of memory in another domain.
\item \textbf{Shared buses are vulnerable to sniffing.} Devices that share
  a data bus can sniff (and possibly spoof) each other's data. Device A that
  is assigned to Domain A could eavesdrop on data being transmitted by
  Domain B to Device B and then relay that data back to Domain A.
\item \textbf{Devices which share interrupt lines can either prevent the
  reception of that interrupt by the driver domain or can trigger the
  interrupt service routine of that guest needlessly.} A device which shares
  a level-triggered interrupt (e.g. PCI devices) with another device can
  raise an interrupt and never clear it. This effectively blocks other devices
  which share that interrupt line from notifying their controlling driver
  domains that they need to be serviced. A device which shares
  any type of interrupt line can trigger its interrupt continually, which
2027 forces execution time to be spent (in multiple guests) in the interrupt
2028 service routine (potentially denying time to other processes within that
2029 guest). System architectures which allow each device to have its own
2030 interrupt line (e.g. PCI's Message Signaled Interrupts) are less
2031 vulnerable to this denial-of-service problem.
2032 \item \textbf{Devices may share the use of I/O memory address space.} Xen can
2033 only restrict access to a device's physical I/O resources at a certain
2034 granularity. For interrupt lines and I/O port address space, that
2035 granularity is very fine (per interrupt line and per I/O port). However,
2036 Xen can only restrict access to I/O memory address space on a page size
2037 basis. If more than one device shares use of a page in I/O memory address
2038 space, the domains to which those devices are assigned will be able to
2039 access the I/O memory address space of each other's devices.
2040 \end{enumerate}
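
The page-granularity limitation in the last item can be illustrated
with a short Python sketch that tests whether the I/O memory regions of
two devices touch a common page. The page size of 4096 bytes and the
region addresses are assumptions made for the example; they do not
describe any particular hardware.

\begin{verbatim}
PAGE_SIZE = 4096    # assumed page size

def pages(base, size):
    """Set of page numbers touched by [base, base + size)."""
    first = base // PAGE_SIZE
    last = (base + size - 1) // PAGE_SIZE
    return set(range(first, last + 1))

def share_page(region_a, region_b):
    """True if two (base, size) regions share a page."""
    return bool(pages(*region_a) & pages(*region_b))

# Hypothetical regions: two 256-byte register blocks in the
# same page, and a third that starts on the next page.
dev_a = (0xfebf0000, 0x100)
dev_b = (0xfebf0800, 0x100)
dev_c = (0xfebf1000, 0x100)
print(share_page(dev_a, dev_b))    # True
print(share_page(dev_a, dev_c))    # False
\end{verbatim}

In the first case the domains controlling the two devices could reach
each other's registers, even though the register blocks themselves do
not overlap.
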
2043 \section{Security Scenarios}
2046 \subsection{The Isolated Management Network}
In this scenario, each node in the cluster has two network cards. One
network card is connected to the outside world and the other is connected
to a physically isolated management network specifically for Xen
instances to use.
2053 As long as all of the management partitions are trusted equally, this is
2054 the most secure scenario. No additional configuration is needed other
2055 than forcing Xend to bind to the management interface for relocation.
2058 \subsection{A Subnet Behind a Firewall}
2060 In this scenario, each node has only one network card but the entire
2061 cluster sits behind a firewall. This firewall should do at least the
2062 following:
2064 \begin{enumerate}
2065 \item Prevent IP spoofing from outside of the subnet.
2066 \item Prevent access to the relocation port of any of the nodes in the
2067 cluster except from within the cluster.
2068 \end{enumerate}
2070 The following iptables rules can be used on each node to prevent
2071 migrations to that node from outside the subnet assuming the main
2072 firewall does not do this for you:
2074 \begin{verbatim}
2075 # this command disables all access to the Xen relocation
2076 # port:
2077 iptables -A INPUT -p tcp --destination-port 8002 -j REJECT
2079 # this command enables Xen relocations only from the specific
2080 # subnet:
iptables -I INPUT -p tcp --source \
2082 --destination-port 8002 -j ACCEPT
2083 \end{verbatim}
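
The two rules work together because iptables evaluates a chain from
the top and stops at the first matching rule: the reject rule is
appended to the end of the chain, while the accept rule is inserted at
its head. The Python~3 sketch below models only this ordering. The
subnet is a documentation placeholder, the default chain policy is
assumed to be accept, and nothing here calls iptables itself.

\begin{verbatim}
import ipaddress

# Placeholder subnet; substitute the subnet of your cluster.
TRUSTED = ipaddress.ip_network('192.0.2.0/24')

def from_trusted(src, port):
    return port == 8002 and ipaddress.ip_address(src) in TRUSTED

def to_relocation_port(src, port):
    return port == 8002

rules = []
rules.append((to_relocation_port, 'REJECT'))  # like iptables -A
rules.insert(0, (from_trusted, 'ACCEPT'))     # like iptables -I

def verdict(src, port, policy='ACCEPT'):
    """First matching rule wins, else the chain policy."""
    for match, target in rules:
        if match(src, port):
            return target
    return policy

print(verdict('192.0.2.7', 8002))       # ACCEPT
print(verdict('198.51.100.9', 8002))    # REJECT
\end{verbatim}
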
2085 \subsection{Nodes on an Untrusted Subnet}
2087 Migration on an untrusted subnet is not safe in current versions of Xen.
It may be possible to perform migrations through a secure tunnel via a
VPN or SSH. The only safe option in the absence of a secure tunnel is to
2090 disable migration completely. The easiest way to do this is with
2091 iptables:
2093 \begin{verbatim}
2094 # this command disables all access to the Xen relocation port
iptables -A INPUT -p tcp --destination-port 8002 -j REJECT
2096 \end{verbatim}
2098 %% Chapter Xen Mandatory Access Control Framework
2099 \chapter{sHype/Xen Access Control}
2101 The Xen mandatory access control framework is an implementation of the
2102 sHype Hypervisor Security Architecture
2103 (www.research.ibm.com/ssd\_shype). It permits or denies communication
2104 and resource access of domains based on a security policy. The
2105 mandatory access controls are enforced in addition to the Xen core
2106 controls, such as memory protection. They are designed to remain
2107 transparent during normal operation of domains (policy-conform
2108 behavior) but to intervene when domains move outside their intended
sharing behavior. This chapter will describe how the sHype access
controls in Xen can be configured to prevent viruses from spilling
over from one workload type into another and secrets from leaking from
one workload type to another. sHype/Xen depends on the correct
behavior of Domain0 (cf.\ previous chapter).
2115 Benefits of configuring sHype/ACM in Xen include:
2116 \begin{itemize}
2117 \item robust workload and resource protection effective against rogue
2118 user domains
2119 \item simple, platform- and operating system-independent security
2120 policies (ideal for heterogeneous distributed environments)
2121 \item safety net with minimal performance overhead in case operating
2122 system security is missing, does not scale, or fails
2123 \end{itemize}
These benefits are very valuable because today's operating systems
are becoming increasingly complex and often have no or insufficient
mandatory access controls. (Discretionary access controls, supported
by most operating systems, are not effective against viruses or
misbehaving programs.) Where mandatory access controls exist (e.g.,
SELinux), they usually deploy complex and difficult-to-understand
2131 security policies. Additionally, multi-tier applications in business
2132 environments usually require different types of operating systems
2133 (e.g., AIX, Windows, Linux) which cannot be configured with compatible
2134 security policies. Related distributed transactions and workloads
2135 cannot be easily protected on the OS level. The Xen access control
2136 framework steps in to offer a coarse-grained but very robust security
2137 layer and safety net in case operating system security fails or is
2138 missing.
2140 To control sharing between domains, Xen mediates all inter-domain
2141 communication (shared memory, events) as well as the access of domains
2142 to resources such as disks. Thus, Xen can confine distributed
2143 workloads (domain payloads) by permitting sharing among domains
2144 running the same type of workload and denying sharing between pairs of
domains that run different workload types. We assume that, from a Xen
perspective, only one workload type is running per user domain. To
2147 enable Xen to associate domains and resources with workload types,
2148 security labels including the workload types are attached to domains
2149 and resources. These labels and the hypervisor sHype controls cannot
2150 be manipulated or bypassed and are effective even against rogue
2151 domains.
2153 \section{Overview}
2154 This section gives an overview of how workloads can be protected using
2155 the sHype mandatory access control framework in Xen.
2156 Figure~\ref{fig:acmoverview} shows the necessary steps in activating
2157 the Xen workload protection. These steps are described in detail in
2158 Section~\ref{section:acmexample}.
2160 \begin{figure}
2161 \centering
2162 \includegraphics[width=13cm]{figs/acm_overview.eps}
2163 \caption{Overview of activating sHype workload protection in Xen.
2164 Section numbers point to representative examples.}
2165 \label{fig:acmoverview}
2166 \end{figure}
2168 First, the sHype/ACM access control must be enabled in the Xen
2169 distribution and the distribution must be built and installed (cf
2170 Subsection~\ref{subsection:acmexampleconfigure}). Before we can
2171 enforce security, a Xen security policy must be created (cf
2172 Subsection~\ref{subsection:acmexamplecreate}) and deployed (cf
2173 Subsection~\ref{subsection:acmexampleinstall}). This policy defines
2174 the workload types differentiated during access control. It also
2175 defines the rules that compare workload types of domains and resources
2176 to provide access decisions. Workload types are represented by
2177 security labels that can be attached to domains and resources (cf
2178 Subsections~\ref{subsection:acmexamplelabeldomains}
and~\ref{subsection:acmexamplelabelresources}). The functioning of
the active sHype/Xen workload protection is demonstrated using simple
resource assignment and domain creation tests in
Subsection~\ref{subsection:acmexampletest}.
Section~\ref{section:acmpolicy} describes the syntax and semantics of
the sHype/Xen security policy in detail and briefly introduces the
tools that are available to help create valid security policies.
2187 The next section describes all the necessary steps to create, deploy,
2188 and test a simple workload protection policy. It is meant to enable
2189 anybody to quickly try out the sHype/Xen workload protection. Those
2190 readers who are interested in learning more about how the sHype access
2191 control in Xen works and how it is configured using the XML security
2192 policy should read Section~\ref{section:acmpolicy} as well.
2193 Section~\ref{section:acmlimitations} concludes this chapter with
2194 current limitations of the sHype implementation for Xen.
2196 \section{Xen Workload Protection Step-by-Step}
2197 \label{section:acmexample}
2199 What you are about to do consists of the following sequence:
2200 \begin{itemize}
2201 \item configure and install sHype/Xen
2202 \item create a simple workload protection security policy
2203 \item deploy the sHype/Xen security policy
\item associate domains and resources with workload labels
2205 \item test the workload protection
2206 \end{itemize}
The essential commands to create and deploy a sHype/Xen security
policy are numbered throughout the following sections. If you want a
quick guide, or return at a later time to go quickly through this
demonstration, simply look for the numbered commands and apply them in
order.
2213 \subsection{Configuring/Building sHype Support into Xen}
2214 \label{subsection:acmexampleconfigure}
2215 First, we need to configure the access control module in Xen and
2216 install the ACM-enabled Xen hypervisor. This step installs security
2217 tools and compiles sHype/ACM controls into the Xen hypervisor.
2219 To enable sHype/ACM in Xen, please edit the Config.mk file in the top
2220 Xen directory.
2222 \begin{verbatim}
2223 (1) In Config.mk
2224 Change: ACM_SECURITY ?= n
2225 To: ACM_SECURITY ?= y
2226 \end{verbatim}
2228 Then install the security-enabled Xen environment as follows:
2230 \begin{verbatim}
2231 (2) # make world
2232 # make install
2233 \end{verbatim}
2235 \subsection{Creating A WLP Policy in 3 Simple Steps with ezPolicy}
2236 \label{subsection:acmexamplecreate}
2238 We will use the ezPolicy tool to quickly create a policy that protects
2239 workloads. You will need both the Python and wxPython packages to run
2240 this tool. To run the tool in Domain0, you can download the wxPython
2241 package from www.wxpython.org or use the command
\verb|yum install wxPython| in Red Hat/Fedora. To run the tool on MS
2243 Windows, you also need to download the Python package from
2244 www.python.org. After these packages are installed, start the ezPolicy
2245 tool with the following command:
2247 \begin{verbatim}
2248 (3) # xensec_ezpolicy
2249 \end{verbatim}
2251 Figure~\ref{fig:acmezpolicy} shows a screen-shot of the tool. The
2252 following steps show you how to create the policy shown in
2253 Figure~\ref{fig:acmezpolicy}. You can use \verb|<CTRL>-h| to pop up a
2254 help window at any time. The indicators (a), (b), and (c) in
2255 Figure~\ref{fig:acmezpolicy} show the buttons that are used during the
2256 3 steps of creating a policy:
2257 \begin{enumerate}
2258 \item defining workloads
2259 \item defining run-time conflicts
2260 \item translating the workload definition into a sHype/Xen access
2261 control policy
2262 \end{enumerate}
2264 \paragraph{Defining workloads.} Workloads are defined for each
2265 organization and department that you enter in the left panel. Please
2266 use the ``New Org'' button (a) to create the organizations ``Avis'',
2267 ``Hertz'', ``CocaCola'', and ``PepsiCo''.
2269 You can refine an organization to differentiate between multiple
2270 department workloads by right-clicking the organization and selecting
2271 \verb|Add Department| (or selecting an organization and pressing
2272 \verb|<CTRL>-a|). Create department workloads ``Intranet'',
2273 ``Extranet'', ``HumanResources'', and ``Payroll'' for the ``CocaCola''
2274 organization and department workloads ``Intranet'' and ``Extranet''
2275 for the ``PepsiCo'' organization. The resulting layout of the tool
2276 should be similar to the left panel shown in
2277 Figure~\ref{fig:acmezpolicy}.
2279 \paragraph{Defining run-time conflicts.} Workloads that shall be
2280 prohibited from running concurrently on the same hypervisor platform
2281 are grouped into ``Run-time Exclusion rules'' on the right panel of
2282 the window.
2284 To prevent PepsiCo and CocaCola workloads (including their
2285 departmental workloads) from running simultaneously on the same
2286 hypervisor system, select the organization ``PepsiCo'' and, while
2287 pressing the \verb|<CTRL>|-key, select the organization ``CocaCola''.
2288 Now press the button (b) named ``Create run-time exclusion rule from
2289 selection''. A popup window will ask for the name for this run-time
2290 exclusion rule (enter a name or just hit \verb|<ENTER>|). A rule will
2291 appear on the right panel. The name is used as reference only and does
2292 not affect the hypervisor policy.
2294 Repeat the process to create a run-time exclusion rule just for the
2295 department workloads CocaCola.Extranet and CocaCola.Payroll.
2297 \begin{figure}[htb]
2298 \centering
2299 \includegraphics[width=13cm]{figs/acm_ezpolicy.eps}
2300 \caption{Final layout including workload definition and Run-time Exclusion rules.}
2301 \label{fig:acmezpolicy}
2302 \end{figure}
2304 The resulting layout of your window should be similar to
2305 Figure~\ref{fig:acmezpolicy}. Save this workload definition by
2306 selecting ``Save Workload Definition as ...'' in the ``File'' menu
2307 (c). This workload definition can be later refined if required.
2309 \paragraph{Translating the workload definition into a sHype/Xen access
2310 control policy.} To translate the workload definition into an access
2311 control policy understood by Xen, please select the ``Save as Xen ACM
2312 Security Policy'' in the ``File'' menu (c). Enter the following policy
2313 name in the popup window: \verb|example.chwall_ste.test-wld|. If you
2314 are running ezPolicy in Domain0, the resulting policy file
2315 test-wld-security\_policy.xml will automatically be placed into the
2316 right directory (/etc/xen/acm-security/policies/example/chwall\_ste).
2317 If you run the tool on another system, then you need to copy the
2318 resulting policy file into Domain0 before continuing. See
2319 Section~\ref{subsection:acmnaming} for naming conventions of security
2320 policies.
2322 \subsection{Deploying a WLP Policy}
2323 \label{subsection:acmexampleinstall}
2324 To deploy the workload protection policy we created in
2325 Section~\ref{subsection:acmexamplecreate}, we create a policy
2326 representation (test-wld.bin) that can be loaded into the Xen
2327 hypervisor and we configure Xen to actually load this policy at
2328 startup time.
2330 The following command translates the source policy representation
2331 into a format that can be loaded into Xen with sHype/ACM support.
2332 Refer to the \verb|xm| man page for further details:
2334 \begin{verbatim}
2335 (4) # xm makepolicy example.chwall_ste.test-wld
2336 \end{verbatim}
2338 The easiest way to install a security policy for Xen is to include the
2339 policy in the boot sequence. The following command does just this:
2341 \begin{verbatim}
2342 (5) # xm cfgbootpolicy example.chwall_ste.test-wld
2343 \end{verbatim}
2345 \textit{Alternatively, if this command fails} (e.g., because it cannot
2346 identify the Xen boot entry), you can manually install the policy in 2
2347 steps. First, manually copy the policy binary file into the boot
2348 directory:
2350 \begin{scriptsize}
2351 \begin{verbatim}
2352 # cp /etc/xen/acm-security/policies/example/chwall_ste/test-wld.bin \
2353 /boot/example.chwall_ste.test-wld.bin
2354 \end{verbatim}
2355 \end{scriptsize}
2357 Second, manually add a module line to your Xen boot entry so that grub
2358 loads this policy file during startup:
2360 \begin{scriptsize}
2361 \begin{verbatim}
2362 title Xen (
2363 root (hd0,0)
2364 kernel /xen.gz dom0_mem=2000000 console=vga
2365 module /vmlinuz- ro root=/dev/hda3
2366 module /initrd-
2367 module /example.chwall_ste.test-wld.bin
2368 \end{verbatim}
2369 \end{scriptsize}
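
The two manual steps above follow a fixed naming pattern. The Python
sketch below only illustrates that pattern; the function name
\verb|manual_boot_install| is hypothetical, and the sketch assumes, as
in the example above, that grub resolves \verb|module| paths relative
to the partition mounted at /boot.

\begin{scriptsize}
\begin{verbatim}
```python
import posixpath

# Global policy directory used throughout this chapter.
POLICY_ROOT = "/etc/xen/acm-security/policies"


def manual_boot_install(policy_name, boot_dir="/boot"):
    """Return (source, target, module_line) for a manual policy install.

    Hypothetical helper that mirrors the cp command and the grub
    module line shown above; it does not touch the file system.
    """
    parts = policy_name.split(".")
    binary = parts[-1] + ".bin"
    source = posixpath.join(POLICY_ROOT, *(parts[:-1] + [binary]))
    target = posixpath.join(boot_dir, policy_name + ".bin")
    module_line = "module /" + policy_name + ".bin"
    return source, target, module_line


for item in manual_boot_install("example.chwall_ste.test-wld"):
    print(item)
```
\end{verbatim}
\end{scriptsize}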
2371 Now reboot into this Xen boot entry to activate the policy and the
2372 security-enabled Xen hypervisor.
2374 \begin{verbatim}
2375 (6) # reboot
2376 \end{verbatim}
2378 After reboot, check if security is enabled:
2380 \begin{scriptsize}
2381 \begin{verbatim}
2382 # xm list --label
2383 Name ID Mem(MiB) VCPUs State Time(s) Label
2384 Domain-0 0 1949 4 r----- 163.9 SystemManagement
2385 \end{verbatim}
2386 \end{scriptsize}
2388 If the security label at the end of the line says ``INACTIV'' then the
2389 security is not enabled. Verify the previous steps. Note: Domain0 is
2390 assigned a default label (see \verb|bootstrap| policy attribute
2391 explained in Section~\ref{section:acmpolicy}). All other domains must
2392 be labeled in order to start on this sHype/ACM-enabled Xen hypervisor
2393 (see following sections for labeling domains and resources).
2395 \subsection{Labeling Domains}
2396 \label{subsection:acmexamplelabeldomains}
2397 You should have a Xen domain configuration file that looks like the
2398 following (Note: www.jailtime.org or www.xen-get.org might be good
2399 places to look for example domU images). The following configuration
2400 file defines \verb|domain1|:
2402 \begin{scriptsize}
2403 \begin{verbatim}
2404 # cat domain1.xm
2405 kernel = "/boot/vmlinuz-"
2406 memory = 128
2407 name = "domain1"
2408 vif = [ '' ]
2409 dhcp = "dhcp"
2410 disk = ['file:/home/xen/dom_fc5/fedora.fc5.img,sda1,w', \
2411 'file:/home/xen/dom_fc5/fedora.swap,sda2,w']
2412 root = "/dev/sda1 ro"
2413 \end{verbatim}
2414 \end{scriptsize}
2416 If you try to start domain1, you will get the following error:
2418 \begin{scriptsize}
2419 \begin{verbatim}
2420 # xm create domain1.xm
2421 Using config file "domain1.xm".
2422 domain1: DENIED
2423 --> Domain not labeled
2424 Checking resources: (skipped)
2425 Security configuration prevents domain from starting
2426 \end{verbatim}
2427 \end{scriptsize}
2429 Every domain must be associated with a security label before it can
2430 start on sHype/Xen. Otherwise, sHype/Xen would not be able to enforce
2431 the policy consistently. The following command prints all domain
2432 labels available in the active policy:
2434 \begin{scriptsize}
2435 \begin{verbatim}
2436 # xm labels type=dom
2437 Avis
2438 CocaCola
2439 CocaCola.Extranet
2440 CocaCola.HumanResources
2441 CocaCola.Intranet
2442 CocaCola.Payroll
2443 Hertz
2444 PepsiCo
2445 PepsiCo.Extranet
2446 PepsiCo.Intranet
2447 SystemManagement
2448 \end{verbatim}
2449 \end{scriptsize}
2451 Now label domain1 with the CocaCola label and another domain2 with the
2452 PepsiCo.Extranet label. Please refer to the xm man page for further
2453 information.
2455 \begin{verbatim}
2456 (7) # xm addlabel CocaCola dom domain1.xm
2457 # xm addlabel PepsiCo.Extranet dom domain2.xm
2458 \end{verbatim}
2460 Let us try to start the domain again:
2462 \begin{scriptsize}
2463 \begin{verbatim}
2464 # xm create domain1.xm
2465 Using config file "domain1.xm".
2466 file:/home/xen/dom_fc5/fedora.fc5.img: DENIED
2467 --> res:__NULL_LABEL__ (NULL)
2468 --> dom:CocaCola (example.chwall_ste.test-wld)
2469 file:/home/xen/dom_fc5/fedora.swap: DENIED
2470 --> res:__NULL_LABEL__ (NULL)
2471 --> dom:CocaCola (example.chwall_ste.test-wld)
2472 Security configuration prevents domain from starting
2473 \end{verbatim}
2474 \end{scriptsize}
2476 This error indicates that domain1, if started, would not be able to
2477 access its image and swap files because they are not labeled. This
2478 makes sense because to confine workloads, access of domains to
2479 resources must be controlled. Otherwise, domains that are not allowed
2480 to communicate or run simultaneously could share data through storage
2481 resources.
2483 \subsection{Labeling Resources}
2484 \label{subsection:acmexamplelabelresources}
2485 You can use the \verb|xm labels type=res| command to list available
2486 resource labels. Let us assign the CocaCola resource label to the domain1
2487 image file representing \verb|/dev/sda1| and to its swap file:
2489 \begin{verbatim}
2490 (8) # xm addlabel CocaCola res \
2491 file:/home/xen/dom_fc5/fedora.fc5.img
2492 Resource file not found, creating new file at:
2493 /etc/xen/acm-security/policies/resource_labels
2494 # xm addlabel CocaCola res \
2495 file:/home/xen/dom_fc5/fedora.swap
2496 \end{verbatim}
2498 Starting \verb|domain1| now will succeed:
2500 \begin{scriptsize}
2501 \begin{verbatim}
2502 # xm create domain1.xm
2503 # xm list --label
2504 Name ID Mem(MiB) VCPUs State Time(s) Label
2505 domain1 1 128 1 r----- 2.8 CocaCola
2506 Domain-0 0 1949 4 r----- 387.7 SystemManagement
2507 \end{verbatim}
2508 \end{scriptsize}
2510 The following command lists all labeled resources on the
2511 system, e.g., to look up or verify the labeling:
2513 \begin{scriptsize}
2514 \begin{verbatim}
2515 # xm resources
2516 file:/home/xen/dom_fc5/fedora.swap
2517 policy: example.chwall_ste.test-wld
2518 label: CocaCola
2519 file:/home/xen/dom_fc5/fedora.fc5.img
2520 policy: example.chwall_ste.test-wld
2521 label: CocaCola
2522 \end{verbatim}
2523 \end{scriptsize}
2525 Currently, if a labeled resource is moved to another location, the
2526 label must first be manually removed, and after the move re-attached
2527 using the xm commands \verb|xm rmlabel| and \verb|xm addlabel|
2528 respectively. Please see Section~\ref{section:acmlimitations} for
2529 further details.
2531 \begin{verbatim}
2532 (9) Label the resources of domain2 as PepsiCo.Extranet
2533 Do not try to start this domain yet
2534 \end{verbatim}
2536 \subsection{Testing The Xen Workload Protection}
2537 \label{subsection:acmexampletest}
2538 We are about to demonstrate how the workload protection works by
2539 verifying:
2540 \begin{itemize}
2541 \item that domains with conflicting workloads cannot run
2542 simultaneously
2543 \item that domains cannot access resources of other workloads
2544 \item that domains cannot exchange network packets if they are not
2545 associated with the same workload type
2546 \end{itemize}
2548 \paragraph{Test 1: Run-time exclusion rules.} We assume that domain1
2549 with the CocaCola label is still running. While domain1 is running,
2550 the run-time exclusion set of our policy says that domain2 cannot
2551 start because the label of domain1 includes the CHWALL type CocaCola
2552 and the label of domain2 includes the CHWALL type PepsiCo. The
2553 run-time exclusion rule of our policy enforces that PepsiCo and
2554 CocaCola cannot run at the same time on the same hypervisor platform.
2555 Once domain1 is stopped or saved, domain2 can start but domain1 can no
2556 longer start or be resumed. The ezPolicy tool, when creating the
2557 Chinese Wall types for the workload labels, ensures that department
2558 workloads inherit the organization type (and with it any organization
2559 exclusions).
2561 \begin{scriptsize}
2562 \begin{verbatim}
2563 # xm list --label
2564 Name ID Mem(MiB) VCPUs State Time(s) Label
2565 domain1 2 128 1 -b---- 6.9 CocaCola
2566 Domain-0 0 1949 4 r----- 273.1 SystemManagement
2568 # xm create domain2.xm
2569 Using config file "domain2.xm".
2570 Error: (1, 'Operation not permitted')
2572 # xm destroy domain1
2573 # xm create domain2.xm
2574 Using config file "domain2.xm".
2575 Started domain domain2
2577 # xm list --label
2578 Name ID Mem(MiB) VCPUs State Time(s) Label
2579 domain2 4 164 1 r----- 4.3 PepsiCo.Extranet
2580 Domain-0 0 1949 4 r----- 298.4 SystemManagement
2582 # xm create domain1.xm
2583 Using config file "domain1.xm".
2584 Error: (1, 'Operation not permitted')
2586 # xm destroy domain2
2587 # xm list
2588 Name ID Mem(MiB) VCPUs State Time(s)
2589 Domain-0 0 1949 4 r----- 391.2
2590 \end{verbatim}
2591 \end{scriptsize}
2593 You can verify that domains with the Avis label can run together with
2594 domains labeled CocaCola, PepsiCo, or Hertz.
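
The behaviour observed in Test 1 can be modelled in a few lines of
Python. This is an illustrative sketch, not the hypervisor code: it
assumes that a department label such as PepsiCo.Extranet carries the
Chinese Wall type of its organization in addition to its own type,
which is how the inheritance described above is modelled here. The
function names are hypothetical.

\begin{scriptsize}
\begin{verbatim}
```python
def chwall_types(label):
    """Assumed CHWALL types of a label: its organization plus itself."""
    organization = label.split(".")[0]
    return {organization, label}


def conflicts(running_label, new_label, exclusion_rule):
    """True if the two labels use different types of one exclusion rule."""
    running = chwall_types(running_label) & set(exclusion_rule)
    new = chwall_types(new_label) & set(exclusion_rule)
    return bool(running and new and running != new)


# The organization-level rule created with ezPolicy in this example.
rule = {"CocaCola", "PepsiCo"}
print(conflicts("CocaCola", "PepsiCo.Extranet", rule))
print(conflicts("CocaCola", "Avis", rule))
```
\end{verbatim}
\end{scriptsize}

Under this assumption, domain2 is rejected while domain1 runs because
its label inherits the PepsiCo type, whereas an Avis domain matches no
type of the rule and is free to start.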
2596 \paragraph{Test 2: Resource access.} In this test, we will re-label the
2597 swap file for domain1 with the Avis resource label. We expect that
2598 domain1 will no longer start because it cannot access this resource.
2599 This test checks the sharing abilities of domains, which are defined
2600 by the Simple Type Enforcement Policy component.
2602 \begin{scriptsize}
2603 \begin{verbatim}
2604 # xm rmlabel res file:/home/xen/dom_fc5/fedora.swap
2605 # xm addlabel Avis res file:/home/xen/dom_fc5/fedora.swap
2606 # xm resources
2607 file:/home/xen/dom_fc5/fedora.swap
2608 policy: example.chwall_ste.test-wld
2609 label: Avis
2610 file:/home/xen/dom_fc5/fedora.fc5.img
2611 policy: example.chwall_ste.test-wld
2612 label: CocaCola
2614 # xm create domain1.xm
2615 Using config file "domain1.xm".
2616 file:/home/xen/dom_fc5/fedora.swap: DENIED
2617 --> res:Avis (example.chwall_ste.test-wld)
2618 --> dom:CocaCola (example.chwall_ste.test-wld)
2619 Security configuration prevents domain from starting
2620 \end{verbatim}
2621 \end{scriptsize}
2623 \paragraph{Test 3: Communication.} In this test we would verify that
2624 two domains with labels Hertz and Avis cannot exchange network packets
2625 by using the ``ping'' connectivity test. This test also relates to the STE
2626 policy. {\bf Note:} sHype/Xen does control direct communication between
2627 domains. However, domains associated with different workloads can
2628 currently still communicate through the Domain0 virtual network. We
2629 are working on the sHype/ACM controls for local and remote network
2630 traffic through Domain0. Please monitor the xen-devel mailing list
2631 for updated information.
2633 \section{Xen Access Control Policy}
2634 \label{section:acmpolicy}
2636 This section describes the sHype/Xen access control policy in detail.
2637 It gives enough information to enable the reader to write custom
2638 access control policies and to use the available Xen policy tools. The
2639 policy language is expressive enough to specify most symmetric access
2640 relationships between domains and resources efficiently.
2642 The Xen access control policy consists of two policy components. The
2643 first component, called Chinese Wall (CHWALL) policy, controls which
2644 domains can run simultaneously on the same virtualized platform. The
2645 second component, called Simple Type Enforcement (STE) policy,
2646 controls the sharing between running domains, i.e., communication or
2647 access to shared resources. The CHWALL and STE policy components can
2648 be configured to run alone; however, in our examples we will assume
2649 that both policy components are configured together since they
2650 complement each other. The XML policy file includes all information
2651 needed by Xen to enforce the policies.
2653 Figures~\ref{fig:acmxmlfilea} and \ref{fig:acmxmlfileb} show a fully
2654 functional but very simple example policy for Xen. The policy can
2655 distinguish two workload types \verb|CocaCola| and \verb|PepsiCo| and
2656 defines the labels necessary to associate domains and resources with
2657 one of these workload types. The XML Policy consists of four parts:
2658 \begin{enumerate}
2659 \item policy header including the policy name
2660 \item Simple Type Enforcement block
2661 \item Chinese Wall Policy block
2662 \item label definition block
2663 \end{enumerate}
2665 \begin{figure}
2666 \begin{scriptsize}
2667 \begin{verbatim}
2668 01 <?xml version="1.0" encoding="UTF-8"?>
2669 02 <!-- Auto-generated by ezPolicy -->
2670 03 <SecurityPolicyDefinition
2671 xmlns="http://www.ibm.com"
2672 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2673 xsi:schemaLocation=
2674 "http://www.ibm.com ../../security_policy.xsd ">
2675 04 <PolicyHeader>
2676 05 <PolicyName>example.chwall_ste.test</PolicyName>
2677 06 <Date>Wed Jul 12 17:32:59 2006</Date>
2678 07 <Version>1.0</Version>
2679 08 </PolicyHeader>
2680 09
2681 10 <SimpleTypeEnforcement>
2682 11 <SimpleTypeEnforcementTypes>
2683 12 <Type>SystemManagement</Type>
2684 13 <Type>PepsiCo</Type>
2685 14 <Type>CocaCola</Type>
2686 15 </SimpleTypeEnforcementTypes>
2687 16 </SimpleTypeEnforcement>
2688 17
2689 18 <ChineseWall priority="PrimaryPolicyComponent">
2690 19 <ChineseWallTypes>
2691 20 <Type>SystemManagement</Type>
2692 21 <Type>PepsiCo</Type>
2693 22 <Type>CocaCola</Type>
2694 23 </ChineseWallTypes>
2695 24
2696 25 <ConflictSets>
2697 26 <Conflict name="RER1">
2698 27 <Type>CocaCola</Type>
2699 28 <Type>PepsiCo</Type>
2700 29 </Conflict>
2701 30 </ConflictSets>
2702 31 </ChineseWall>
2703 32
2704 \end{verbatim}
2705 \end{scriptsize}
2706 \caption{Example XML security policy file -- Part I: Types and Rules Definition.}
2707 \label{fig:acmxmlfilea}
2708 \end{figure}
2710 \subsection{Policy Header and Policy Name}
2711 \label{subsection:acmnaming}
2712 Lines 1-2 (cf Figure~\ref{fig:acmxmlfilea}) include the usual XML
2713 header. The security policy definition starts in Line 3 and refers to
2714 the policy schema. The XML-Schema definition for the Xen policy can be
2715 found in the file
2716 \textit{/etc/xen/acm-security/policies/security\_policy.xsd}. Examples
2717 for security policies can be found in the example subdirectory. The
2718 acm-security directory is only installed if ACM security is configured
2719 during installation (cf Section~\ref{subsection:acmexampleconfigure}).
2721 The \verb|Policy Header| spans lines 4-8. It includes a date field and
2722 defines the policy name \verb|example.chwall_ste.test|. It can also
2723 include optional fields that are not shown and are for future use (see
2724 schema definition).
2726 The policy name serves two purposes: First, it provides a unique name
2727 for the security policy. This name is also exported by the Xen
2728 hypervisor to the Xen management tools in order to ensure that both
2729 enforce the same policy. We plan to extend the policy name with a
2730 digital fingerprint of the policy contents to better protect this
2731 correlation. Second, it implicitly points the xm tools to the
2732 location where the XML policy file is stored on the Xen system.
2733 Replacing the dots in the policy name by slashes yields the local
2734 path to the policy file starting from the global policy directory
2735 \verb|/etc/xen/acm-security/policies|. The last part of the policy
2736 name is the prefix for the XML policy file name, completed by
2737 \verb|-security_policy.xml|. Consequently, the policy with the name
2738 \verb|example.chwall_ste.test| can be found in the XML policy file
2739 named \verb|test-security_policy.xml| that is stored in the local
2740 directory \verb|example/chwall_ste| under the global policy directory.
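
The mapping from policy name to file location can be written down
compactly. The following Python sketch restates the rule above; the
function name \verb|policy_xml_path| is hypothetical and is not taken
from the xm tools.

\begin{scriptsize}
\begin{verbatim}
```python
import posixpath

# Global policy directory named in the text above.
POLICY_DIR = "/etc/xen/acm-security/policies"


def policy_xml_path(policy_name):
    """Map a dotted policy name to the path of its XML policy file."""
    parts = policy_name.split(".")
    # The last name component is the prefix of the file name.
    file_name = parts[-1] + "-security_policy.xml"
    # All earlier components become sub-directories.
    return posixpath.join(POLICY_DIR, *(parts[:-1] + [file_name]))


print(policy_xml_path("example.chwall_ste.test"))
# -> /etc/xen/acm-security/policies/example/chwall_ste/test-security_policy.xml
```
\end{verbatim}
\end{scriptsize}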
2742 \subsection{Simple Type Enforcement Policy Component}
2744 The Simple Type Enforcement (STE) policy controls which domains can
2745 communicate or share resources. This way, Xen can enforce confinement
2746 of workload types by confining the domains running those workload
2747 types. The mandatory access control framework enforces its policy when
2748 domains access intended ways of communication or cooperation (shared
2749 memory, events, shared resources such as block devices). It builds on
2750 top of the core hypervisor isolation, which restricts the ways of
2751 inter-communication to those intended means. STE does not protect or
2752 intend to protect from covert channels in the hypervisor or hardware;
2753 this is an orthogonal problem that can be mitigated by using the
2754 Run-time Exclusion rules described above or by fixing the problem in
2755 the core hypervisor.
2757 Xen controls sharing between domains on the resource and domain level
2758 because this is the abstraction the hypervisor and its management
2759 understand naturally. While this is coarse-grained, it is also very
2760 reliable and robust and it requires minimal changes to implement
2761 mandatory access controls in the hypervisor. It enables platform- and
2762 operating system-independent policies as part of a layered security
2763 approach.
2765 Lines 10-16 (cf Figure~\ref{fig:acmxmlfilea}) define the Simple Type
2766 Enforcement policy component. Essentially, they define the workload
2767 type names \verb|SystemManagement|, \verb|PepsiCo|, and
2768 \verb|CocaCola| that are available in the STE policy component. The
2769 policy rules are implicit: Xen permits a domain to communicate with
2770 another domain if and only if the labels of the domains share a
2771 common STE type. Xen permits a domain to access a resource if and
2772 only if the labels of the domain and the resource share a common STE
2773 workload type.
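
The implicit STE rule amounts to a set intersection. The Python sketch
below states it under that reading; the function name
\verb|ste_permits| is hypothetical, and the type sets are copied from
the labels in Figure~\ref{fig:acmxmlfileb}.

\begin{scriptsize}
\begin{verbatim}
```python
def ste_permits(subject_types, object_types):
    """Sharing is permitted if and only if the labels share an STE type."""
    return bool(set(subject_types) & set(object_types))


# STE types of the domain labels in the example policy.
DOMAIN_STE = {
    "SystemManagement": {"SystemManagement", "PepsiCo", "CocaCola"},
    "PepsiCo": {"PepsiCo"},
    "CocaCola": {"CocaCola"},
}
# STE types of the resource labels in the example policy.
RESOURCE_STE = {
    "SystemManagement": {"SystemManagement"},
    "PepsiCo": {"PepsiCo"},
    "CocaCola": {"CocaCola"},
}

print(ste_permits(DOMAIN_STE["CocaCola"], RESOURCE_STE["CocaCola"]))
print(ste_permits(DOMAIN_STE["CocaCola"], RESOURCE_STE["PepsiCo"]))
```
\end{verbatim}
\end{scriptsize}

Because the check is symmetric, the same function covers
domain-to-domain communication and domain-to-resource access.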
2775 \subsection{Chinese Wall Policy Component}
2777 The Chinese Wall security policy interpretation of sHype enables users
2778 to prevent certain workloads from running simultaneously on the same
2779 hypervisor platform. Run-time Exclusion rules (RER), also called
2780 Conflict Sets, define a set of workload types that are not permitted
2781 to run simultaneously. Of all the workloads specified in a Run-time
2782 Exclusion rule, at most one type can run on the same hypervisor
2783 platform at a time. Run-time Exclusion Rules implement a less
2784 rigorous variant of the original Chinese Wall security component. They
2785 do not implement the *-property of the policy, which would also require
2786 restricting types that are not part of an exclusion rule once they
2787 are running together with a type in an exclusion rule (please refer to
2788 http://www.gammassl.co.uk/topics/chinesewall.html for more information
2789 on the original Chinese Wall policy).
2791 Xen considers the \verb|ChineseWallTypes| part of the label for the
2792 enforcement of the Run-time Exclusion rules. It is illegal to define
2793 labels including conflicting Chinese Wall types.
2795 Lines 18-31 (cf Figure~\ref{fig:acmxmlfilea}) define the Chinese Wall
2796 policy component. Lines 19-23 define the known Chinese Wall types,
2797 which coincide here with the STE types defined above. This usually
2798 holds if the criteria for sharing among domains and sharing of the
2799 hardware platform are the same. Lines 25-30 define one Run-time
2800 Exclusion rule:
2802 \begin{scriptsize}
2803 \begin{verbatim}
2804 <Conflict name="RER1">
2805 <Type>CocaCola</Type>
2806 <Type>PepsiCo</Type>
2807 </Conflict>
2808 \end{verbatim}
2809 \end{scriptsize}
2811 Based on this rule, Xen enforces that only one of the types
2812 \verb|CocaCola| or \verb|PepsiCo| will run on a single hypervisor
2813 platform at a time. For example, once a domain assigned a
2814 \verb|CocaCola| workload type is started, domains with the
2815 \verb|PepsiCo| type will not be allowed to start. When the former domain
2816 stops and no other domains with the \verb|CocaCola| type are running,
2817 then domains with the \verb|PepsiCo| type can start.
2819 Xen maintains reference counts on each running workload type to keep
2820 track of which workload types are running. Every time a domain starts
2821 or resumes, the reference count on those Chinese Wall types that are
2822 referenced in the domain's label are incremented. Every time a domain
2823 is destroyed or saved, the reference counts of its Chinese Wall types
2824 are decremented. sHype in Xen covers migration and live-migration,
2825 which is treated the same way as saving a domain on the source
2826 platform and resuming it on the destination platform.
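
The reference counting described above can be sketched as a small
Python model. It is a simplified illustration, not the sHype
implementation in the hypervisor; the class and method names are
hypothetical.

\begin{scriptsize}
\begin{verbatim}
```python
from collections import Counter


class ChineseWall:
    """Toy model of Run-time Exclusion rules with reference counts."""

    def __init__(self, conflict_sets):
        self.conflict_sets = [set(c) for c in conflict_sets]
        self.running = Counter()  # CHWALL type -> number of running domains

    def may_start(self, chwall_types):
        """Check the list of CHWALL types of a domain label."""
        types = set(chwall_types)
        for conflict in self.conflict_sets:
            mine = types & conflict
            if not mine:
                continue  # this rule does not mention the domain
            # Any *other* type of the same rule that is running blocks us.
            if any(self.running[t] > 0 for t in conflict - mine):
                return False
        return True

    def start(self, chwall_types):
        """Called when a domain starts or resumes."""
        if not self.may_start(chwall_types):
            raise PermissionError("Operation not permitted")
        self.running.update(chwall_types)

    def stop(self, chwall_types):
        """Called when a domain is destroyed or saved."""
        self.running.subtract(chwall_types)


wall = ChineseWall([["CocaCola", "PepsiCo"]])
wall.start(["CocaCola"])
print(wall.may_start(["PepsiCo"]))
wall.stop(["CocaCola"])
print(wall.may_start(["PepsiCo"]))
```
\end{verbatim}
\end{scriptsize}

In this model a migration is simply \verb|stop| on the source platform
followed by \verb|start| on the destination platform, each against the
counters of its own platform.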
2828 Reasons why users would want to restrict which workloads or domains
2829 can share the system hardware include:
2831 \begin{itemize}
2832 \item Imperfect resource management or control might enable a rogue
2833 domain to starve another domain and the workload running in it.
2834 \item Redundant domains might run the same workload to increase
2835 availability; such domains should not run on the same hardware to
2836 avoid single points of failure.
2837 \item Imperfect Xen core domain isolation might enable two rogue
2838 domains running different workload types to use unintended and
2839 unknown ways (covert channels) to exchange some data. This way, they
2840 bypass the policed Xen access control mechanisms. Such
2841 imperfections cannot be completely eliminated and are a result of
2842 trade-offs between security and other design requirements. For a
2843 simple example of a covert channel see
2844 http://www.multicians.org/timing-chn.html. Such covert channels
2845 exist also between workloads running on different platforms if they
2846 are connected through networks. The Xen Chinese Wall policy provides
2847 an approximation of this imperfect ``air-gap'' between selected
2848 workload types.
2849 \end{itemize}
2851 \subsection{Security Labels}
2853 To enable Xen to associate domains with workload types running in
2854 them, each domain is assigned a security label that includes the
2855 workload types of the domain.
2857 \begin{figure}
2858 \begin{scriptsize}
2859 \begin{verbatim}
2860 32 <SecurityLabelTemplate>
2861 33 <SubjectLabels bootstrap="SystemManagement">
2862 34 <VirtualMachineLabel>
2863 35 <Name>SystemManagement</Name>
2864 36 <SimpleTypeEnforcementTypes>
2865 37 <Type>SystemManagement</Type>
2866 38 <Type>PepsiCo</Type>
2867 39 <Type>CocaCola</Type>
2868 40 </SimpleTypeEnforcementTypes>
2869 41 <ChineseWallTypes>
2870 42 <Type>SystemManagement</Type>
2871 43 </ChineseWallTypes>
2872 44 </VirtualMachineLabel>
2873 45
2874 46 <VirtualMachineLabel>
2875 47 <Name>PepsiCo</Name>
2876 48 <SimpleTypeEnforcementTypes>
2877 49 <Type>PepsiCo</Type>
2878 50 </SimpleTypeEnforcementTypes>
2879 51 <ChineseWallTypes>
2880 52 <Type>PepsiCo</Type>
2881 53 </ChineseWallTypes>
2882 54 </VirtualMachineLabel>
2883 55
2884 56 <VirtualMachineLabel>
2885 57 <Name>CocaCola</Name>
2886 58 <SimpleTypeEnforcementTypes>
2887 59 <Type>CocaCola</Type>
2888 60 </SimpleTypeEnforcementTypes>
2889 61 <ChineseWallTypes>
2890 62 <Type>CocaCola</Type>
2891 63 </ChineseWallTypes>
2892 64 </VirtualMachineLabel>
2893 65 </SubjectLabels>
2894 66
2895 67 <ObjectLabels>
2896 68 <ResourceLabel>
2897 69 <Name>SystemManagement</Name>
2898 70 <SimpleTypeEnforcementTypes>
2899 71 <Type>SystemManagement</Type>
2900 72 </SimpleTypeEnforcementTypes>
2901 73 </ResourceLabel>
2902 74
2903 75 <ResourceLabel>
2904 76 <Name>PepsiCo</Name>
2905 77 <SimpleTypeEnforcementTypes>
2906 78 <Type>PepsiCo</Type>
2907 79 </SimpleTypeEnforcementTypes>
2908 80 </ResourceLabel>
2909 81
2910 82 <ResourceLabel>
2911 83 <Name>CocaCola</Name>
2912 84 <SimpleTypeEnforcementTypes>
2913 85 <Type>CocaCola</Type>
2914 86 </SimpleTypeEnforcementTypes>
2915 87 </ResourceLabel>
2916 88 </ObjectLabels>
2917 89 </SecurityLabelTemplate>
2918 90 </SecurityPolicyDefinition>
2919 \end{verbatim}
2920 \end{scriptsize}
2921 \caption{Example XML security policy file -- Part II: Label Definition.}
2922 \label{fig:acmxmlfileb}
2923 \end{figure}
2925 Lines 32-89 (cf Figure~\ref{fig:acmxmlfileb}) define the
2926 \verb|SecurityLabelTemplate|, which includes the labels that can be
2927 attached to domains and resources when this policy is active. The
2928 domain labels include Chinese Wall types while resource labels do not
2929 include Chinese Wall types. Lines 33-65 define the
2930 \verb|SubjectLabels| that can be assigned to domains. For example, the
2931 virtual machine label \verb|CocaCola| (cf lines 56-64 in
2932 Figure~\ref{fig:acmxmlfileb}) associates the domain that carries it
2933 with the workload type \verb|CocaCola|.
2935 The \verb|bootstrap| attribute names the label
2936 \verb|SystemManagement|. Xen will assign this label to Domain0 at
2937 boot time. All other domains are assigned labels according to their
2938 domain configuration file (see
2939 Section~\ref{subsection:acmexamplelabeldomains} for examples of how to
2940 label domains). Lines 67-88 define the \verb|ObjectLabels|. Those
2941 labels can be assigned to resources when this policy is active.
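
To inspect the labels of a policy file outside of the xm tools, the
label block can be read with any XML library. The Python sketch below
uses the standard \verb|xml.etree.ElementTree| module on a shortened
copy of the example policy; the helper names are hypothetical, and
this is not how \verb|xm labels| is implemented.

\begin{scriptsize}
\begin{verbatim}
```python
import xml.etree.ElementTree as ET

# Namespace declared by the example policy file.
NS = {"p": "http://www.ibm.com"}

# Shortened label block modelled on the example policy.
POLICY = """
<SecurityPolicyDefinition xmlns="http://www.ibm.com">
 <SecurityLabelTemplate>
  <SubjectLabels bootstrap="SystemManagement">
   <VirtualMachineLabel>
    <Name>SystemManagement</Name>
    <SimpleTypeEnforcementTypes>
     <Type>SystemManagement</Type>
     <Type>PepsiCo</Type>
     <Type>CocaCola</Type>
    </SimpleTypeEnforcementTypes>
   </VirtualMachineLabel>
   <VirtualMachineLabel>
    <Name>CocaCola</Name>
    <SimpleTypeEnforcementTypes>
     <Type>CocaCola</Type>
    </SimpleTypeEnforcementTypes>
   </VirtualMachineLabel>
  </SubjectLabels>
 </SecurityLabelTemplate>
</SecurityPolicyDefinition>
"""


def domain_labels(root):
    """Names of all labels that can be assigned to domains."""
    names = root.findall(".//p:VirtualMachineLabel/p:Name", NS)
    return sorted(n.text for n in names)


def bootstrap_label(root):
    """Label that is assigned to Domain0 at boot time."""
    return root.find(".//p:SubjectLabels", NS).get("bootstrap")


def ste_types(root, label_name):
    """STE types carried by the named domain label."""
    for label in root.findall(".//p:VirtualMachineLabel", NS):
        if label.findtext("p:Name", namespaces=NS) == label_name:
            path = "p:SimpleTypeEnforcementTypes/p:Type"
            return [t.text for t in label.findall(path, NS)]
    return []


root = ET.fromstring(POLICY.strip())
print(domain_labels(root), bootstrap_label(root))
```
\end{verbatim}
\end{scriptsize}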
2943 In general, user domains should be assigned labels that have only a
2944 single SimpleTypeEnforcement workload type. This way, workloads remain
2945 confined even if user domains become rogue. Any domain that is
2946 assigned a label with multiple STE types must be trusted to keep
2947 information belonging to the different STE types separate (confined).
2948 For example, Domain0 is assigned the bootstrap label
2949 \verb|SystemManagement|, which includes all existing STE types.
2950 Therefore, Domain0 must take care not to enable unauthorized
2951 information flow (e.g., through block devices or virtual networking)
2952 between domains or resources that are assigned different STE types.
2954 Security administrators simply use the name of a label (specified in
2955 the \verb|<Name>| field) to associate a label with a domain (cf.
2956 Section~\ref{subsection:acmexamplelabeldomains}). The types inside the
2957 label are used by the Xen access control enforcement. While the name
2958 can be arbitrarily chosen (as long as it is unique), it is advisable
2959 to choose the label name in accordance with the security types included.
2960 While the XML representation in the above label seems unnecessarily
2961 flexible, labels in general can consist of multiple types as we will
2962 see in the following example.
2964 Assume that \verb|PepsiCo| and \verb|CocaCola| workloads use virtual
2965 disks that are provided by a virtual I/O domain hosting a physical
2966 storage device and carrying the following label:
2968 \begin{scriptsize}
2969 \begin{verbatim}
2970 <VirtualMachineLabel>
2971 <Name>VIO</Name>
2972 <SimpleTypeEnforcementTypes>
2973 <Type>CocaCola</Type>
2974 <Type>PepsiCo</Type>
2975 </SimpleTypeEnforcementTypes>
2976 <ChineseWallTypes>
2977 <Type>VIOServer</Type>
2978 </ChineseWallTypes>
2979 </VirtualMachineLabel>
2980 \end{verbatim}
2981 \end{scriptsize}
2983 This Virtual I/O domain (VIO) exports its virtualized disks by
2984 communicating both to domains labeled with the \verb|PepsiCo| label
2985 and domains labeled with the \verb|CocaCola| label. This requires the
2986 VIO domain to carry both the STE types \verb|CocaCola| and
2987 \verb|PepsiCo|. In this example, the confinement of \verb|CocaCola|
2988 and \verb|PepsiCo| workload depends on a VIO domain that must keep the
2989 data of those different workloads separate. The virtual disks are
2990 labeled as well (see Section~\ref{subsection:acmexamplelabelresources}
2991 for labeling resources) and enforcement functions inside the VIO
2992 domain must ensure that the labels of the domain mounting a virtual
2993 disk and the virtual disk label share a common STE type. The VIO label
2994 carrying its own VIOServer CHWALL type introduces the flexibility to
2995 permit the trusted VIO server to run together with CocaCola or PepsiCo
2996 workloads.
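
The sharing rule described above can be stated compactly. As a sketch,
using notation introduced here for illustration only (it is not part of
sHype itself), let $\mathit{STE}(x)$ denote the set of STE types in the
label of a domain or resource $x$. The enforcement functions inside the
VIO domain should then allow a domain $d$ to mount a virtual disk $r$
only if
\[
  \mathit{STE}(d) \cap \mathit{STE}(r) \neq \emptyset .
\]
With the label above, $\mathit{STE}(\mbox{VIO}) = \{\mbox{\tt CocaCola},
\mbox{\tt PepsiCo}\}$, so the VIO domain can communicate with both kinds
of workload. A domain labeled only with {\tt PepsiCo} and a disk labeled
only with {\tt CocaCola} share no type, so that mount must be refused.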
2998 Alternatively, a system that has two hard-drives does not need a VIO
2999 domain but can directly assign one hardware storage device to each of
the workloads (if the platform offers an IO-MMU, cf.\ Section~\ref{s:ddsecurity}).
Sharing hardware through virtualization
3002 is a trade-off between the amount of trusted code (size of the trusted
3003 computing base) and the amount of acceptable over-provisioning. This
3004 holds both for peripherals and for system platforms.
3006 \subsection{Tools For Creating sHype/Xen Security Policies}
3007 To create a security policy for Xen, you can use one of the following
3008 tools:
3009 \begin{itemize}
3010 \item \verb|ezPolicy| GUI tool -- start writing policies
3011 \item \verb|xensec_gen| tool -- refine policies created with \verb|ezPolicy|
3012 \item text or XML editor
3013 \end{itemize}
3015 We use the \verb|ezPolicy| tool in
3016 Section~\ref{subsection:acmexamplecreate} to quickly create a workload
3017 protection policy. If desired, the resulting XML policy file can be
3018 loaded into the \verb|xensec_gen| tool to refine it. It can also be
3019 directly edited using an XML editor. Any XML policy file is verified
3020 against the security policy schema when it is translated (see
3021 Subsection~\ref{subsection:acmexampleinstall}).
3023 \section{Current Limitations}
3024 \label{section:acmlimitations}
3026 The sHype/ACM configuration for Xen is work in progress. There is
3027 ongoing work for protecting virtualized resources and planned and
3028 ongoing work for protecting access to remote resources and domains.
3029 The following sections describe limitations of some of the areas into
3030 which access control is being extended.
3032 \subsection{Network Traffic}
3033 Local and remote network traffic is currently not controlled.
3034 Solutions to add sHype/ACM policy enforcement to the virtual network
3035 exist but need to be discussed before they can become part of Xen.
3036 Subjecting external network traffic to the ACM security policy is work
3037 in progress. Manually setting up filters in domain 0 is required for
3038 now but does not scale well.
3040 \subsection{Resource Access and Usage Control}
3042 Enforcing the security policy across multiple hypervisor systems and
3043 on access to remote shared resources is work in progress. Extending
3044 access control to new types of resources is ongoing work (e.g. network
3045 storage).
3047 On a single Xen system, information about the association of resources
3048 and security labels is stored in
3049 \verb|/etc/xen/acm-security/policy/resource_labels|. This file relates
3050 a full resource path with a security label. This association is weak
3051 and will break if resources are moved or renamed without adapting the
3052 label file. Improving the protection of label-resource relationships
3053 is ongoing work.
3055 Controlling resource usage and enforcing resource limits in general is
3056 ongoing work in the Xen community.
3058 \subsection{Domain Migration}
3060 Labels on domains are enforced during domain migration and the
3061 destination hypervisor will ensure that the domain label is valid and
3062 the domain is permitted to run (considering the Chinese Wall policy
3063 rules) before it accepts the migration. However, the network between
3064 the source and destination hypervisor as well as both hypervisors must
3065 be trusted. Architectures and prototypes exist that both protect the
3066 network connection and ensure that the hypervisors enforce access
control consistently, but patches are not yet available in the
mainline tree.
3070 \subsection{Covert Channels}
3072 The sHype access control aims at system independent security policies.
3073 It builds on top of the core hypervisor isolation. Any covert channels
3074 that exist in the core hypervisor or in the hardware (e.g., shared
3075 processor cache) will be inherited. If those covert channels are not
3076 the result of trade-offs between security and other system properties,
3077 then they are most effectively minimized or eliminated where they are
caused. sHype does, however, offer some means to mitigate their impact
(cf.\ run-time exclusion rules).
3081 \part{Reference}
3083 %% Chapter Build and Boot Options
3084 \chapter{Build and Boot Options}
3086 This chapter describes the build- and boot-time options which may be
3087 used to tailor your Xen system.
3089 \section{Top-level Configuration Options}
3091 Top-level configuration is achieved by editing one of two
3092 files: \path{Config.mk} and \path{Makefile}.
3094 The former allows the overall build target architecture to be
3095 specified. You will typically not need to modify this unless
3096 you are cross-compiling or if you wish to build a PAE-enabled
3097 Xen system. Additional configuration options are documented
3098 in the \path{Config.mk} file.
3100 The top-level \path{Makefile} is chiefly used to customize the set of
3101 kernels built. Look for the line:
3102 \begin{quote}
3103 \begin{verbatim}
3104 KERNELS ?= linux-2.6-xen0 linux-2.6-xenU
3105 \end{verbatim}
3106 \end{quote}
3108 Allowable options here are any kernels which have a corresponding
3109 build configuration file in the \path{buildconfigs/} directory.
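
Because the {\tt ?=} assignment sets {\tt KERNELS} only when it is not
already defined, the list can also be overridden on make's command line
without editing the \path{Makefile}. The following is a minimal sketch
and assumes the usual {\tt world} target of the top-level
\path{Makefile}:
\begin{quote}
\begin{verbatim}
# make KERNELS="linux-2.6-xen0" world
\end{verbatim}
\end{quote}
This restricts the kernel part of the build to the domain~0 kernel.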
3113 \section{Xen Build Options}
3115 Xen provides a number of build-time options which should be set as
3116 environment variables or passed on make's command-line.
3118 \begin{description}
3119 \item[verbose=y] Enable debugging messages when Xen detects an
3120 unexpected condition. Also enables console output from all domains.
3121 \item[debug=y] Enable debug assertions. Implies {\bf verbose=y}.
3122 (Primarily useful for tracing bugs in Xen).
3123 \item[debugger=y] Enable the in-Xen debugger. This can be used to
3124 debug Xen, guest OSes, and applications.
3125 \item[perfc=y] Enable performance counters for significant events
3126 within Xen. The counts can be reset or displayed on Xen's console
3127 via console control keys.
3128 \end{description}
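
For example, a debug build with performance counters could be requested
in either of the following ways. This is a sketch only; append whichever
make target you normally build.
\begin{quote}
\begin{verbatim}
# make debug=y perfc=y
# debug=y perfc=y make
\end{verbatim}
\end{quote}
The first form passes the options on make's command line; the second
sets them as environment variables for that one invocation.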
3131 \section{Xen Boot Options}
3132 \label{s:xboot}
3134 These options are used to configure Xen's behaviour at runtime. They
3135 should be appended to Xen's command line, either manually or by
3136 editing \path{grub.conf}.
3138 \begin{description}
3139 \item [ noreboot ] Don't reboot the machine automatically on errors.
3140 This is useful to catch debug output if you aren't catching console
3141 messages via the serial line.
3142 \item [ nosmp ] Disable SMP support. This option is implied by
3143 `ignorebiostables'.
3144 \item [ watchdog ] Enable NMI watchdog which can report certain
3145 failures.
3146 \item [ noirqbalance ] Disable software IRQ balancing and affinity.
3147 This can be used on systems such as Dell 1850/2850 that have
3148 workarounds in hardware for IRQ-routing issues.
3149 \item [ badpage=$<$page number$>$,$<$page number$>$, \ldots ] Specify
3150 a list of pages not to be allocated for use because they contain bad
3151 bytes. For example, if your memory tester says that byte 0x12345678
3152 is bad, you would place `badpage=0x12345' on Xen's command line.
3153 \item [ com1=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$
3154 com2=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ ] \mbox{}\\
3155 Xen supports up to two 16550-compatible serial ports. For example:
3156 `com1=9600, 8n1, 0x408, 5' maps COM1 to a 9600-baud port, 8 data
3157 bits, no parity, 1 stop bit, I/O port base 0x408, IRQ 5. If some
3158 configuration options are standard (e.g., I/O base and IRQ), then
3159 only a prefix of the full configuration string need be specified. If
3160 the baud rate is pre-configured (e.g., by the bootloader) then you
3161 can specify `auto' in place of a numeric baud rate.
3162 \item [ console=$<$specifier list$>$ ] Specify the destination for Xen
3163 console I/O. This is a comma-separated list of, for example:
3164 \begin{description}
3165 \item[ vga ] Use VGA console (until domain 0 boots, unless {\bf
3166 vga=...keep } is specified).
3167 \item[ com1 ] Use serial port com1.
3168 \item[ com2H ] Use serial port com2. Transmitted chars will have the
3169 MSB set. Received chars must have MSB set.
3170 \item[ com2L] Use serial port com2. Transmitted chars will have the
3171 MSB cleared. Received chars must have MSB cleared.
3172 \end{description}
3173 The latter two examples allow a single port to be shared by two
3174 subsystems (e.g.\ console and debugger). Sharing is controlled by
3175 MSB of each transmitted/received character. [NB. Default for this
3176 option is `com1,vga']
3177 \item [ vga=$<$mode$>$(,keep) ] The mode is one of the following options:
3178 \begin{description}
3179 \item[ ask ] Display a vga menu allowing manual selection of video
3180 mode.
3181 \item[ current ] Use existing vga mode without modification.
3182 \item[ text-$<$mode$>$ ] Select text-mode resolution, where mode is
3183 one of 80x25, 80x28, 80x30, 80x34, 80x43, 80x50, 80x60.
3184 \item[ gfx-$<$mode$>$ ] Select VESA graphics mode
3185 $<$width$>$x$<$height$>$x$<$depth$>$ (e.g., `vga=gfx-1024x768x32').
3186 \item[ mode-$<$mode$>$ ] Specify a mode number as discovered by `vga
3187 ask'. Note that the numbers are displayed in hex and hence must be
3188 prefixed by `0x' here (e.g., `vga=mode-0x0335').
3189 \end{description}
3190 The mode may optionally be followed by `{\bf,keep}' to cause Xen to keep
3191 writing to the VGA console after domain 0 starts booting (e.g., `vga=text-80x50,keep').
3192 \item [ no-real-mode ] (x86 only) Do not execute real-mode bootstrap
3193 code when booting Xen. This option should not be used except for
3194 debugging. It will effectively disable the {\bf vga} option, which
3195 relies on real mode to set the video mode.
3196 \item [ edid=no,force ] (x86 only) Either force retrieval of monitor
3197 EDID information via VESA DDC, or disable it (edid=no). This option
3198 should not normally be required except for debugging purposes.
3199 \item [ edd=off,on,skipmbr ] (x86 only) Control retrieval of Extended
3200 Disc Data (EDD) from the BIOS during boot.
3201 \item [ console\_to\_ring ] Place guest console output into the
3202 hypervisor console ring buffer. This is disabled by default.
3203 When enabled, both hypervisor output and guest console output
3204 is available from the ring buffer. This can be useful for logging
3205 and/or remote presentation of console data.
3206 \item [ sync\_console ] Force synchronous console output. This is
useful if your system fails unexpectedly before it has sent all
3208 available output to the console. In most cases Xen will
3209 automatically enter synchronous mode when an exceptional event
3210 occurs, but this option provides a manual fallback.
3211 \item [ conswitch=$<$switch-char$><$auto-switch-char$>$ ] Specify how
3212 to switch serial-console input between Xen and DOM0. The required
3213 sequence is CTRL-$<$switch-char$>$ pressed three times. Specifying
3214 the backtick character disables switching. The
3215 $<$auto-switch-char$>$ specifies whether Xen should auto-switch
3216 input to DOM0 when it boots --- if it is `x' then auto-switching is
3217 disabled. Any other value, or omitting the character, enables
3218 auto-switching. [NB. Default switch-char is `a'.]
3219 \item [ loglvl=$<$level$>/<$level$>$ ]
3220 Specify logging level. Messages of the specified severity level (and
3221 higher) will be printed to the Xen console. Valid levels are `none',
3222 `error', `warning', `info', `debug', and `all'. The second level
3223 specifier is optional: it is used to specify message severities
3224 which are to be rate limited. Default is `loglvl=warning'.
3225 \item [ guest\_loglvl=$<$level$>/<$level$>$ ] As for loglvl, but
3226 applies to messages relating to guests. Default is
3227 `guest\_loglvl=none/warning'.
3228 \item [ nmi=xxx ]
3229 Specify what to do with an NMI parity or I/O error. \\
3230 `nmi=fatal': Xen prints a diagnostic and then hangs. \\
3231 `nmi=dom0': Inform DOM0 of the NMI. \\
3232 `nmi=ignore': Ignore the NMI.
3233 \item [ mem=xxx ] Set the physical RAM address limit. Any RAM
3234 appearing beyond this physical address in the memory map will be
3235 ignored. This parameter may be specified with a B, K, M or G suffix,
3236 representing bytes, kilobytes, megabytes and gigabytes respectively.
3237 The default unit, if no suffix is specified, is kilobytes.
3238 \item [ dom0\_mem=$<$specifier list$>$ ] Set the amount of memory to
3239 be allocated to domain 0. This is a comma-separated list containing
3240 the following optional components:
3241 \begin{description}
3242 \item[ min:$<$min\_amt$>$ ] Minimum amount to allocate to domain 0
\item[ max:$<$max\_amt$>$ ] Maximum amount to allocate to domain 0
3244 \item[ $<$amt$>$ ] Precise amount to allocate to domain 0
3245 \end{description}
3246 Each numeric parameter may be specified with a B, K, M or
3247 G suffix, representing bytes, kilobytes, megabytes and gigabytes
3248 respectively; if no suffix is specified, the parameter defaults to
3249 kilobytes. Negative values are subtracted from total available
3250 memory. If $<$amt$>$ is not specified, it defaults to all available
3251 memory less a small amount (clamped to 128MB) for uses such as DMA
3252 buffers.
3253 \item [ dom0\_vcpus\_pin ] Pins domain 0 VCPUs on their respective
physical CPUs (default=false).
3255 \item [ tbuf\_size=xxx ] Set the size of the per-cpu trace buffers, in
3256 pages (default 0).
3257 \item [ sched=xxx ] Select the CPU scheduler Xen should use. The
3258 current possibilities are `credit' (default), and `sedf'.
3259 \item [ apic\_verbosity=debug,verbose ] Print more detailed
3260 information about local APIC and IOAPIC configuration.
3261 \item [ lapic ] Force use of local APIC even when left disabled by
3262 uniprocessor BIOS.
3263 \item [ nolapic ] Ignore local APIC in a uniprocessor system, even if
3264 enabled by the BIOS.
3265 \item [ apic=bigsmp,default,es7000,summit ] Specify NUMA platform.
3266 This can usually be probed automatically.
3267 \item [ dma\_bits=xxx ] Specify width of DMA
3268 addresses in bits. Default is 30 bits (addresses up to 1GB are DMAable).
3269 \item [ dma\_emergency\_pool=xxx ] Specify lower bound on size of DMA
3270 pool below which ordinary allocations will fail rather than fall
3271 back to allocating from the DMA pool.
3272 \item [ hap ] Instruct Xen to detect hardware-assisted paging support, such
as AMD-V's nested paging or Intel\textregistered\ VT's extended paging. If
3274 available, Xen will use hardware-assisted paging instead of shadow paging
3275 for guest memory management.
3276 \end{description}
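
As an illustration, several of these options can be combined in a single
\path{grub.conf} entry. The following is a sketch only: the title, the
file names and the root device are assumptions that must be adapted to
your installation.
\begin{scriptsize}
\begin{verbatim}
title Xen (serial console, verbose logging)
  root (hd0,0)
  kernel /boot/xen.gz com1=115200,8n1 console=com1,vga loglvl=all noreboot dom0_mem=512M
  module /boot/vmlinuz-2.6-xen root=/dev/sda1 ro
\end{verbatim}
\end{scriptsize}
With this entry Xen drives COM1 at 115200 baud with 8 data bits, no
parity and 1 stop bit (the I/O base and IRQ are left at their standard
values), sends console output to both the serial port and VGA, prints
messages of all severities, does not reboot automatically on errors, and
allocates 512MB to domain~0.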
3278 In addition, the following options may be specified on the Xen command
3279 line. Since domain 0 shares responsibility for booting the platform,
3280 Xen will automatically propagate these options to its command line.
3281 These options are taken from Linux's command-line syntax with
3282 unchanged semantics.
3284 \begin{description}
3285 \item [ acpi=off,force,strict,ht,noirq,\ldots ] Modify how Xen (and
3286 domain 0) parses the BIOS ACPI tables.
3287 \item [ acpi\_skip\_timer\_override ] Instruct Xen (and domain~0) to
3288 ignore timer-interrupt override instructions specified by the BIOS
3289 ACPI tables.
3290 \item [ noapic ] Instruct Xen (and domain~0) to ignore any IOAPICs
3291 that are present in the system, and instead continue to use the
3292 legacy PIC.
3293 \end{description}
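
Because Xen propagates these options to domain~0, they are given once,
on the Xen line of the boot entry, and need not be repeated on the
domain~0 kernel line. A sketch, with hypothetical file names and root
device:
\begin{quote}
\begin{verbatim}
  kernel /boot/xen.gz acpi=off noapic
  module /boot/vmlinuz-2.6-xen root=/dev/sda1 ro
\end{verbatim}
\end{quote}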
3296 \section{XenLinux Boot Options}
3298 In addition to the standard Linux kernel boot options, we support:
3299 \begin{description}
3300 \item[ xencons=xxx ] Specify the device node to which the Xen virtual
3301 console driver is attached. The following options are supported:
3302 \begin{center}
3303 \begin{tabular}{l}
3304 `xencons=off': disable virtual console \\
3305 `xencons=tty': attach console to /dev/tty1 (tty0 at boot-time) \\
3306 `xencons=ttyS': attach console to /dev/ttyS0
3307 \end{tabular}
3308 \end{center}
3309 The default is ttyS for dom0 and tty for all other domains.
3310 \end{description}
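
For domain~0 the option is appended to the XenLinux kernel line of the
boot entry, like any other Linux boot option. A sketch, with a
hypothetical kernel file name and root device, that attaches the
domain~0 virtual console to a tty rather than the default ttyS:
\begin{quote}
\begin{verbatim}
  module /boot/vmlinuz-2.6-xen root=/dev/sda1 ro xencons=tty
\end{verbatim}
\end{quote}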
3313 %% Chapter Further Support
3314 \chapter{Further Support}
3316 If you have questions that are not answered by this manual, the
3317 sources of information listed below may be of interest to you. Note
3318 that bug reports, suggestions and contributions related to the
3319 software (or the documentation) should be sent to the Xen developers'
3320 mailing list (address below).
3323 \section{Other Documentation}
3325 For developers interested in porting operating systems to Xen, the
3326 \emph{Xen Interface Manual} is distributed in the \path{docs/}
3327 directory of the Xen source distribution.
3330 \section{Online References}
3332 The official Xen web site can be found at:
3333 \begin{quote} {\tt http://www.xensource.com}
3334 \end{quote}
3336 This contains links to the latest versions of all online
3337 documentation, including the latest version of the FAQ.
3339 Information regarding Xen is also available at the Xen Wiki at
3340 \begin{quote} {\tt http://wiki.xensource.com/xenwiki/}\end{quote}
The Xen project uses Bugzilla as its bug tracking system. You'll find
the Xen Bugzilla at {\tt http://bugzilla.xensource.com/bugzilla/}.
3345 \section{Mailing Lists}
There are several mailing lists that are used to discuss Xen-related
3348 topics. The most widely relevant are listed below. An official page of
3349 mailing lists and subscription information can be found at \begin{quote}
3350 {\tt http://lists.xensource.com/} \end{quote}
3352 \begin{description}
3353 \item[xen-devel@lists.xensource.com] Used for development
3354 discussions and bug reports. Subscribe at: \\
3355 {\small {\tt http://lists.xensource.com/xen-devel}}
3356 \item[xen-users@lists.xensource.com] Used for installation and usage
3357 discussions and requests for help. Subscribe at: \\
3358 {\small {\tt http://lists.xensource.com/xen-users}}
3359 \item[xen-announce@lists.xensource.com] Used for announcements only.
3360 Subscribe at: \\
3361 {\small {\tt http://lists.xensource.com/xen-announce}}
3362 \item[xen-changelog@lists.xensource.com] Changelog feed
3363 from the unstable and 2.0 trees - developer oriented. Subscribe at: \\
3364 {\small {\tt http://lists.xensource.com/xen-changelog}}
3365 \end{description}
3369 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
3371 \appendix
\chapter{Unmodified (VMX) guest domains in Xen with Intel\textregistered\ Virtualization Technology (VT)}
Xen supports guest domains running unmodified guest operating systems using Virtualization Technology (VT) available on recent Intel processors. More information about the Intel Virtualization Technology implementing Virtual Machine Extensions (VMX) in the processor is available on the Intel website at \\
3376 {\small {\tt http://www.intel.com/technology/computing/vptech}}
3378 \section{Building Xen with VT support}
3380 The following packages need to be installed in order to build Xen with VT support. Some Linux distributions do not provide these packages by default.
3382 \begin{tabular}{lp{11.0cm}}
3383 {\bfseries Package} & {\bfseries Description} \\
3385 dev86 & The dev86 package provides an assembler and linker for real mode 80x86 instructions. You need to have this package installed in order to build the BIOS code which runs in (virtual) real mode.
3387 If the dev86 package is not available on the x86\_64 distribution, you can install the i386 version of it. The dev86 rpm package for various distributions can be found at {\scriptsize {\tt http://www.rpmfind.net/linux/rpm2html/search.php?query=dev86\&submit=Search}} \\
LibVNCServer & The unmodified guest's VGA display, keyboard, and mouse can be virtualized by the vncserver library. You can get the sources of libvncserver from {\small {\tt http://sourceforge.net/projects/libvncserver}}. Build and install the sources on the build system to get the libvncserver library. Version 0.8 suffers from a significant performance degradation, which has been fixed in the current sources in the CVS tree, so it is highly recommended to download and install the latest CVS sources.\\
3391 SDL-devel, SDL & Simple DirectMedia Layer (SDL) is another way of virtualizing the unmodified guest console. It provides an X window for the guest console.
If the SDL and SDL-devel packages are not installed by default on the build system, they can be obtained from {\scriptsize {\tt http://www.rpmfind.net/linux/rpm2html/search.php?query=SDL\&submit=Search}}
3394 , {\scriptsize {\tt http://www.rpmfind.net/linux/rpm2html/search.php?query=SDL-devel\&submit=Search}} \\
3396 \end{tabular}
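
On an RPM-based distribution you can check which of these packages are
already installed before starting the build. This is a sketch; package
names may differ between distributions, and LibVNCServer is not listed
because it may have been built from source as described above.
\begin{quote}
\begin{verbatim}
# rpm -q dev86 SDL SDL-devel
\end{verbatim}
\end{quote}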
3398 \section{Configuration file for unmodified VMX guests}
3400 The Xen installation includes a sample configuration file, {\small {\tt /etc/xen/xmexample.vmx}}. There are comments describing all the options. In addition to the common options that are the same as those for paravirtualized guest configurations, VMX guest configurations have the following settings:
3402 \begin{tabular}{lp{11.0cm}}
3404 {\bfseries Parameter} & {\bfseries Description} \\
3406 kernel & The VMX firmware loader, {\small {\tt /usr/lib/xen/boot/vmxloader}}\\
3408 builder & The domain build function. The VMX domain uses the vmx builder.\\
3410 acpi & Enable VMX guest ACPI, default=0 (disabled)\\
3412 apic & Enable VMX guest APIC, default=0 (disabled)\\
3414 pae & Enable VMX guest PAE, default=0 (disabled)\\
3416 vif & Optionally defines MAC address and/or bridge for the network interfaces. Random MACs are assigned if not given. {\small {\tt type=ioemu}} means ioemu is used to virtualize the VMX NIC. If no type is specified, vbd is used, as with paravirtualized guests.\\
3418 disk & Defines the disk devices you want the domain to have access to, and what you want them accessible as. If using a physical device as the VMX guest's disk, each disk entry is of the form
{\small {\tt phy:UNAME,ioemu:DEV,MODE}}

where UNAME is the physical device, DEV is the device name the domain will see, and MODE is {\tt r} for read-only or {\tt w} for read-write. The {\tt ioemu:} prefix means that ioemu is used to virtualize the VMX disk; without it, the disk is attached as a vbd, as for paravirtualized guests.

If using a disk image file, the entry has the form
3426 {\small {\tt file:FILEPATH,ioemu:DEV,MODE}}
3428 If using more than one disk, there should be a comma between each disk entry. For example:
3430 {\scriptsize {\tt disk = ['file:/var/images/image1.img,ioemu:hda,w', 'file:/var/images/image2.img,ioemu:hdb,w']}}\\
cdrom & Disk image for CD-ROM. The default is {\small {\tt /dev/cdrom}} for Domain0. Inside the VMX domain, the CD-ROM will be available as device {\small {\tt /dev/hdc}}. The entry can also point to an ISO file.\\
3434 boot & Boot from floppy (a), hard disk (c) or CD-ROM (d). For example, to boot from CD-ROM, the entry should be:
3436 boot='d'\\
3438 device\_model & The device emulation tool for VMX guests. This parameter should not be changed.\\
3440 sdl & Enable SDL library for graphics, default = 0 (disabled)\\
3442 vnc & Enable VNC library for graphics, default = 1 (enabled)\\
3444 vncviewer & Enable spawning of the vncviewer (only valid when vnc=1), default = 1 (enabled)
If vnc=1 and vncviewer=0, you can start vncviewer manually to connect to the VMX guest from a remote machine. For example:
3448 {\small {\tt vncviewer domain0\_IP\_address:VMX\_domain\_id}} \\
3450 ne2000 & Enable ne2000, default = 0 (disabled; use pcnet)\\
3452 serial & Enable redirection of VMX serial output to pty device\\
3454 \end{tabular}
3456 \begin{tabular}{lp{10cm}}
3458 usb & Enable USB support without defining a specific USB device.
3459 This option defaults to 0 (disabled) unless the option usbdevice is
3460 specified in which case this option then defaults to 1 (enabled).\\
3462 usbdevice & Enable USB support and also enable support for the given
3463 device. Devices that can be specified are {\small {\tt mouse}} (a PS/2 style
3464 mouse), {\small {\tt tablet}} (an absolute pointing device) and
3465 {\small {\tt host:id1:id2}} (a physical USB device on the host machine whose
3466 ids are {\small {\tt id1}} and {\small {\tt id2}}). The advantage
3467 of {\small {\tt tablet}} is that Windows guests will automatically recognize
3468 and support this device so specifying the config line
{\small
\begin{verbatim}
usbdevice='tablet'
\end{verbatim}
}
3476 will create a mouse that works transparently with Windows guests under VNC.
3477 Linux doesn't recognize the USB tablet yet so Linux guests under VNC will
3478 still need the Summagraphics emulation.
3479 Details about mouse emulation are provided in section \textbf{A.4.3}.\\
3481 localtime & Set the real time clock to local time [default=0, that is, set to UTC].\\
3483 enable-audio & Enable audio support. This is under development.\\
3485 full-screen & Start in full screen. This is under development.\\
nographic & Another way to redirect serial output. If enabled, neither {\tt sdl} nor {\tt vnc} will work. Not recommended.\\
3489 \end{tabular}
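
Putting the parameters above together, the relevant part of a VMX guest
configuration might look as follows. This is a sketch, not a complete
file: start from the sample configuration, leave {\tt device\_model}
and any entries not shown as they are, and treat the domain name, the
memory size, the bridge name and the image path as assumptions chosen
for illustration.
\begin{small}
\begin{verbatim}
kernel    = '/usr/lib/xen/boot/vmxloader'
builder   = 'vmx'
name      = 'vmxguest'        # hypothetical domain name
memory    = 256               # in MB; illustrative value
acpi      = 0
apic      = 0
pae       = 0
vif       = [ 'type=ioemu, bridge=xenbr0' ]   # bridge name assumed
disk      = [ 'file:/var/images/guest.img,ioemu:hda,w' ]
cdrom     = '/dev/cdrom'
boot      = 'c'               # boot from hard disk
sdl       = 0
vnc       = 1
vncviewer = 1
usbdevice = 'tablet'          # absolute pointer, useful under VNC
localtime = 0                 # real time clock set to UTC
\end{verbatim}
\end{small}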
3492 \section{Creating virtual disks from scratch}
3493 \subsection{Using physical disks}
If you are using a physical disk or physical disk partition, you need to install a Linux OS on the disk first. Then the boot loader should be installed in the correct place: for example, {\small {\tt /dev/sda}} for booting from the whole disk, or {\small {\tt /dev/sda1}} for booting from partition 1.
3496 \subsection{Using disk image files}
3497 You need to create a large empty disk image file first; then, you need to install a Linux OS onto it. There are two methods you can choose. One is directly installing it using a VMX guest while booting from the OS installation CD-ROM. The other is copying an installed OS into it. The boot loader will also need to be installed.
3499 \subsubsection*{To create the image file:}
3500 The image size should be big enough to accommodate the entire OS. This example assumes the size is 1G (which is probably too small for most OSes).
3502 {\small {\tt \# dd if=/dev/zero of=hd.img bs=1M count=1 seek=1023}}
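
The size of the resulting file follows from the {\tt dd} arguments: one
block of {\tt bs} bytes is written after skipping {\tt seek} blocks, so
\[
  \mbox{size} = \mbox{bs} \times (\mbox{seek} + \mbox{count})
              = 1\,\mbox{MB} \times (1023 + 1) = 1024\,\mbox{MB} = 1\,\mbox{GB}.
\]
Since only the last block is actually written, the file is sparse on
file systems that support sparse files. As a sketch, a 4GB image (a size
chosen here purely for illustration) would be created with:
\begin{quote}
\begin{verbatim}
# dd if=/dev/zero of=hd.img bs=1M count=1 seek=4095
\end{verbatim}
\end{quote}
If you change the image size, the disk geometry used when partitioning
the image has to be adapted to match.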
3504 \subsubsection*{To directly install Linux OS into an image file using a VMX guest:}
Install Xen and create a VMX guest that uses the empty image file as its disk and boots from the CD-ROM. The installation then proceeds just like a normal Linux OS installation. Before creating the guest, make sure the VMX configuration file contains these two entries:

{\small {\tt cdrom='/dev/cdrom'\\
boot='d'}}
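
As noted in the description of the {\tt cdrom} parameter, the entry can
also point to an ISO file, which avoids the need for a physical disc. A
sketch, with a hypothetical image path:
\begin{quote}
\begin{verbatim}
cdrom='/var/images/install.iso'
boot='d'
\end{verbatim}
\end{quote}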
3511 If this method does not succeed, you can choose the following method of copying an installed Linux OS into an image file.
\subsubsection*{To copy an installed OS into an image file:}
3514 Directly installing is an easier way to make partitions and install an OS in a disk image file. But if you want to create a specific OS in your disk image, then you will most likely want to use this method.
3516 \begin{enumerate}
3517 \item {\bfseries Install a normal Linux OS on the host machine}\\
3518 You can choose any way to install Linux, such as using yum to install Red Hat Linux or YAST to install Novell SuSE Linux. The rest of this example assumes the Linux OS is installed in {\small {\tt /var/guestos/}}.
3520 \item {\bfseries Make the partition table}\\
The image file will be treated as a hard disk, so you should make the partition table in the image file. For example:
3523 {\scriptsize {\tt \# losetup /dev/loop0 hd.img\\
3524 \# fdisk -b 512 -C 4096 -H 16 -S 32 /dev/loop0\\
3525 press 'n' to add new partition\\
3526 press 'p' to choose primary partition\\
3527 press '1' to set partition number\\
3528 press "Enter" keys to choose default value of "First Cylinder" parameter.\\
3529 press "Enter" keys to choose default value of "Last Cylinder" parameter.\\
3530 press 'w' to write partition table and exit\\
3531 \# losetup -d /dev/loop0}}
3533 \item {\bfseries Make the file system and install grub}\\
3534 {\scriptsize {\tt \# ln -s /dev/loop0 /dev/loop\\
3535 \# losetup /dev/loop0 hd.img\\
3536 \# losetup -o 16384 /dev/loop1 hd.img\\
3537 \# mkfs.ext3 /dev/loop1\\
3538 \# mount /dev/loop1 /mnt\\
3539 \# mkdir -p /mnt/boot/grub\\
3540 \# cp /boot/grub/stage* /boot/grub/e2fs\_stage1\_5 /mnt/boot/grub\\
3541 \# umount /mnt\\
3542 \# grub\\
3543 grub> device (hd0) /dev/loop\\
3544 grub> root (hd0,0)\\
3545 grub> setup (hd0)\\
3546 grub> quit\\
3547 \# rm /dev/loop\\
3548 \# losetup -d /dev/loop0\\
3549 \# losetup -d /dev/loop1}}
The {\small {\tt losetup}} option {\small {\tt -o 16384}} skips the partition table in the image file: it is the byte offset of the first partition, that is, the number of sectors per track (32, as passed to {\small {\tt fdisk}} with {\small {\tt -S}}) times the sector size (512 bytes). We need {\small {\tt /dev/loop}} because grub expects a disk device \emph{name}, where \emph{name} represents the entire disk and \emph{name1} represents the first partition.
3553 \item {\bfseries Copy the OS files to the image}\\
If you have Xen installed, you can use {\small {\tt lomount}} instead of {\small {\tt losetup}} and {\small {\tt mount}} when copying files to a partition; {\small {\tt lomount}} needs only the partition information.
3556 {\scriptsize {\tt \# lomount -t ext3 -diskimage hd.img -partition 1 /mnt/guest\\
3557 \# cp -ax /var/guestos/\{root,dev,var,etc,usr,bin,sbin,lib\} /mnt/guest\\
3558 \# mkdir /mnt/guest/\{proc,sys,home,tmp\}}}
3560 \item {\bfseries Edit the {\small {\tt /etc/fstab}} of the guest image}\\
3561 The fstab should look like this:
3563 {\scriptsize {\tt \# vim /mnt/guest/etc/fstab\\
3564 /dev/hda1 / ext3 defaults 1 1\\
3565 none /dev/pts devpts gid=5,mode=620 0 0\\
3566 none /dev/shm tmpfs defaults 0 0\\
3567 none /proc proc defaults 0 0\\
none /sys sysfs defaults 0 0}}
3570 \item {\bfseries umount the image file}\\
3571 {\small {\tt \# umount /mnt/guest}}
3572 \end{enumerate}
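
The numbers used in the steps above are related as follows. This is a
worked check that assumes the classic cylinder-aligned layout, in which
the first partition starts on the second track of the disk. The geometry
passed to {\tt fdisk} ($C = 4096$ cylinders, $H = 16$ heads, $S = 32$
sectors per track, 512-byte sectors) describes exactly the 1GB image
created earlier:
\[
  C \times H \times S \times 512
  = 4096 \times 16 \times 32 \times 512
  = 2^{30}\ \mbox{bytes} = 1\,\mbox{GB},
\]
and the offset given to {\tt losetup -o} is the size of the first track:
\[
  S \times 512 = 32 \times 512 = 16384\ \mbox{bytes}.
\]
For an image of a different size, adjust the number of cylinders
accordingly; the offset stays the same as long as $S$ remains 32.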
Now the guest OS image {\small {\tt hd.img}} is ready. Quickstart images are also available from {\small {\tt http://free.oszoo.org}}, but make sure that a boot loader is installed in any image you use.
3576 \subsection{Install Windows into an Image File using a VMX guest}
3577 In order to install a Windows OS, you should keep {\small {\tt acpi=0}} in your VMX configuration file.
3579 \section{VMX Guests}
3580 \subsection{Editing the Xen VMX config file}
Make a copy of the example VMX configuration file {\small {\tt /etc/xen/xmexample.vmx}} and edit the line that reads
3583 {\small {\tt disk = [ 'file:/var/images/\emph{guest.img},ioemu:hda,w' ]}}
3585 replacing \emph{guest.img} with the name of the guest OS image file you just made.
3587 \subsection{Creating VMX guests}
3588 Simply follow the usual method of creating the guest, using the -f parameter and providing the filename of your VMX configuration file:\\
3590 {\small {\tt \# xend start\\
\# xm create -f /etc/xen/vmxguest.vmx}}
In the default configuration, VNC is on and SDL is off, so a VNC window opens when a VMX guest is created. To use SDL for the guest display instead, set {\small {\tt sdl=1}} in your VMX configuration file. You can also turn off VNC by setting {\small {\tt vnc=0}}.
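
For example, to use an SDL window in place of VNC, the configuration
file would contain the following two lines (a sketch; all other entries
stay as in the sample file):
\begin{quote}
\begin{verbatim}
sdl=1
vnc=0
\end{verbatim}
\end{quote}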
3595 \subsection{Mouse issues, especially under VNC}
3596 Mouse handling when using VNC is a little problematic.
3597 The problem is that the VNC viewer provides a virtual pointer which is
3598 located at an absolute location in the VNC window and only absolute
3599 coordinates are provided.
3600 The VMX device model converts these absolute mouse coordinates
3601 into the relative motion deltas that are expected by the PS/2
3602 mouse driver running in the guest.
3603 Unfortunately,
3604 it is impossible to keep these generated mouse deltas
3605 accurate enough for the guest cursor to exactly match
3606 the VNC pointer.
3607 This can lead to situations where the guest's cursor
3608 is in the center of the screen and there's no way to
3609 move that cursor to the left
3610 (it can happen that the VNC pointer is at the left
3611 edge of the screen and,
3612 therefore,
3613 there are no longer any left mouse deltas that
3614 can be provided by the device model emulation code.)
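
The reason can be seen from a simple model. The notation is introduced
here for illustration only and does not correspond to names in the
device model. Let $p_t$ be the horizontal position of the VNC pointer at
step $t$, with $0 \le p_t \le W$ for a window of width $W$, and let
$g_t$ be the horizontal position of the guest cursor. The device model
sends the delta
\[
  \delta_t = p_t - p_{t-1},
\]
and the guest moves its cursor by an amount that has the same sign as
$\delta_t$ but, because of mouse acceleration in the guest, is generally
not equal to it, so $g_t$ and $p_t$ drift apart. Once the VNC pointer
has reached the left edge ($p_{t-1} = 0$), every further motion gives
$\delta_t \ge 0$, so a guest cursor with $g_{t-1} > 0$ can no longer be
moved to the left.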
To deal with these mouse issues there are four different
mouse emulations available from the VMX device model:
3619 \begin{description}
3620 \item[PS/2 mouse over the PS/2 port.]
3621 This is the default mouse
3622 that works perfectly well under SDL.
3623 Under VNC the guest cursor will get
3624 out of sync with the VNC pointer.
3625 When this happens you can re-synchronize
3626 the guest cursor to the VNC pointer by
3627 holding down the
3628 \textbf{left-ctl}
3629 and
3630 \textbf{left-alt}
3631 keys together.
3632 While these keys are down VNC pointer motions
3633 will not be reported to the guest so
3634 that the VNC pointer can be moved
3635 to a place where it is possible
3636 to move the guest cursor again.
3638 \item[Summagraphics mouse over the serial port.]
3639 The device model also provides emulation
3640 for a Summagraphics tablet,
3641 an absolute pointer device.
3642 This emulation is provided over the second
3643 serial port,
3644 \textbf{/dev/ttyS1}
3645 for Linux guests and
3646 \textbf{COM2}
3647 for Windows guests.
3648 Unfortunately,
3649 neither Linux nor Windows provides
3650 default support for the Summagraphics
3651 tablet so the guest will have to be
3652 manually configured for this mouse.
3654 \textbf{Linux configuration.}
3656 First,
3657 configure the GPM service to use the Summagraphics tablet.
3658 This can vary between distributions but,
3659 typically,
3660 all that needs to be done is modify the file
3661 \path{/etc/sysconfig/mouse} to contain the lines:
{\small
\begin{verbatim}
MOUSETYPE="summa"
DEVICE=/dev/ttyS1
\end{verbatim}
}
3671 and then restart the GPM daemon.
3673 Next,
3674 modify the X11 config
3675 \path{/etc/X11/xorg.conf}
to support the Summagraphics tablet by replacing
3677 the input device stanza with the following:
{\small
\begin{verbatim}
Section "InputDevice"
    Identifier "Mouse0"
    Driver "summa"
    Option "Device" "/dev/ttyS1"
    Option "InputFashion" "Tablet"
    Option "Mode" "Absolute"
    Option "Name" "EasyPen"
    Option "Compatible" "True"
    Option "Protocol" "Auto"
    Option "SendCoreEvents" "on"
    Option "Vendor" "GENIUS"
EndSection
\end{verbatim}
}
3696 Restart X and the X cursor should now properly
3697 track the VNC pointer.
3700 \textbf{Windows configuration.}
3702 Get the file
3703 \path{http://www.cad-plan.de/files/download/tw2k.exe}
3704 and execute that file on the guest,
3705 answering the questions as follows:
3707 \begin{enumerate}
\item When the program asks for \textbf{model},
scroll down and select \textbf{SummaSketch (MM Compatible)}.

\item When the program asks for \textbf{COM Port} specify \textbf{com2}.

\item When the program asks for a \textbf{Cursor Type} specify
\textbf{4 button cursor/puck}.
3716 \item The guest system will then reboot and,
3717 when it comes back up,
3718 the guest cursor will now properly track
3719 the VNC pointer.
3720 \end{enumerate}
3722 \item[PS/2 mouse over USB port.]
3723 This is just the same PS/2 emulation except it is
3724 provided over a USB port.
3725 This emulation is enabled by the configuration flag:
{\small
\begin{verbatim}
usbdevice='mouse'
\end{verbatim}
}
3732 \item[USB tablet over USB port.]
3733 The USB tablet is an absolute pointing device
3734 that has the advantage that it is automatically
3735 supported under Windows guests,
3736 although Linux guests still require some
3737 manual configuration.
3738 This mouse emulation is enabled by the
3739 configuration flag:
{\small
\begin{verbatim}
usbdevice='tablet'
\end{verbatim}
}
3746 \textbf{Linux configuration.}
3748 Unfortunately,
3749 there is no GPM support for the
3750 USB tablet at this point in time.
3751 If you intend to use a GPM pointing
3752 device under VNC you should
3753 configure the guest for Summagraphics
3754 emulation.
3756 Support for X11 is available by following
3757 the instructions at\\
3758 \verb+http://stz-softwaretechnik.com/~ke/touchscreen/evtouch.html+\\
3759 with one minor change.
3760 The
3761 \path{xorg.conf}
3762 given in those instructions
uses the wrong values for the X \& Y minimums and maximums;
use the following config stanza instead:
{\small
\begin{verbatim}
Section "InputDevice"
    Identifier "Tablet"
    Driver "evtouch"
    Option "Device" "/dev/input/event2"
    Option "DeviceName" "touchscreen"
    Option "MinX" "0"
    Option "MinY" "0"
    Option "MaxX" "32256"
    Option "MaxY" "32256"
    Option "ReportingMode" "Raw"
    Option "Emulate3Buttons"
    Option "Emulate3Timeout" "50"
    Option "SendCoreEvents" "On"
EndSection
\end{verbatim}
}
3785 \textbf{Windows configuration.}
Just enabling the USB tablet in the
guest's configuration file is sufficient;
Windows will automatically recognize and
configure device drivers for this
pointing device.
3793 \end{description}
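
Putting these options together, a guest that is viewed over VNC and uses the USB tablet might carry the following lines in its configuration file. This is a sketch rather than a complete file, and it assumes that {\small {\tt usb=1}} should accompany {\small {\tt usbdevice}}, as the USB support section suggests.

{\small
\begin{verbatim}
vnc = 1                # display the guest through VNC
usb = 1                # enable USB support in the guest (assumed)
usbdevice = 'tablet'   # absolute pointer that tracks the VNC pointer
\end{verbatim}
}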
3795 \subsection{USB Support}
3796 There is support for an emulated USB mouse,
3797 an emulated USB tablet
3798 and physical low speed USB devices
3799 (support for high speed USB 2.0 devices is
3800 still under development).
3802 \begin{description}
3803 \item[USB PS/2 style mouse.]
3804 Details on the USB mouse emulation are
3805 given in sections
3806 \textbf{A.2}
3807 and
3808 \textbf{A.4.3}.
3809 Enabling USB PS/2 style mouse emulation
3810 is just a matter of adding the line
{\small
\begin{verbatim}
usbdevice='mouse'
\end{verbatim}
}
3818 to the configuration file.
3819 \item[USB tablet.]
3820 Details on the USB tablet emulation are
3821 given in sections
3822 \textbf{A.2}
3823 and
3824 \textbf{A.4.3}.
3825 Enabling USB tablet emulation
3826 is just a matter of adding the line
{\small
\begin{verbatim}
usbdevice='tablet'
\end{verbatim}
}
3834 to the configuration file.
3835 \item[USB physical devices.]
3836 Access to a physical (low speed) USB device
3837 is enabled by adding a line of the form
{\small
\begin{verbatim}
usbdevice='host:vid:pid'
\end{verbatim}
}

to the configuration file.\footnote{
There is an alternate
way of specifying a USB device that
uses the syntax
\textbf{host:bus.addr},
but this syntax suffers from
a major problem that makes
it effectively useless:
the
\textbf{addr}
portion of this address
changes every time the USB device
is plugged into the system.
For this reason this addressing
scheme is not recommended and
will not be documented further.}
3862 \textbf{vid}
3863 and
3864 \textbf{pid}
are a
vendor id and
product id
3868 that uniquely identify
3869 the USB device.
3870 These ids can be identified
3871 in two ways:
3873 \begin{enumerate}
3874 \item Through the control window.
3875 As described in section
3876 \textbf{A.4.6}
3877 the control window
3878 is activated by pressing
3879 \textbf{ctl-alt-2}
3880 in the guest VGA window.
3881 As long as USB support is
3882 enabled in the guest by including
3883 the config file line
{\small
\begin{verbatim}
usb=1
\end{verbatim}
}
then executing the command
{\small
\begin{verbatim}
info usbhost
\end{verbatim}
}
in the control window
will display a list of all
USB devices and their ids.
For example,
this output:
{\small
\begin{verbatim}
  Device 1.3, speed 1.5 Mb/s
    Class 00: USB device 04b3:310b
\end{verbatim}
}
3906 was created from a USB mouse with
3907 vendor id
3908 \textbf{04b3}
3909 and product id
3910 \textbf{310b}.
3911 This device could be made available
3912 to the VMX guest by including the
3913 config file entry
{\small
\begin{verbatim}
usbdevice='host:04b3:310b'
\end{verbatim}
}
3920 It is also possible to
3921 enable access to a USB
3922 device dynamically through
3923 the control window.
3924 The control window command
{\small
\begin{verbatim}
usb_add host:vid:pid
\end{verbatim}
}
3930 will also allow access to a
3931 USB device with vendor id
3932 \textbf{vid}
3933 and product id
3934 \textbf{pid}.
3935 \item Through the
3936 \path{/proc} file system.
3937 The contents of the pseudo file
3938 \path{/proc/bus/usb/devices}
3939 can also be used to identify
3940 vendor and product ids.
3941 Looking at this file,
3942 the line starting with
3943 \textbf{P:}
3944 has a field
3945 \textbf{Vendor}
3946 giving the vendor id and
3947 another field
3948 \textbf{ProdID}
3949 giving the product id.
The contents of
\path{/proc/bus/usb/devices}
for the example mouse are as
follows:
{\small
\begin{verbatim}
T: Bus=01 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#= 3 Spd=1.5 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=04b3 ProdID=310b Rev= 1.60
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=(none)
E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=10ms
\end{verbatim}
}
3964 Note that the
3965 \textbf{P:}
3966 line correctly identifies the
3967 vendor id and product id
3968 for this mouse as
3969 \textbf{04b3:310b}.
3970 \end{enumerate}
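
As a rough illustration of the second method, the following Python sketch extracts the ids from text in the \path{/proc/bus/usb/devices} format and prints the matching configuration line. The names \textbf{usbdevice\_lines} and \textbf{SAMPLE} are invented for this example, and the sample text is simply the \textbf{P:} line of the example mouse.

{\small
\begin{verbatim}
# Sketch only: turn text in the /proc/bus/usb/devices format into
# usbdevice lines.  usbdevice_lines() and SAMPLE are illustrative names.
import re

# Matches a line such as "P: Vendor=04b3 ProdID=310b Rev= 1.60".
P_LINE = re.compile(
    r"^P:\s+Vendor=([0-9a-fA-F]{4})\s+ProdID=([0-9a-fA-F]{4})")

# The P: line of the example mouse shown above.
SAMPLE = "P: Vendor=04b3 ProdID=310b Rev= 1.60\n"

def usbdevice_lines(text):
    """Return one usbdevice config line per P: line found in text."""
    lines = []
    for line in text.splitlines():
        match = P_LINE.match(line.strip())
        if match:
            # groups() holds (vendor id, product id), in that order.
            lines.append("usbdevice='host:%s:%s'" % match.groups())
    return lines

for entry in usbdevice_lines(SAMPLE):
    print(entry)
\end{verbatim}
}

To use it on a real system, pass in the contents of \path{/proc/bus/usb/devices} as read in Dom0 in place of the sample text. Every device listed there produces a line, so pick the one for the device that the guest should access.
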
3971 There is one other issue to
3972 be aware of when accessing a
3973 physical USB device from the guest.
3974 The Dom0 kernel must not have
3975 a device driver loaded for
3976 the device that the guest wishes
3977 to access.
3978 This means that the Dom0
3979 kernel must not have that
3980 device driver compiled into
3981 the kernel or,
3982 if using modules,
3983 that driver module must
3984 not be loaded.
Note that it is the
device-specific USB driver that must
not be loaded;
either the
\textbf{UHCI}
or
\textbf{OHCI}
USB controller driver must
still be loaded.
3995 Going back to the USB mouse
3996 as an example,
3997 if \textbf{lsmod}
3998 gives the output:
{\small
\begin{verbatim}
Module                  Size  Used by
usbmouse                4128  0
usbhid                 28996  0
uhci_hcd               35409  0
\end{verbatim}
}
4009 then the USB mouse is being
4010 used by the Dom0 kernel and is
4011 not available to the guest.
4012 Executing the command
\textbf{rmmod usbhid}\footnote{
It turns out that the
\textbf{usbhid}
driver is the significant
one for the USB mouse;
the presence or absence of
the module
\textbf{usbmouse}
has no effect on whether or
not the guest can see a USB mouse.}
4023 will remove the USB mouse
4024 driver from the Dom0 kernel
4025 and the mouse will now be
4026 accessible by the VMX guest.
Be aware that the Linux USB
4029 hotplug system will reload
4030 the drivers if a USB device
4031 is removed and plugged back
4032 in.
4033 This means that just unloading
4034 the driver module might not
4035 be sufficient if the USB device
4036 is removed and added back.
4037 A more reliable technique is
4038 to first
4039 \textbf{rmmod}
4040 the driver and then rename the
4041 driver file in the
4042 \path{/lib/modules}
4043 directory,
4044 just to make sure it doesn't get
4045 reloaded.
4046 \end{description}
4048 \subsection{Destroy VMX guests}
4049 VMX guests can be destroyed in the same way as can paravirtualized guests. We recommend that you type the command
4051 {\small {\tt poweroff}}
4053 in the VMX guest's console first to prevent data loss. Then execute the command
4055 {\small {\tt xm destroy \emph{vmx\_guest\_id} }}
4057 at the Domain0 console.
4059 \subsection{VMX window (X or VNC) Hot Key}
4060 If you are running in the X environment after creating a VMX guest, an X window is created. There are several hot keys for control of the VMX guest that can be used in the window.
4062 {\bfseries Ctrl+Alt+2} switches from guest VGA window to the control window. Typing {\small {\tt help }} shows the control commands help. For example, 'q' is the command to destroy the VMX guest.\\
4063 {\bfseries Ctrl+Alt+1} switches back to VMX guest's VGA.\\
4064 {\bfseries Ctrl+Alt+3} switches to serial port output. It captures serial output from the VMX guest. It works only if the VMX guest was configured to use the serial port. \\
4066 \subsection{Save/Restore and Migration}
4067 VMX guests currently cannot be saved and restored, nor migrated. These features are currently under active development.
4069 \chapter{Vnets - Domain Virtual Networking}
4071 Xen optionally supports virtual networking for domains using {\em vnets}.
4072 These emulate private LANs that domains can use. Domains on the same
4073 vnet can be hosted on the same machine or on separate machines, and the
4074 vnets remain connected if domains are migrated. Ethernet traffic
4075 on a vnet is tunneled inside IP packets on the physical network. A vnet is a virtual
4076 network and addressing within it need have no relation to addressing on
4077 the underlying physical network. Separate vnets, or vnets and the physical network,
4078 can be connected using domains with more than one network interface and
4079 enabling IP forwarding or bridging in the usual way.
4081 Vnet support is included in \texttt{xm} and \xend:
4082 \begin{verbatim}
4083 # xm vnet-create <config>
4084 \end{verbatim}
4085 creates a vnet using the configuration in the file \verb|<config>|.
4086 When a vnet is created its configuration is stored by \xend and the vnet persists until it is
4087 deleted using
4088 \begin{verbatim}
4089 # xm vnet-delete <vnetid>
4090 \end{verbatim}
4091 The vnets \xend knows about are listed by
4092 \begin{verbatim}
4093 # xm vnet-list
4094 \end{verbatim}
4095 More vnet management commands are available using the
4096 \texttt{vn} tool included in the vnet distribution.
4098 The format of a vnet configuration file is
4099 \begin{verbatim}
4100 (vnet (id <vnetid>)
4101 (bridge <bridge>)
4102 (vnetif <vnet interface>)
4103 (security <level>))
4104 \end{verbatim}
4105 White space is not significant. The parameters are:
4106 \begin{itemize}
4107 \item \verb|<vnetid>|: vnet id, the 128-bit vnet identifier. This can be given
4108 as 8 4-digit hex numbers separated by colons, or in short form as a single 4-digit hex number.
4109 The short form is the same as the long form with the first 7 fields zero.
4110 Vnet ids must be non-zero and id 1 is reserved.
4112 \item \verb|<bridge>|: the name of a bridge interface to create for the vnet. Domains
4113 are connected to the vnet by connecting their virtual interfaces to the bridge.
4114 Bridge names are limited to 14 characters by the kernel.
4116 \item \verb|<vnetif>|: the name of the virtual interface onto the vnet (optional). The
4117 interface encapsulates and decapsulates vnet traffic for the network and is attached
4118 to the vnet bridge. Interface names are limited to 14 characters by the kernel.
4120 \item \verb|<level>|: security level for the vnet (optional). The level may be one of
4121 \begin{itemize}
4122 \item \verb|none|: no security (default). Vnet traffic is in clear on the network.
4123 \item \verb|auth|: authentication. Vnet traffic is authenticated using IPSEC
4124 ESP with hmac96.
4125 \item \verb|conf|: confidentiality. Vnet traffic is authenticated and encrypted
4126 using IPSEC ESP with hmac96 and AES-128.
4127 \end{itemize}
4128 Authentication and confidentiality are experimental and use hard-wired keys at present.
4129 \end{itemize}
The interfaces and bridges used by vnets
are visible in the output of \texttt{ifconfig} and \texttt{brctl show}.
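
The two forms of \verb|<vnetid>| can be related by a short sketch. The Python function below, whose name \texttt{expand\_vnet\_id} is invented for this illustration and is not part of the vnet tools, expands a short-form id to the long form and applies the two restrictions stated above. It shows one reading of those rules and is not the parser that vnets actually use.
\begin{verbatim}
# Sketch only: expand a vnet id to its long form, following the rules
# listed above.  expand_vnet_id() is an illustrative name.

def expand_vnet_id(vnet_id):
    """Return vnet_id as 8 colon-separated 4-digit hex fields."""
    fields = vnet_id.split(":")
    if len(fields) == 1:
        # Short form: the first 7 fields are zero.
        fields = ["0"] * 7 + fields
    if len(fields) != 8:
        raise ValueError("expected 1 or 8 hex fields")
    values = [int(field, 16) for field in fields]
    if min(values) < 0 or max(values) > 0xffff:
        raise ValueError("each field must fit in 4 hex digits")
    if max(values) == 0:
        raise ValueError("vnet ids must be non-zero")
    if values == [0] * 7 + [1]:
        raise ValueError("vnet id 1 is reserved")
    return ":".join("%04x" % value for value in values)

print(expand_vnet_id("97"))
\end{verbatim}
With this reading, \texttt{(id 97)} in a configuration file names the same vnet as the long form printed by the sketch.
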
4134 \section{Example}
4135 If the file \path{vnet97.sxp} contains
4136 \begin{verbatim}
4137 (vnet (id 97) (bridge vnet97) (vnetif vnif97)
4138 (security none))
4139 \end{verbatim}
then \texttt{xm vnet-create vnet97.sxp} will define a vnet with id 97 and no security.
4141 The bridge for the vnet is called vnet97 and the virtual interface for it is vnif97.
4142 To add an interface on a domain to this vnet set its bridge to vnet97
4143 in its configuration. In Python:
4144 \begin{verbatim}
4145 vif="bridge=vnet97"
4146 \end{verbatim}
4147 In sxp:
4148 \begin{verbatim}
4149 (dev (vif (mac aa:00:00:01:02:03) (bridge vnet97)))
4150 \end{verbatim}
4151 Once the domain is started you should see its interface in the output of \texttt{brctl show}
4152 under the ports for \texttt{vnet97}.
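
A domain that connects a vnet to the physical network needs an interface on each. As a sketch, its Python configuration could list both bridges as shown here. The name \texttt{xenbr0} for the bridge onto the physical network is an assumption and should be replaced by the bridge your system uses; IP forwarding or bridging must still be enabled inside the domain.
\begin{verbatim}
# Sketch of a domain with two interfaces (bridge names partly assumed).
vif = [ 'bridge=xenbr0',    # interface on the physical network
        'bridge=vnet97' ]   # interface on the vnet defined above
\end{verbatim}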
To get the best performance it is a good idea to reduce the MTU of a domain's interface
onto a vnet to 1400, for example by using \texttt{ifconfig eth0 mtu 1400} or by putting
\texttt{MTU=1400} in \texttt{ifcfg-eth0}.
4157 You may also have to change or remove cached config files for eth0 under
4158 \texttt{/etc/sysconfig/networking}. Vnets work anyway, but performance can be reduced
4159 by IP fragmentation caused by the vnet encapsulation exceeding the hardware MTU.
4161 \section{Installing vnet support}
4162 Vnets are implemented using a kernel module, which needs to be loaded before
4163 they can be used. You can either do this manually before starting \xend, using the
4164 command \texttt{vn insmod}, or configure \xend to use the \path{network-vnet}
script in the xend configuration file \texttt{/etc/xen/xend-config.sxp}:
4166 \begin{verbatim}
4167 (network-script network-vnet)
4168 \end{verbatim}
4169 This script insmods the module and calls the \path{network-bridge} script.
4171 The vnet code is not compiled and installed by default.
4172 To compile the code and install on the current system
4173 use \texttt{make install} in the root of the vnet source tree,
4174 \path{tools/vnet}. It is also possible to install to an installation
4175 directory using \texttt{make dist}. See the \path{Makefile} in
4176 the source for details.
4178 The vnet module creates vnet interfaces \texttt{vnif0002},
4179 \texttt{vnif0003} and \texttt{vnif0004} by default. You can test that
4180 vnets are working by configuring IP addresses on these interfaces
4181 and trying to ping them across the network. For example, using machines
4182 hostA and hostB:
4183 \begin{verbatim}
4184 hostA# ifconfig vnif0004 up
4185 hostB# ifconfig vnif0004 up
4186 hostB# ping
4187 \end{verbatim}
4189 The vnet implementation uses IP multicast to discover vnet interfaces, so
4190 all machines hosting vnets must be reachable by multicast. Network switches
4191 are often configured not to forward multicast packets, so this often
4192 means that all machines using a vnet must be on the same LAN segment,
4193 unless you configure vnet forwarding.
4195 You can test multicast coverage by pinging the vnet multicast address:
4196 \begin{verbatim}
4197 # ping -b
4198 \end{verbatim}
4199 You should see replies from all machines with the vnet module running.
4200 You can see if vnet packets are being sent or received by dumping traffic
4201 on the vnet UDP port:
4202 \begin{verbatim}
4203 # tcpdump udp port 1798
4204 \end{verbatim}
If multicast is not being forwarded between machines you can configure
4207 multicast forwarding using vn. Suppose we have machines hostA on
4208 and hostB on and that multicast is not forwarded between them.
4209 We use vn to configure each machine to forward to the other:
4210 \begin{verbatim}
4211 hostA# vn peer-add hostB
4212 hostB# vn peer-add hostA
4213 \end{verbatim}
Multicast forwarding needs to be used carefully: you must avoid creating forwarding
4215 loops. Typically only one machine on a subnet needs to be configured to forward,
4216 as it will forward multicasts received from other machines on the subnet.
4218 %% Chapter Glossary of Terms moved to glossary.tex
4219 \chapter{Glossary of Terms}
4221 \begin{description}
4223 \item[Domain] A domain is the execution context that contains a
4224 running {\bf virtual machine}. The relationship between virtual
4225 machines and domains on Xen is similar to that between programs and
4226 processes in an operating system: a virtual machine is a persistent
4227 entity that resides on disk (somewhat like a program). When it is
4228 loaded for execution, it runs in a domain. Each domain has a {\bf
4229 domain ID}.
4231 \item[Domain 0] The first domain to be started on a Xen machine.
4232 Domain 0 is responsible for managing the system.
4234 \item[Domain ID] A unique identifier for a {\bf domain}, analogous to
4235 a process ID in an operating system.
4237 \item[Full virtualization] An approach to virtualization which
4238 requires no modifications to the hosted operating system, providing
4239 the illusion of a complete system of real hardware devices.
4241 \item[Hypervisor] An alternative term for {\bf VMM}, used because it
4242 means `beyond supervisor', since it is responsible for managing
4243 multiple `supervisor' kernels.
4245 \item[Live migration] A technique for moving a running virtual machine
4246 to another physical host, without stopping it or the services
4247 running on it.
4249 \item[Paravirtualization] An approach to virtualization which requires
4250 modifications to the operating system in order to run in a virtual
4251 machine. Xen uses paravirtualization but preserves binary
4252 compatibility for user space applications.
4254 \item[Shadow pagetables] A technique for hiding the layout of machine
memory from a virtual machine's operating system. Used in some {\bf
VMMs} to provide the illusion of contiguous physical memory. In
Xen this is used during {\bf live migration}.
\item[Virtual Block Device] Persistent storage available to a virtual
4260 machine, providing the abstraction of an actual block storage device.
4261 {\bf VBD}s may be actual block devices, filesystem images, or
4262 remote/network storage.
4264 \item[Virtual Machine] The environment in which a hosted operating
4265 system runs, providing the abstraction of a dedicated machine. A
virtual machine may be identical to the underlying hardware (as in
{\bf full virtualization}), or it may differ (as in {\bf
paravirtualization}).
\item[VMM] Virtual Machine Monitor: the software that allows multiple
4271 virtual machines to be multiplexed on a single physical machine.
4273 \item[Xen] Xen is a paravirtualizing virtual machine monitor,
4274 developed primarily by the Systems Research Group at the University
4275 of Cambridge Computer Laboratory.
4277 \item[XenLinux] A name for the port of the Linux kernel that
4278 runs on Xen.
4280 \end{description}
4283 \end{document}
4286 %% Other stuff without a home
4288 %% Instructions Re Python API
4290 %% Other Control Tasks using Python
4291 %% ================================
4293 %% A Python module 'Xc' is installed as part of the tools-install
4294 %% process. This can be imported, and an 'xc object' instantiated, to
4295 %% provide access to privileged command operations:
4297 %% # import Xc
4298 %% # xc = Xc.new()
4299 %% # dir(xc)
4300 %% # help(xc.domain_create)
4302 %% In this way you can see that the class 'xc' contains useful
4303 %% documentation for you to consult.
4305 %% A further package of useful routines (xenctl) is also installed:
4307 %% # import xenctl.utils
4308 %% # help(xenctl.utils)
4310 %% You can use these modules to write your own custom scripts or you
4311 %% can customise the scripts supplied in the Xen distribution.
4315 % Explain about AGP GART
4318 %% If you're not intending to configure the new domain with an IP
4319 %% address on your LAN, then you'll probably want to use NAT. The
4320 %% 'xen_nat_enable' installs a few useful iptables rules into domain0
4321 %% to enable NAT. [NB: We plan to support RSIP in future]
4325 %% Installing the file systems from the CD
4326 %% =======================================
4328 %% If you haven't got an existing Linux installation onto which you
4329 %% can just drop down the Xen and Xenlinux images, then the file
4330 %% systems on the CD provide a quick way of doing an install. However,
4331 %% you would be better off in the long run doing a proper install of
4332 %% your preferred distro and installing Xen onto that, rather than
4333 %% just doing the hack described below:
4335 %% Choose one or two partitions, depending on whether you want a
4336 %% separate /usr or not. Make file systems on it/them e.g.:
4337 %% mkfs -t ext3 /dev/hda3
4338 %% [or mkfs -t ext2 /dev/hda3 && tune2fs -j /dev/hda3 if using an old
4339 %% version of mkfs]
4341 %% Next, mount the file system(s) e.g.:
4342 %% mkdir /mnt/root && mount /dev/hda3 /mnt/root
4343 %% [mkdir /mnt/usr && mount /dev/hda4 /mnt/usr]
4345 %% To install the root file system, simply untar /usr/XenDemoCD/root.tar.gz:
4346 %% cd /mnt/root && tar -zxpf /usr/XenDemoCD/root.tar.gz
4348 %% You'll need to edit /mnt/root/etc/fstab to reflect your file system
4349 %% configuration. Changing the password file (etc/shadow) is probably a
4350 %% good idea too.
4352 %% To install the usr file system, copy the file system from CD on
4353 %% /usr, though leaving out the "XenDemoCD" and "boot" directories:
4354 %% cd /usr && cp -a X11R6 etc java libexec root src bin dict kerberos
4355 %% local sbin tmp doc include lib man share /mnt/usr
4357 %% If you intend to boot off these file systems (i.e. use them for
4358 %% domain 0), then you probably want to copy the /usr/boot
4359 %% directory on the cd over the top of the current symlink to /boot
4360 %% on your root filesystem (after deleting the current symlink)
4361 %% i.e.:
4362 %% cd /mnt/root ; rm boot ; cp -a /usr/boot .