ia64/xen-unstable

changeset 6977:750ad97f37b0

Split up docs. Signed-off-by: Robb Romans <3r@us.ibm.com>
author kaf24@firebug.cl.cam.ac.uk
date Tue Sep 20 09:17:33 2005 +0000 (2005-09-20)
parents c0796e18b6a4
children f8e7af29daa1
files docs/Makefile docs/src/interface.tex docs/src/interface/architecture.tex docs/src/interface/debugging.tex docs/src/interface/devices.tex docs/src/interface/further_info.tex docs/src/interface/hypercalls.tex docs/src/interface/memory.tex docs/src/interface/scheduling.tex docs/src/user.tex docs/src/user/build.tex docs/src/user/control_software.tex docs/src/user/debian.tex docs/src/user/domain_configuration.tex docs/src/user/domain_filesystem.tex docs/src/user/domain_mgmt.tex docs/src/user/glossary.tex docs/src/user/installation.tex docs/src/user/introduction.tex docs/src/user/redhat.tex docs/src/user/start_addl_dom.tex
line diff
     1.1 --- a/docs/Makefile	Tue Sep 20 09:08:26 2005 +0000
     1.2 +++ b/docs/Makefile	Tue Sep 20 09:17:33 2005 +0000
     1.3 @@ -12,7 +12,7 @@ DOXYGEN		:= doxygen
     1.4  
     1.5  pkgdocdir	:= /usr/share/doc/xen
     1.6  
     1.7 -DOC_TEX		:= $(wildcard src/*.tex)
     1.8 +DOC_TEX		:= src/user.tex src/interface.tex
     1.9  DOC_PS		:= $(patsubst src/%.tex,ps/%.ps,$(DOC_TEX))
    1.10  DOC_PDF		:= $(patsubst src/%.tex,pdf/%.pdf,$(DOC_TEX))
    1.11  DOC_HTML	:= $(patsubst src/%.tex,html/%/index.html,$(DOC_TEX))
     2.1 --- a/docs/src/interface.tex	Tue Sep 20 09:08:26 2005 +0000
     2.2 +++ b/docs/src/interface.tex	Tue Sep 20 09:17:33 2005 +0000
     2.3 @@ -87,1084 +87,23 @@ itself, allows the Xen framework to sepa
     2.4  mechanism and policy within the system.
     2.5  
     2.6  
     2.7 -
     2.8 -\chapter{Virtual Architecture}
     2.9 -
    2.10 -On a Xen-based system, the hypervisor itself runs in {\it ring 0}.  It
    2.11 -has full access to the physical memory available in the system and is
    2.12 -responsible for allocating portions of it to the domains.  Guest
    2.13 -operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as
    2.14 -they see fit. Segmentation is used to prevent the guest OS from
    2.15 -accessing the portion of the address space that is reserved for
    2.16 -Xen. We expect most guest operating systems will use ring 1 for their
    2.17 -own operation and place applications in ring 3.
    2.18 -
    2.19 -In this chapter we consider the basic virtual architecture provided 
     2.20 -by Xen: the CPU state, exception and interrupt handling, and
    2.21 -time. Other aspects such as memory and device access are discussed 
    2.22 -in later chapters. 
    2.23 -
    2.24 -\section{CPU state}
    2.25 -
    2.26 -All privileged state must be handled by Xen.  The guest OS has no
    2.27 -direct access to CR3 and is not permitted to update privileged bits in
    2.28 -EFLAGS. Guest OSes use \emph{hypercalls} to invoke operations in Xen; 
    2.29 -these are analogous to system calls but occur from ring 1 to ring 0. 
    2.30 -
    2.31 -A list of all hypercalls is given in Appendix~\ref{a:hypercalls}. 
    2.32 -
    2.33 -
    2.34 -
    2.35 -\section{Exceptions}
    2.36 -
    2.37 -A virtual IDT is provided --- a domain can submit a table of trap
    2.38 -handlers to Xen via the {\tt set\_trap\_table()} hypercall.  Most trap
    2.39 -handlers are identical to native x86 handlers, although the page-fault
    2.40 -handler is somewhat different.
    2.41 -
    2.42 -
    2.43 -\section{Interrupts and events}
    2.44 -
    2.45 -Interrupts are virtualized by mapping them to \emph{events}, which are
    2.46 -delivered asynchronously to the target domain using a callback
    2.47 -supplied via the {\tt set\_callbacks()} hypercall.  A guest OS can map
    2.48 -these events onto its standard interrupt dispatch mechanisms.  Xen is
    2.49 -responsible for determining the target domain that will handle each
    2.50 -physical interrupt source. For more details on the binding of event
    2.51 -sources to events, see Chapter~\ref{c:devices}. 
    2.52 -
    2.53 -
    2.54 -
    2.55 -\section{Time}
    2.56 -
    2.57 -Guest operating systems need to be aware of the passage of both real
    2.58 -(or wallclock) time and their own `virtual time' (the time for
    2.59 -which they have been executing). Furthermore, Xen has a notion of 
    2.60 -time which is used for scheduling. The following notions of 
    2.61 -time are provided: 
    2.62 -
    2.63 -\begin{description}
    2.64 -\item[Cycle counter time.]
    2.65 -
    2.66 -This provides a fine-grained time reference.  The cycle counter time is
    2.67 -used to accurately extrapolate the other time references.  On SMP machines
    2.68 -it is currently assumed that the cycle counter time is synchronized between
    2.69 -CPUs.  The current x86-based implementation achieves this within inter-CPU
    2.70 -communication latencies.
    2.71 -
    2.72 -\item[System time.]
    2.73 -
    2.74 -This is a 64-bit counter which holds the number of nanoseconds that
    2.75 -have elapsed since system boot.
    2.76 -
    2.77 -
    2.78 -\item[Wall clock time.]
    2.79 -
    2.80 -This is the time of day in a Unix-style {\tt struct timeval} (seconds
    2.81 -and microseconds since 1 January 1970, adjusted by leap seconds).  An
    2.82 -NTP client hosted by {\it domain 0} can keep this value accurate.  
    2.83 -
    2.84 -
    2.85 -\item[Domain virtual time.]
    2.86 -
    2.87 -This progresses at the same pace as system time, but only while a
    2.88 -domain is executing --- it stops while a domain is de-scheduled.
    2.89 -Therefore the share of the CPU that a domain receives is indicated by
    2.90 -the rate at which its virtual time increases.
    2.91 -
    2.92 -\end{description}
    2.93 -
    2.94 -
    2.95 -Xen exports timestamps for system time and wall-clock time to guest
    2.96 -operating systems through a shared page of memory.  Xen also provides
    2.97 -the cycle counter time at the instant the timestamps were calculated,
    2.98 -and the CPU frequency in Hertz.  This allows the guest to extrapolate
    2.99 -system and wall-clock times accurately based on the current cycle
   2.100 -counter time.
   2.101 -
    2.102 -Since all time stamps need to be updated and read \emph{atomically},
    2.103 -two version numbers are also stored in the shared info page. The 
    2.104 -first is incremented prior to an update, while the second is only
    2.105 -incremented afterwards. Thus a guest can be sure that it read a consistent 
    2.106 -state by checking that the two version numbers are equal. 
   2.107 -
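As an illustrative sketch only (the structure and field names below are invented for this example rather than taken from the Xen public headers, and memory barriers are omitted for brevity), a guest can retry its reads until the two version numbers match:

\begin{verbatim}
/* Hypothetical layout of the time fields in the shared info page.     */
struct shared_time {
    unsigned long      version_pre;   /* bumped by Xen before an update */
    unsigned long long tsc_stamp;     /* cycle counter at last update   */
    unsigned long long system_ns;     /* system time (ns since boot)    */
    unsigned long      version_post;  /* bumped by Xen after an update  */
};

/* Read a consistent snapshot of the two timestamps.                    */
static void read_time(volatile struct shared_time *t,
                      unsigned long long *tsc, unsigned long long *sys)
{
    unsigned long v;
    do {
        v    = t->version_post;       /* completed-update counter       */
        *tsc = t->tsc_stamp;
        *sys = t->system_ns;
    } while (v != t->version_pre);    /* started-update counter         */
}
\end{verbatim}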
   2.108 -Xen includes a periodic ticker which sends a timer event to the
   2.109 -currently executing domain every 10ms.  The Xen scheduler also sends a
   2.110 -timer event whenever a domain is scheduled; this allows the guest OS
   2.111 -to adjust for the time that has passed while it has been inactive.  In
    2.112 -addition, Xen allows each domain to request that it receive a timer
   2.113 -event sent at a specified system time by using the {\tt
   2.114 -set\_timer\_op()} hypercall.  Guest OSes may use this timer to
   2.115 -implement timeout values when they block.
   2.116 -
   2.117 -
   2.118 -
   2.119 -%% % akw: demoting this to a section -- not sure if there is any point
   2.120 -%% % though, maybe just remove it.
   2.121 -
   2.122 -\section{Xen CPU Scheduling}
   2.123 -
   2.124 -Xen offers a uniform API for CPU schedulers.  It is possible to choose
   2.125 -from a number of schedulers at boot and it should be easy to add more.
   2.126 -The BVT, Atropos and Round Robin schedulers are part of the normal
   2.127 -Xen distribution.  BVT provides proportional fair shares of the CPU to
   2.128 -the running domains.  Atropos can be used to reserve absolute shares
   2.129 -of the CPU for each domain.  Round-robin is provided as an example of
   2.130 -Xen's internal scheduler API.
   2.131 -
   2.132 -\paragraph*{Note: SMP host support}
   2.133 -Xen has always supported SMP host systems.  Domains are statically assigned to
    2.134 -CPUs, either at creation time or when manually pinned to a particular CPU.
   2.135 -The current schedulers then run locally on each CPU to decide which of the
   2.136 -assigned domains should be run there. The user-level control software 
   2.137 -can be used to perform coarse-grain load-balancing between CPUs. 
   2.138 -
   2.139 -
   2.140 -%% More information on the characteristics and use of these schedulers is
   2.141 -%% available in {\tt Sched-HOWTO.txt}.
   2.142 -
   2.143 -
   2.144 -\section{Privileged operations}
   2.145 -
   2.146 -Xen exports an extended interface to privileged domains (viz.\ {\it
   2.147 -  Domain 0}). This allows such domains to build and boot other domains 
   2.148 -on the server, and provides control interfaces for managing 
   2.149 -scheduling, memory, networking, and block devices. 
   2.150 -
   2.151 -
   2.152 -\chapter{Memory}
   2.153 -\label{c:memory} 
   2.154 -
   2.155 -Xen is responsible for managing the allocation of physical memory to
   2.156 -domains, and for ensuring safe use of the paging and segmentation
   2.157 -hardware.
   2.158 -
   2.159 -
   2.160 -\section{Memory Allocation}
   2.161 -
   2.162 -
   2.163 -Xen resides within a small fixed portion of physical memory; it also
   2.164 -reserves the top 64MB of every virtual address space. The remaining
   2.165 -physical memory is available for allocation to domains at a page
   2.166 -granularity.  Xen tracks the ownership and use of each page, which
   2.167 -allows it to enforce secure partitioning between domains.
   2.168 -
   2.169 -Each domain has a maximum and current physical memory allocation. 
   2.170 -A guest OS may run a `balloon driver' to dynamically adjust its 
   2.171 -current memory allocation up to its limit. 
   2.172 -
   2.173 -
   2.174 -%% XXX SMH: I use machine and physical in the next section (which 
   2.175 -%% is kinda required for consistency with code); wonder if this 
   2.176 -%% section should use same terms? 
   2.177 -%%
   2.178 -%% Probably. 
   2.179 -%%
   2.180 -%% Merging this and below section at some point prob makes sense. 
   2.181 -
   2.182 -\section{Pseudo-Physical Memory}
   2.183 -
   2.184 -Since physical memory is allocated and freed on a page granularity,
   2.185 -there is no guarantee that a domain will receive a contiguous stretch
    2.186 -of physical memory. However, most operating systems do not have good
   2.187 -support for operating in a fragmented physical address space. To aid
   2.188 -porting such operating systems to run on top of Xen, we make a
   2.189 -distinction between \emph{machine memory} and \emph{pseudo-physical
   2.190 -memory}.
   2.191 -
   2.192 -Put simply, machine memory refers to the entire amount of memory
   2.193 -installed in the machine, including that reserved by Xen, in use by
   2.194 -various domains, or currently unallocated. We consider machine memory
   2.195 -to comprise a set of 4K \emph{machine page frames} numbered
   2.196 -consecutively starting from 0. Machine frame numbers mean the same
   2.197 -within Xen or any domain.
   2.198 -
   2.199 -Pseudo-physical memory, on the other hand, is a per-domain
   2.200 -abstraction. It allows a guest operating system to consider its memory
   2.201 -allocation to consist of a contiguous range of physical page frames
   2.202 -starting at physical frame 0, despite the fact that the underlying
   2.203 -machine page frames may be sparsely allocated and in any order.
   2.204 -
   2.205 -To achieve this, Xen maintains a globally readable {\it
   2.206 -machine-to-physical} table which records the mapping from machine page
   2.207 -frames to pseudo-physical ones. In addition, each domain is supplied
   2.208 -with a {\it physical-to-machine} table which performs the inverse
   2.209 -mapping. Clearly the machine-to-physical table has size proportional
   2.210 -to the amount of RAM installed in the machine, while each
   2.211 -physical-to-machine table has size proportional to the memory
   2.212 -allocation of the given domain.
   2.213 -
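As a minimal illustration (the table and function names here are placeholders rather than the symbols used by Xen or any guest), the two tables give a guest a trivial pair of translation helpers:

\begin{verbatim}
/* Placeholder declarations: one global table, one per-domain table.    */
extern unsigned long *machine_to_phys;  /* indexed by machine frame no.  */
extern unsigned long *phys_to_machine;  /* indexed by pseudo-physical no. */

static unsigned long pfn_to_mfn(unsigned long pfn)
{
    return phys_to_machine[pfn];        /* pseudo-physical -> machine    */
}

static unsigned long mfn_to_pfn(unsigned long mfn)
{
    return machine_to_phys[mfn];        /* machine -> pseudo-physical    */
}
\end{verbatim}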
    2.214 -Architecture-dependent code in guest operating systems can then use
   2.215 -the two tables to provide the abstraction of pseudo-physical
   2.216 -memory. In general, only certain specialized parts of the operating
    2.217 -system (such as page table management) need to understand the
   2.218 -difference between machine and pseudo-physical addresses.
   2.219 -
   2.220 -\section{Page Table Updates}
   2.221 -
   2.222 -In the default mode of operation, Xen enforces read-only access to
   2.223 -page tables and requires guest operating systems to explicitly request
   2.224 -any modifications.  Xen validates all such requests and only applies
   2.225 -updates that it deems safe.  This is necessary to prevent domains from
   2.226 -adding arbitrary mappings to their page tables.
   2.227 -
   2.228 -To aid validation, Xen associates a type and reference count with each
   2.229 -memory page. A page has one of the following
   2.230 -mutually-exclusive types at any point in time: page directory ({\sf
   2.231 -PD}), page table ({\sf PT}), local descriptor table ({\sf LDT}),
   2.232 -global descriptor table ({\sf GDT}), or writable ({\sf RW}). Note that
   2.233 -a guest OS may always create readable mappings of its own memory 
   2.234 -regardless of its current type. 
   2.235 -%%% XXX: possibly explain more about ref count 'lifecyle' here?
   2.236 -This mechanism is used to
   2.237 -maintain the invariants required for safety; for example, a domain
   2.238 -cannot have a writable mapping to any part of a page table as this
   2.239 -would require the page concerned to simultaneously be of types {\sf
   2.240 -  PT} and {\sf RW}.
   2.241 -
   2.242 -
   2.243 -%\section{Writable Page Tables}
   2.244 -
    2.245 -Xen also provides an alternative mode of operation in which guests
   2.246 -have the illusion that their page tables are directly writable.  Of
   2.247 -course this is not really the case, since Xen must still validate
   2.248 -modifications to ensure secure partitioning. To this end, Xen traps
   2.249 -any write attempt to a memory page of type {\sf PT} (i.e., that is
   2.250 -currently part of a page table).  If such an access occurs, Xen
   2.251 -temporarily allows write access to that page while at the same time
   2.252 -{\em disconnecting} it from the page table that is currently in
   2.253 -use. This allows the guest to safely make updates to the page because
   2.254 -the newly-updated entries cannot be used by the MMU until Xen
   2.255 -revalidates and reconnects the page.
   2.256 -Reconnection occurs automatically in a number of situations: for
   2.257 -example, when the guest modifies a different page-table page, when the
   2.258 -domain is preempted, or whenever the guest uses Xen's explicit
   2.259 -page-table update interfaces.
   2.260 -
   2.261 -Finally, Xen also supports a form of \emph{shadow page tables} in
    2.262 -which the guest OS uses an independent copy of page tables which are
   2.263 -unknown to the hardware (i.e.\ which are never pointed to by {\tt
   2.264 -cr3}). Instead Xen propagates changes made to the guest's tables to the
   2.265 -real ones, and vice versa. This is useful for logging page writes
   2.266 -(e.g.\ for live migration or checkpoint). A full version of the shadow
   2.267 -page tables also allows guest OS porting with less effort.
   2.268 -
   2.269 -\section{Segment Descriptor Tables}
   2.270 +%% chapter Virtual Architecture moved to architecture.tex
   2.271 +\include{src/interface/architecture}
   2.272  
   2.273 -On boot a guest is supplied with a default GDT, which does not reside
    2.274 -within its own memory allocation.  If the guest wishes to use segments
    2.275 -other than the default `flat' ring-1 and ring-3 segments that this GDT
   2.276 -provides, it must register a custom GDT and/or LDT with Xen,
   2.277 -allocated from its own memory. Note that a number of GDT 
   2.278 -entries are reserved by Xen -- any custom GDT must also include
   2.279 -sufficient space for these entries. 
   2.280 -
   2.281 -For example, the following hypercall is used to specify a new GDT: 
   2.282 -
   2.283 -\begin{quote}
   2.284 -int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em entries})
   2.285 -
   2.286 -{\em frame\_list}: An array of up to 16 machine page frames within
   2.287 -which the GDT resides.  Any frame registered as a GDT frame may only
   2.288 -be mapped read-only within the guest's address space (e.g., no
   2.289 -writable mappings, no use as a page-table page, and so on).
   2.290 -
   2.291 -{\em entries}: The number of descriptor-entry slots in the GDT.  Note
   2.292 -that the table must be large enough to contain Xen's reserved entries;
   2.293 -thus we must have `{\em entries $>$ LAST\_RESERVED\_GDT\_ENTRY}\ '.
   2.294 -Note also that, after registering the GDT, slots {\em FIRST\_} through
   2.295 -{\em LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest and
   2.296 -may be overwritten by Xen.
   2.297 -\end{quote}
   2.298 -
   2.299 -The LDT is updated via the generic MMU update mechanism (i.e., via 
    2.300 -the {\tt mmu\_update()} hypercall). 
   2.301 -
   2.302 -\section{Start of Day} 
   2.303 -
   2.304 -The start-of-day environment for guest operating systems is rather
   2.305 -different to that provided by the underlying hardware. In particular,
   2.306 -the processor is already executing in protected mode with paging
   2.307 -enabled.
   2.308 -
   2.309 -{\it Domain 0} is created and booted by Xen itself. For all subsequent
   2.310 -domains, the analogue of the boot-loader is the {\it domain builder},
   2.311 -user-space software running in {\it domain 0}. The domain builder 
   2.312 -is responsible for building the initial page tables for a domain  
   2.313 -and loading its kernel image at the appropriate virtual address. 
   2.314 -
   2.315 -
   2.316 -
   2.317 -\chapter{Devices}
   2.318 -\label{c:devices}
   2.319 -
   2.320 -Devices such as network and disk are exported to guests using a
   2.321 -split device driver.  The device driver domain, which accesses the
    2.322 -physical device directly, also runs a {\em backend} driver, serving
    2.323 -requests to that device from guests.  Each guest will use a simple
    2.324 -{\em frontend} driver to access the backend.  Communication between these
   2.325 -domains is composed of two parts:  First, data is placed onto a shared
   2.326 -memory page between the domains.  Second, an event channel between the
   2.327 -two domains is used to pass notification that data is outstanding.
   2.328 -This separation of notification from data transfer allows message
   2.329 -batching, and results in very efficient device access.  
   2.330 -
   2.331 -Event channels are used extensively in device virtualization; each
   2.332 -domain has a number of end-points or \emph{ports} each of which
   2.333 -may be bound to one of the following \emph{event sources}:
   2.334 -\begin{itemize} 
   2.335 -  \item a physical interrupt from a real device, 
   2.336 -  \item a virtual interrupt (callback) from Xen, or 
   2.337 -  \item a signal from another domain 
   2.338 -\end{itemize}
   2.339 -
   2.340 -Events are lightweight and do not carry much information beyond 
   2.341 -the source of the notification. Hence when performing bulk data
   2.342 -transfer, events are typically used as synchronization primitives
   2.343 -over a shared memory transport. Event channels are managed via 
   2.344 -the {\tt event\_channel\_op()} hypercall; for more details see
   2.345 -Section~\ref{s:idc}. 
   2.346 -
   2.347 -This chapter focuses on some individual device interfaces
   2.348 -available to Xen guests. 
   2.349 -
   2.350 -\section{Network I/O}
   2.351 -
   2.352 -Virtual network device services are provided by shared memory
   2.353 -communication with a backend domain.  From the point of view of
   2.354 -other domains, the backend may be viewed as a virtual ethernet switch
   2.355 -element with each domain having one or more virtual network interfaces
   2.356 -connected to it.
   2.357 -
   2.358 -\subsection{Backend Packet Handling}
   2.359 -
   2.360 -The backend driver is responsible for a variety of actions relating to
   2.361 -the transmission and reception of packets from the physical device.
   2.362 -With regard to transmission, the backend performs these key actions:
   2.363 -
   2.364 -\begin{itemize}
   2.365 -\item {\bf Validation:} To ensure that domains do not attempt to
   2.366 -  generate invalid (e.g. spoofed) traffic, the backend driver may
   2.367 -  validate headers ensuring that source MAC and IP addresses match the
   2.368 -  interface that they have been sent from.
   2.369 -
   2.370 -  Validation functions can be configured using standard firewall rules
   2.371 -  ({\small{\tt iptables}} in the case of Linux).
   2.372 -  
   2.373 -\item {\bf Scheduling:} Since a number of domains can share a single
   2.374 -  physical network interface, the backend must mediate access when
   2.375 -  several domains each have packets queued for transmission.  This
   2.376 -  general scheduling function subsumes basic shaping or rate-limiting
   2.377 -  schemes.
   2.378 -  
   2.379 -\item {\bf Logging and Accounting:} The backend domain can be
   2.380 -  configured with classifier rules that control how packets are
   2.381 -  accounted or logged.  For example, log messages might be generated
   2.382 -  whenever a domain attempts to send a TCP packet containing a SYN.
   2.383 -\end{itemize}
   2.384 -
   2.385 -On receipt of incoming packets, the backend acts as a simple
   2.386 -demultiplexer:  Packets are passed to the appropriate virtual
   2.387 -interface after any necessary logging and accounting have been carried
   2.388 -out.
   2.389 -
   2.390 -\subsection{Data Transfer}
   2.391 -
   2.392 -Each virtual interface uses two ``descriptor rings'', one for transmit,
   2.393 -the other for receive.  Each descriptor identifies a block of contiguous
   2.394 -physical memory allocated to the domain.  
   2.395 -
   2.396 -The transmit ring carries packets to transmit from the guest to the
   2.397 -backend domain.  The return path of the transmit ring carries messages
   2.398 -indicating that the contents have been physically transmitted and the
   2.399 -backend no longer requires the associated pages of memory.
   2.400 +%% chapter Memory moved to memory.tex
   2.401 +\include{src/interface/memory}
   2.402  
   2.403 -To receive packets, the guest places descriptors of unused pages on
   2.404 -the receive ring.  The backend will return received packets by
   2.405 -exchanging these pages in the domain's memory with new pages
   2.406 -containing the received data, and passing back descriptors regarding
   2.407 -the new packets on the ring.  This zero-copy approach allows the
   2.408 -backend to maintain a pool of free pages to receive packets into, and
   2.409 -then deliver them to appropriate domains after examining their
   2.410 -headers.
   2.411 -
   2.412 -%
   2.413 -%Real physical addresses are used throughout, with the domain performing 
   2.414 -%translation from pseudo-physical addresses if that is necessary.
   2.415 -
   2.416 -If a domain does not keep its receive ring stocked with empty buffers then 
   2.417 -packets destined to it may be dropped.  This provides some defence against 
    2.418 -receive livelock problems because an overloaded domain will cease to receive
   2.419 -further data.  Similarly, on the transmit path, it provides the application
   2.420 -with feedback on the rate at which packets are able to leave the system.
   2.421 -
   2.422 -
   2.423 -Flow control on rings is achieved by including a pair of producer
   2.424 -indexes on the shared ring page.  Each side will maintain a private
   2.425 -consumer index indicating the next outstanding message.  In this
   2.426 -manner, the domains cooperate to divide the ring into two message
   2.427 -lists, one in each direction.  Notification is decoupled from the
   2.428 -immediate placement of new messages on the ring; the event channel
   2.429 -will be used to generate notification when {\em either} a certain
   2.430 -number of outstanding messages are queued, {\em or} a specified number
   2.431 -of nanoseconds have elapsed since the oldest message was placed on the
   2.432 -ring.
   2.433 -
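The following sketch (with made-up structure and helper names, not the actual ring definitions from the Xen headers) illustrates how a frontend might consume responses using its private consumer index against the shared producer index:

\begin{verbatim}
#define RING_SIZE 256                        /* assumed power-of-two size  */

struct desc { unsigned long addr, len; };    /* made-up descriptor layout  */

/* Hypothetical shared page: indices only ever increase and are reduced
 * modulo RING_SIZE when used to index the descriptor array.               */
struct shared_ring {
    unsigned int req_prod;                   /* written by the frontend    */
    unsigned int rsp_prod;                   /* written by the backend     */
    struct desc  ring[RING_SIZE];
};

extern void handle_response(struct desc *d); /* guest-specific processing  */

/* Consume every response the backend has produced so far; rsp_cons is
 * the frontend's private consumer index.                                  */
static void drain_responses(struct shared_ring *sr, unsigned int *rsp_cons)
{
    while (*rsp_cons != sr->rsp_prod) {
        handle_response(&sr->ring[*rsp_cons % RING_SIZE]);
        (*rsp_cons)++;
    }
}
\end{verbatim}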
   2.434 -% Not sure if my version is any better -- here is what was here before:
   2.435 -%% Synchronization between the backend domain and the guest is achieved using 
   2.436 -%% counters held in shared memory that is accessible to both.  Each ring has
   2.437 -%% associated producer and consumer indices indicating the area in the ring
   2.438 -%% that holds descriptors that contain data.  After receiving {\it n} packets
   2.439 -%% or {\t nanoseconds} after receiving the first packet, the hypervisor sends
   2.440 -%% an event to the domain. 
   2.441 -
   2.442 -\section{Block I/O}
   2.443 -
   2.444 -All guest OS disk access goes through the virtual block device VBD
   2.445 -interface.  This interface allows domains access to portions of block
    2.446 -storage devices visible to the block backend device.  The VBD
   2.447 -interface is a split driver, similar to the network interface
   2.448 -described above.  A single shared memory ring is used between the
   2.449 -frontend and backend drivers, across which read and write messages are
   2.450 -sent.
   2.451 -
   2.452 -Any block device accessible to the backend domain, including
   2.453 -network-based block (iSCSI, *NBD, etc), loopback and LVM/MD devices,
   2.454 -can be exported as a VBD.  Each VBD is mapped to a device node in the
   2.455 -guest, specified in the guest's startup configuration.
   2.456 -
   2.457 -Old (Xen 1.2) virtual disks are not supported under Xen 2.0, since
   2.458 -similar functionality can be achieved using the more complete LVM
   2.459 -system, which is already in widespread use.
   2.460 -
   2.461 -\subsection{Data Transfer}
   2.462 -
   2.463 -The single ring between the guest and the block backend supports three
   2.464 -messages:
   2.465 -
   2.466 -\begin{description}
   2.467 -\item [{\small {\tt PROBE}}:] Return a list of the VBDs available to this guest
   2.468 -  from the backend.  The request includes a descriptor of a free page
   2.469 -  into which the reply will be written by the backend.
   2.470 +%% chapter Devices moved to devices.tex
   2.471 +\include{src/interface/devices}
   2.472  
   2.473 -\item [{\small {\tt READ}}:] Read data from the specified block device.  The
   2.474 -  front end identifies the device and location to read from and
   2.475 -  attaches pages for the data to be copied to (typically via DMA from
   2.476 -  the device).  The backend acknowledges completed read requests as
   2.477 -  they finish.
   2.478 -
   2.479 -\item [{\small {\tt WRITE}}:] Write data to the specified block device.  This
   2.480 -  functions essentially as {\small {\tt READ}}, except that the data moves to
   2.481 -  the device instead of from it.
   2.482 -\end{description}
   2.483 -
   2.484 -% um... some old text
   2.485 -%% In overview, the same style of descriptor-ring that is used for
   2.486 -%% network packets is used here.  Each domain has one ring that carries
   2.487 -%% operation requests to the hypervisor and carries the results back
   2.488 -%% again.
   2.489 -
   2.490 -%% Rather than copying data, the backend simply maps the domain's buffers
   2.491 -%% in order to enable direct DMA to them.  The act of mapping the buffers
   2.492 -%% also increases the reference counts of the underlying pages, so that
   2.493 -%% the unprivileged domain cannot try to return them to the hypervisor,
   2.494 -%% install them as page tables, or any other unsafe behaviour.
   2.495 -%% %block API here 
   2.496 -
   2.497 -
   2.498 -\chapter{Further Information} 
   2.499 -
   2.500 -
   2.501 -If you have questions that are not answered by this manual, the
   2.502 -sources of information listed below may be of interest to you.  Note
   2.503 -that bug reports, suggestions and contributions related to the
   2.504 -software (or the documentation) should be sent to the Xen developers'
   2.505 -mailing list (address below).
   2.506 -
   2.507 -\section{Other documentation}
   2.508 -
   2.509 -If you are mainly interested in using (rather than developing for)
   2.510 -Xen, the {\em Xen Users' Manual} is distributed in the {\tt docs/}
   2.511 -directory of the Xen source distribution.  
   2.512 -
   2.513 -% Various HOWTOs are also available in {\tt docs/HOWTOS}.
   2.514 -
   2.515 -\section{Online references}
   2.516 -
   2.517 -The official Xen web site is found at:
   2.518 -\begin{quote}
   2.519 -{\tt http://www.cl.cam.ac.uk/Research/SRG/netos/xen/}
   2.520 -\end{quote}
   2.521 -
   2.522 -This contains links to the latest versions of all on-line 
   2.523 -documentation. 
   2.524 -
   2.525 -\section{Mailing lists}
   2.526 -
   2.527 -There are currently four official Xen mailing lists:
   2.528 -
   2.529 -\begin{description}
   2.530 -\item[xen-devel@lists.xensource.com] Used for development
   2.531 -discussions and bug reports.  Subscribe at: \\
   2.532 -{\small {\tt http://lists.xensource.com/xen-devel}}
   2.533 -\item[xen-users@lists.xensource.com] Used for installation and usage
   2.534 -discussions and requests for help.  Subscribe at: \\
   2.535 -{\small {\tt http://lists.xensource.com/xen-users}}
   2.536 -\item[xen-announce@lists.xensource.com] Used for announcements only.
   2.537 -Subscribe at: \\
   2.538 -{\small {\tt http://lists.xensource.com/xen-announce}}
   2.539 -\item[xen-changelog@lists.xensource.com]  Changelog feed
   2.540 -from the unstable and 2.0 trees - developer oriented.  Subscribe at: \\
   2.541 -{\small {\tt http://lists.xensource.com/xen-changelog}}
   2.542 -\end{description}
   2.543 -
   2.544 -Of these, xen-devel is the most active.
   2.545 -
   2.546 -
   2.547 +%% chapter Further Information moved to further_info.tex
   2.548 +\include{src/interface/further_info}
   2.549  
   2.550  
   2.551  \appendix
   2.552  
   2.553 -%\newcommand{\hypercall}[1]{\vspace{5mm}{\large\sf #1}}
   2.554 -
   2.555 -
   2.556 -
   2.557 -
   2.558 -
   2.559 -\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}}
   2.560 -
   2.561 -
   2.562 -
   2.563 -
   2.564 -
   2.565 -
   2.566 -\chapter{Xen Hypercalls}
   2.567 -\label{a:hypercalls}
   2.568 -
   2.569 -Hypercalls represent the procedural interface to Xen; this appendix 
   2.570 -categorizes and describes the current set of hypercalls. 
   2.571 -
   2.572 -\section{Invoking Hypercalls} 
   2.573 -
   2.574 -Hypercalls are invoked in a manner analogous to system calls in a
   2.575 -conventional operating system; a software interrupt is issued which
   2.576 -vectors to an entry point within Xen. On x86\_32 machines the
    2.577 -instruction required is {\tt int \$0x82}; the (real) IDT is set up so
   2.578 -that this may only be issued from within ring 1. The particular 
   2.579 -hypercall to be invoked is contained in {\tt EAX} --- a list 
   2.580 -mapping these values to symbolic hypercall names can be found 
   2.581 -in {\tt xen/include/public/xen.h}. 
   2.582 -
   2.583 -On some occasions a set of hypercalls will be required to carry
   2.584 -out a higher-level function; a good example is when a guest 
    2.585 -operating system wishes to context switch to a new process which 
   2.586 -requires updating various privileged CPU state. As an optimization
   2.587 -for these cases, there is a generic mechanism to issue a set of 
   2.588 -hypercalls as a batch: 
   2.589 -
   2.590 -\begin{quote}
   2.591 -\hypercall{multicall(void *call\_list, int nr\_calls)}
   2.592 -
   2.593 -Execute a series of hypervisor calls; {\tt nr\_calls} is the length of
    2.594 -the array of {\tt multicall\_entry\_t} structures pointed to by {\tt
   2.595 -call\_list}. Each entry contains the hypercall operation code followed
   2.596 -by up to 7 word-sized arguments.
   2.597 -\end{quote}
   2.598 -
   2.599 -Note that multicalls are provided purely as an optimization; there is
   2.600 -no requirement to use them when first porting a guest operating
   2.601 -system.
   2.602 -
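As a sketch only (the entry layout merely follows the description above, and the wrapper and operation numbers are placeholders, not definitions from {\tt xen/include/public/xen.h}), a guest might batch two hypercalls as follows:

\begin{verbatim}
/* Assumed entry layout: an operation code followed by up to 7 words.   */
typedef struct {
    unsigned long op;
    unsigned long args[7];
} multicall_entry_t;

extern long HYPERVISOR_multicall(void *call_list, int nr_calls);

#define HYPOTHETICAL_OP_A 1       /* placeholder hypercall numbers      */
#define HYPOTHETICAL_OP_B 2

/* Issue two operations with a single trap into Xen.                    */
static void example_batch(void)
{
    multicall_entry_t calls[2];

    calls[0].op      = HYPOTHETICAL_OP_A;
    calls[0].args[0] = 0x1000;    /* illustrative argument              */

    calls[1].op      = HYPOTHETICAL_OP_B;
    calls[1].args[0] = 0x2000;

    HYPERVISOR_multicall(calls, 2);
}
\end{verbatim}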
   2.603 -
   2.604 -\section{Virtual CPU Setup} 
   2.605 -
    2.606 -At start of day, a guest operating system needs to set up the virtual
   2.607 -CPU it is executing on. This includes installing vectors for the
   2.608 -virtual IDT so that the guest OS can handle interrupts, page faults,
    2.609 -etc. However, the very first thing a guest OS must set up is a pair 
   2.610 -of hypervisor callbacks: these are the entry points which Xen will
   2.611 -use when it wishes to notify the guest OS of an occurrence. 
   2.612 -
   2.613 -\begin{quote}
   2.614 -\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long
   2.615 -  event\_address, unsigned long failsafe\_selector, unsigned long
   2.616 -  failsafe\_address) }
   2.617 -
   2.618 -Register the normal (``event'') and failsafe callbacks for 
   2.619 -event processing. In each case the code segment selector and 
   2.620 -address within that segment are provided. The selectors must
   2.621 -have RPL 1; in XenLinux we simply use the kernel's CS for both 
   2.622 -{\tt event\_selector} and {\tt failsafe\_selector}.
   2.623 -
    2.624 -The value {\tt event\_address} specifies the address of the guest OS's
   2.625 -event handling and dispatch routine; the {\tt failsafe\_address}
   2.626 -specifies a separate entry point which is used only if a fault occurs
   2.627 -when Xen attempts to use the normal callback. 
   2.628 -\end{quote} 
   2.629 -
   2.630 -
   2.631 -After installing the hypervisor callbacks, the guest OS can 
   2.632 -install a `virtual IDT' by using the following hypercall: 
   2.633 -
   2.634 -\begin{quote} 
   2.635 -\hypercall{set\_trap\_table(trap\_info\_t *table)} 
   2.636 -
   2.637 -Install one or more entries into the per-domain 
   2.638 -trap handler table (essentially a software version of the IDT). 
   2.639 -Each entry in the array pointed to by {\tt table} includes the 
   2.640 -exception vector number with the corresponding segment selector 
   2.641 -and entry point. Most guest OSes can use the same handlers on 
   2.642 -Xen as when running on the real hardware; an exception is the 
   2.643 -page fault handler (exception vector 14) where a modified 
   2.644 -stack-frame layout is used. 
   2.645 -
   2.646 -
   2.647 -\end{quote} 
   2.648 -
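A hypothetical fragment (the {\tt trap\_info\_t} layout, the selector value and the terminator convention shown here are assumptions based on the description above, not the definitions from the public headers) might install handlers as follows:

\begin{verbatim}
/* Assumed shape of a trap-table entry; flags/DPL fields are omitted.   */
typedef struct {
    unsigned char  vector;        /* exception vector number            */
    unsigned short cs;            /* code segment selector (RPL 1)      */
    unsigned long  address;       /* entry point of the handler         */
} trap_info_t;

extern long HYPERVISOR_set_trap_table(trap_info_t *table);
extern void divide_error(void), page_fault(void);

#define GUEST_KERNEL_CS 0x11      /* placeholder ring-1 code selector   */

static trap_info_t trap_table[] = {
    {  0, GUEST_KERNEL_CS, (unsigned long)divide_error },
    { 14, GUEST_KERNEL_CS, (unsigned long)page_fault   },
    {  0, 0, 0 }                  /* all-zero terminator (assumed)      */
};

static void install_traps(void)
{
    HYPERVISOR_set_trap_table(trap_table);
}
\end{verbatim}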
   2.649 -
   2.650 -
   2.651 -\section{Scheduling and Timer}
   2.652 -
   2.653 -Domains are preemptively scheduled by Xen according to the 
   2.654 -parameters installed by domain 0 (see Section~\ref{s:dom0ops}). 
   2.655 -In addition, however, a domain may choose to explicitly 
   2.656 -control certain behavior with the following hypercall: 
   2.657 -
   2.658 -\begin{quote} 
   2.659 -\hypercall{sched\_op(unsigned long op)} 
   2.660 -
   2.661 -Request scheduling operation from hypervisor. The options are: {\it
   2.662 -yield}, {\it block}, and {\it shutdown}.  {\it yield} keeps the
   2.663 -calling domain runnable but may cause a reschedule if other domains
   2.664 -are runnable.  {\it block} removes the calling domain from the run
    2.665 -queue and causes it to sleep until an event is delivered to it.  {\it
   2.666 -shutdown} is used to end the domain's execution; the caller can
   2.667 -additionally specify whether the domain should reboot, halt or
   2.668 -suspend.
   2.669 -\end{quote} 
   2.670 -
   2.671 -To aid the implementation of a process scheduler within a guest OS,
   2.672 -Xen provides a virtual programmable timer:
   2.673 -
   2.674 -\begin{quote}
   2.675 -\hypercall{set\_timer\_op(uint64\_t timeout)} 
   2.676 -
   2.677 -Request a timer event to be sent at the specified system time (time 
   2.678 -in nanoseconds since system boot). The hypercall actually passes the 
   2.679 -64-bit timeout value as a pair of 32-bit values. 
   2.680 -
   2.681 -\end{quote} 
   2.682 -
   2.683 -Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op} 
   2.684 -allows block-with-timeout semantics. 
   2.685 -
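For instance, a guest's idle loop might combine the two calls as sketched below (the wrapper names and the {\tt SCHEDOP\_block} constant are assumptions for illustration):

\begin{verbatim}
extern long HYPERVISOR_set_timer_op(unsigned long long timeout_ns);
extern long HYPERVISOR_sched_op(unsigned long op);

#define SCHEDOP_block 1           /* assumed operation number for block */

/* Block until either an event arrives or the given number of
 * nanoseconds of system time has passed (block-with-timeout).          */
static void block_with_timeout(unsigned long long now_ns,
                               unsigned long long delta_ns)
{
    HYPERVISOR_set_timer_op(now_ns + delta_ns);  /* absolute system time */
    HYPERVISOR_sched_op(SCHEDOP_block);
}
\end{verbatim}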
   2.686 -
   2.687 -\section{Page Table Management} 
   2.688 -
   2.689 -Since guest operating systems have read-only access to their page 
   2.690 -tables, Xen must be involved when making any changes. The following
   2.691 -multi-purpose hypercall can be used to modify page-table entries, 
   2.692 -update the machine-to-physical mapping table, flush the TLB, install 
   2.693 -a new page-table base pointer, and more.
   2.694 -
   2.695 -\begin{quote} 
   2.696 -\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} 
   2.697 -
   2.698 -Update the page table for the domain; a set of {\tt count} updates are
   2.699 -submitted for processing in a batch, with {\tt success\_count} being 
   2.700 -updated to report the number of successful updates.  
   2.701 -
   2.702 -Each element of {\tt req[]} contains a pointer (address) and value; 
   2.703 -the least significant 2-bits of the pointer are used to distinguish 
   2.704 -the type of update requested as follows:
   2.705 -\begin{description} 
   2.706 -
   2.707 -\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or
   2.708 -page table entry to the associated value; Xen will check that the
   2.709 -update is safe, as described in Chapter~\ref{c:memory}.
   2.710 -
   2.711 -\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the
   2.712 -  machine-to-physical table. The calling domain must own the machine
   2.713 -  page in question (or be privileged).
   2.714 -
   2.715 -\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations.
   2.716 -The set of additional MMU operations is considerable, and includes
   2.717 -updating {\tt cr3} (or just re-installing it for a TLB flush),
   2.718 -flushing the cache, installing a new LDT, or pinning \& unpinning
   2.719 -page-table pages (to ensure their reference count doesn't drop to zero
   2.720 -which would require a revalidation of all entries).
   2.721 -
   2.722 -Further extended commands are used to deal with granting and 
   2.723 -acquiring page ownership; see Section~\ref{s:idc}. 
   2.724 -
   2.725 -
   2.726 -\end{description}
   2.727 -
   2.728 -More details on the precise format of all commands can be 
   2.729 -found in {\tt xen/include/public/xen.h}. 
   2.730 -
   2.731 -
   2.732 -\end{quote}
   2.733 -
   2.734 -Explicitly updating batches of page table entries is extremely
   2.735 -efficient, but can require a number of alterations to the guest
   2.736 -OS. Using the writable page table mode (Chapter~\ref{c:memory}) is
   2.737 -recommended for new OS ports.
   2.738 -
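A minimal sketch of such a batch follows; the request structure and the use of the low two pointer bits follow the description above, while the exact type constant should be treated as an assumption rather than the value from the public headers:

\begin{verbatim}
typedef struct {
    unsigned long ptr;   /* machine address of the PTE; low 2 bits = type */
    unsigned long val;   /* new contents                                  */
} mmu_update_t;

#define MMU_NORMAL_PT_UPDATE 0            /* assumed type encoding        */

extern long HYPERVISOR_mmu_update(mmu_update_t *req, int count,
                                  int *success_count);

/* Update two page-table entries with a single hypercall.                */
static int update_two_ptes(unsigned long pte_ma0, unsigned long new0,
                           unsigned long pte_ma1, unsigned long new1)
{
    mmu_update_t req[2];
    int done = 0;

    req[0].ptr = (pte_ma0 & ~3UL) | MMU_NORMAL_PT_UPDATE;
    req[0].val = new0;
    req[1].ptr = (pte_ma1 & ~3UL) | MMU_NORMAL_PT_UPDATE;
    req[1].val = new1;

    HYPERVISOR_mmu_update(req, 2, &done);
    return done;                          /* number of successful updates */
}
\end{verbatim}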
   2.739 -Regardless of which page table update mode is being used, however,
   2.740 -there are some occasions (notably handling a demand page fault) where
   2.741 -a guest OS will wish to modify exactly one PTE rather than a
   2.742 -batch. This is catered for by the following:
   2.743 -
   2.744 -\begin{quote} 
   2.745 -\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long
   2.746 -val, \\ unsigned long flags)}
   2.747 -
   2.748 -Update the currently installed PTE for the page {\tt page\_nr} to 
   2.749 -{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification 
   2.750 -is safe before applying it. The {\tt flags} determine which kind
   2.751 -of TLB flush, if any, should follow the update. 
   2.752 -
   2.753 -\end{quote} 
   2.754 -
   2.755 -Finally, sufficiently privileged domains may occasionally wish to manipulate 
   2.756 -the pages of others: 
   2.757 -\begin{quote}
   2.758 -
   2.759 -\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr,
   2.760 -unsigned long val, unsigned long flags, uint16\_t domid)}
   2.761 -
   2.762 -Identical to {\tt update\_va\_mapping()} save that the pages being
   2.763 -mapped must belong to the domain {\tt domid}. 
   2.764 -
   2.765 -\end{quote}
   2.766 -
   2.767 -This privileged operation is currently used by backend virtual device
   2.768 -drivers to safely map pages containing I/O data. 
   2.769 -
   2.770 -
   2.771 -
   2.772 -\section{Segmentation Support}
   2.773 -
   2.774 -Xen allows guest OSes to install a custom GDT if they require it; 
   2.775 -this is context switched transparently whenever a domain is 
   2.776 -[de]scheduled.  The following hypercall is effectively a 
   2.777 -`safe' version of {\tt lgdt}: 
   2.778 -
   2.779 -\begin{quote}
   2.780 -\hypercall{set\_gdt(unsigned long *frame\_list, int entries)} 
   2.781 -
   2.782 -Install a global descriptor table for a domain; {\tt frame\_list} is
   2.783 -an array of up to 16 machine page frames within which the GDT resides,
   2.784 -with {\tt entries} being the actual number of descriptor-entry
   2.785 -slots. All page frames must be mapped read-only within the guest's
   2.786 -address space, and the table must be large enough to contain Xen's
   2.787 -reserved entries (see {\tt xen/include/public/arch-x86\_32.h}).
   2.788 -
   2.789 -\end{quote}
   2.790 -
   2.791 -Many guest OSes will also wish to install LDTs; this is achieved by
   2.792 -using {\tt mmu\_update()} with an extended command, passing the
   2.793 -linear address of the LDT base along with the number of entries. No
   2.794 -special safety checks are required; Xen needs to perform this task
   2.795 -simply since {\tt lldt} requires CPL 0.
   2.796 -
   2.797 -
   2.798 -Xen also allows guest operating systems to update just an 
   2.799 -individual segment descriptor in the GDT or LDT:  
   2.800 -
   2.801 -\begin{quote}
   2.802 -\hypercall{update\_descriptor(unsigned long ma, unsigned long word1,
   2.803 -unsigned long word2)}
   2.804 -
   2.805 -Update the GDT/LDT entry at machine address {\tt ma}; the new
   2.806 -8-byte descriptor is stored in {\tt word1} and {\tt word2}.
   2.807 -Xen performs a number of checks to ensure the descriptor is 
   2.808 -valid. 
   2.809 -
   2.810 -\end{quote}
   2.811 -
   2.812 -Guest OSes can use the above in place of context switching entire 
   2.813 -LDTs (or the GDT) when the number of changing descriptors is small. 
   2.814 -
   2.815 -\section{Context Switching} 
   2.816 -
   2.817 -When a guest OS wishes to context switch between two processes, 
   2.818 -it can use the page table and segmentation hypercalls described
    2.819 -above to perform the bulk of the privileged work. In addition, 
   2.820 -however, it will need to invoke Xen to switch the kernel (ring 1) 
   2.821 -stack pointer: 
   2.822 -
   2.823 -\begin{quote} 
   2.824 -\hypercall{stack\_switch(unsigned long ss, unsigned long esp)} 
   2.825 -
   2.826 -Request kernel stack switch from hypervisor; {\tt ss} is the new 
    2.827 -stack segment, while {\tt esp} is the new stack pointer. 
   2.828 -
   2.829 -\end{quote} 
   2.830 -
   2.831 -A final useful hypercall for context switching allows ``lazy'' 
   2.832 -save and restore of floating point state: 
   2.833 -
   2.834 -\begin{quote}
   2.835 -\hypercall{fpu\_taskswitch(void)} 
   2.836 -
   2.837 -This call instructs Xen to set the {\tt TS} bit in the {\tt cr0}
   2.838 -control register; this means that the next attempt to use floating
    2.839 -point will cause a trap which the guest OS can catch. Typically it will
   2.840 -then save/restore the FP state, and clear the {\tt TS} bit. 
   2.841 -\end{quote} 
   2.842 -
   2.843 -This is provided as an optimization only; guest OSes can also choose
   2.844 -to save and restore FP state on all context switches for simplicity. 
   2.845 -
   2.846 -
   2.847 -\section{Physical Memory Management}
   2.848 -
   2.849 -As mentioned previously, each domain has a maximum and current 
   2.850 -memory allocation. The maximum allocation, set at domain creation 
   2.851 -time, cannot be modified. However a domain can choose to reduce 
   2.852 -and subsequently grow its current allocation by using the
   2.853 -following call: 
   2.854 -
   2.855 -\begin{quote} 
   2.856 -\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list,
   2.857 -  unsigned long nr\_extents, unsigned int extent\_order)}
   2.858 -
   2.859 -Increase or decrease current memory allocation (as determined by 
   2.860 -the value of {\tt op}). Each invocation provides a list of 
   2.861 -extents each of which is $2^s$ pages in size, 
   2.862 -where $s$ is the value of {\tt extent\_order}. 
   2.863 -
   2.864 -\end{quote} 
   2.865 -
   2.866 -In addition to simply reducing or increasing the current memory
   2.867 -allocation via a `balloon driver', this call is also useful for 
   2.868 -obtaining contiguous regions of machine memory when required (e.g. 
   2.869 -for certain PCI devices, or if using superpages).  
   2.870 -
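As an illustrative sketch only (the operation code and wrapper name are assumptions, not taken from the public headers), a balloon driver might return a batch of single pages to Xen like this:

\begin{verbatim}
extern long HYPERVISOR_dom_mem_op(unsigned int op,
                                  unsigned long *extent_list,
                                  unsigned long nr_extents,
                                  unsigned int extent_order);

#define MEMOP_decrease_reservation 1      /* assumed operation code       */

/* Give n machine pages (listed by frame number) back to Xen;
 * extent_order == 0 means each extent is a single 4kB page.              */
static long balloon_out(unsigned long *mfn_list, unsigned long n)
{
    return HYPERVISOR_dom_mem_op(MEMOP_decrease_reservation,
                                 mfn_list, n, 0);
}
\end{verbatim}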
   2.871 -
   2.872 -\section{Inter-Domain Communication}
   2.873 -\label{s:idc} 
   2.874 -
   2.875 -Xen provides a simple asynchronous notification mechanism via
   2.876 -\emph{event channels}. Each domain has a set of end-points (or
   2.877 -\emph{ports}) which may be bound to an event source (e.g. a physical
    2.878 -IRQ, a virtual IRQ, or a port in another domain). When a pair of
   2.879 -end-points in two different domains are bound together, then a `send'
   2.880 -operation on one will cause an event to be received by the destination
   2.881 -domain.
   2.882 -
   2.883 -The control and use of event channels involves the following hypercall: 
   2.884 -
   2.885 -\begin{quote}
   2.886 -\hypercall{event\_channel\_op(evtchn\_op\_t *op)} 
   2.887 -
   2.888 -Inter-domain event-channel management; {\tt op} is a discriminated 
   2.889 -union which allows the following 7 operations: 
   2.890 -
   2.891 -\begin{description} 
   2.892 -
   2.893 -\item[\it alloc\_unbound:] allocate a free (unbound) local
   2.894 -  port and prepare for connection from a specified domain. 
   2.895 -\item[\it bind\_virq:] bind a local port to a virtual 
   2.896 -IRQ; any particular VIRQ can be bound to at most one port per domain. 
   2.897 -\item[\it bind\_pirq:] bind a local port to a physical IRQ;
   2.898 -once more, a given pIRQ can be bound to at most one port per
   2.899 -domain. Furthermore the calling domain must be sufficiently
   2.900 -privileged.
   2.901 -\item[\it bind\_interdomain:] construct an interdomain event 
   2.902 -channel; in general, the target domain must have previously allocated 
   2.903 -an unbound port for this channel, although this can be bypassed by 
   2.904 -privileged domains during domain setup. 
   2.905 -\item[\it close:] close an interdomain event channel. 
    2.906 -\item[\it send:] send an event to the remote end of an 
   2.907 -interdomain event channel. 
   2.908 -\item[\it status:] determine the current status of a local port. 
   2.909 -\end{description} 
   2.910 -
   2.911 -For more details see
   2.912 -{\tt xen/include/public/event\_channel.h}. 
   2.913 -
   2.914 -\end{quote} 
   2.915 -
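To make the discriminated-union style concrete, the fragment below uses an invented, simplified stand-in for {\tt evtchn\_op\_t}; the real structure and command numbers are defined in {\tt xen/include/public/event\_channel.h}.

\begin{verbatim}
/* Simplified, made-up stand-in for the real discriminated union.       */
typedef struct {
    unsigned int cmd;                     /* which of the 7 operations   */
    union {
        struct { unsigned int remote_dom; } alloc_unbound;
        struct { unsigned int local_port; } send;
    } u;
} evtchn_op_t;

#define EVTCHNOP_send 1                   /* placeholder command number  */

extern long HYPERVISOR_event_channel_op(evtchn_op_t *op);

/* Notify the remote end of an already-bound interdomain channel.       */
static void notify_remote(unsigned int local_port)
{
    evtchn_op_t op;
    op.cmd = EVTCHNOP_send;
    op.u.send.local_port = local_port;
    HYPERVISOR_event_channel_op(&op);
}
\end{verbatim}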
   2.916 -Event channels are the fundamental communication primitive between 
   2.917 -Xen domains and seamlessly support SMP. However they provide little
   2.918 -bandwidth for communication {\sl per se}, and hence are typically 
   2.919 -married with a piece of shared memory to produce effective and 
   2.920 -high-performance inter-domain communication. 
   2.921 -
   2.922 -Safe sharing of memory pages between guest OSes is carried out by
   2.923 -granting access on a per page basis to individual domains. This is
   2.924 -achieved by using the {\tt grant\_table\_op()} hypercall.
   2.925 -
   2.926 -\begin{quote}
   2.927 -\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
   2.928 -
   2.929 -Grant or remove access to a particular page to a particular domain. 
   2.930 -
   2.931 -\end{quote} 
   2.932 -
   2.933 -This is not currently widely in use by guest operating systems, but 
   2.934 -we intend to integrate support more fully in the near future. 
   2.935 -
   2.936 -\section{PCI Configuration} 
   2.937 -
   2.938 -Domains with physical device access (i.e.\ driver domains) receive
   2.939 -limited access to certain PCI devices (bus address space and
    2.940 -interrupts). However, many guest operating systems attempt to 
    2.941 -determine the PCI configuration by directly accessing the PCI BIOS, 
   2.942 -which cannot be allowed for safety. 
   2.943 -
   2.944 -Instead, Xen provides the following hypercall: 
   2.945 -
   2.946 -\begin{quote}
   2.947 -\hypercall{physdev\_op(void *physdev\_op)}
   2.948 -
    2.949 -Perform a PCI configuration operation; depending on the value 
   2.950 -of {\tt physdev\_op} this can be a PCI config read, a PCI config 
   2.951 -write, or a small number of other queries. 
   2.952 -
   2.953 -\end{quote} 
   2.954 -
   2.955 -
   2.956 -For examples of using {\tt physdev\_op()}, see the 
   2.957 -Xen-specific PCI code in the linux sparse tree. 
   2.958 -
   2.959 -\section{Administrative Operations}
   2.960 -\label{s:dom0ops}
   2.961 -
   2.962 -A large number of control operations are available to a sufficiently
   2.963 -privileged domain (typically domain 0). These allow the creation and
   2.964 -management of new domains, for example. A complete list is given 
   2.965 -below: for more details on any or all of these, please see 
   2.966 -{\tt xen/include/public/dom0\_ops.h} 
   2.967 -
   2.968 -
   2.969 -\begin{quote}
   2.970 -\hypercall{dom0\_op(dom0\_op\_t *op)} 
   2.971 -
   2.972 -Administrative domain operations for domain management. The options are:
   2.973 -
   2.974 -\begin{description} 
   2.975 -\item [\it DOM0\_CREATEDOMAIN:] create a new domain
   2.976 -
   2.977 -\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run 
   2.978 -queue. 
   2.979 -
   2.980 -\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable
   2.981 -  once again. 
   2.982 -
   2.983 -\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated
   2.984 -with a domain
   2.985 -
   2.986 -\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain
   2.987 -
   2.988 -\item [\it DOM0\_SCHEDCTL:]
   2.989 -
   2.990 -\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain
   2.991 -
   2.992 -\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain
   2.993 -
   2.994 -\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain
   2.995 -
   2.996 -\item [\it DOM0\_GETPAGEFRAMEINFO:] 
   2.997 -
   2.998 -\item [\it DOM0\_GETPAGEFRAMEINFO2:]
   2.999 -
  2.1000 -\item [\it DOM0\_IOPL:] set I/O privilege level
  2.1001 -
  2.1002 -\item [\it DOM0\_MSR:] read or write model specific registers
  2.1003 -
  2.1004 -\item [\it DOM0\_DEBUG:] interactively invoke the debugger
  2.1005 -
  2.1006 -\item [\it DOM0\_SETTIME:] set system time
  2.1007 -
  2.1008 -\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring
  2.1009 -
  2.1010 -\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU
  2.1011 -
  2.1012 -\item [\it DOM0\_GETTBUFS:] get information about the size and location of
  2.1013 -                      the trace buffers (only on trace-buffer enabled builds)
  2.1014 -
  2.1015 -\item [\it DOM0\_PHYSINFO:] get information about the host machine
  2.1016 -
  2.1017 -\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions
  2.1018 -
  2.1019 -\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler
  2.1020 -
  2.1021 -\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes
  2.1022 -
  2.1023 -\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain
  2.1024 -
  2.1025 -\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain
  2.1026 -
  2.1027 -\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options
  2.1028 -\end{description} 
  2.1029 -\end{quote} 
  2.1030 -
  2.1031 -Most of the above are best understood by looking at the code 
  2.1032 -implementing them (in {\tt xen/common/dom0\_ops.c}) and in 
  2.1033 -the user-space tools that use them (mostly in {\tt tools/libxc}). 
  2.1034 -
  2.1035 -\section{Debugging Hypercalls} 
  2.1036 -
  2.1037 -A few additional hypercalls are mainly useful for debugging: 
  2.1038 -
  2.1039 -\begin{quote} 
  2.1040 -\hypercall{console\_io(int cmd, int count, char *str)}
  2.1041 -
  2.1042 -Use Xen to interact with the console; operations are:
  2.1043 -
  2.1044 -{\it CONSOLEIO\_write}: Output count characters from buffer str.
  2.1045 -
  2.1046 -{\it CONSOLEIO\_read}: Input at most count characters into buffer str.
  2.1047 -\end{quote} 
  2.1048 -
  2.1049 -A pair of hypercalls allows access to the underlying debug registers: 
  2.1050 -\begin{quote}
  2.1051 -\hypercall{set\_debugreg(int reg, unsigned long value)}
  2.1052 -
  2.1053 -Set debug register {\tt reg} to {\tt value} 
  2.1054 -
  2.1055 -\hypercall{get\_debugreg(int reg)}
  2.1056 -
  2.1057 -Return the contents of the debug register {\tt reg}
  2.1058 -\end{quote}
  2.1059 -
  2.1060 -And finally: 
  2.1061 -\begin{quote}
  2.1062 -\hypercall{xen\_version(int cmd)}
  2.1063 -
  2.1064 -Request Xen version number.
  2.1065 -\end{quote} 
  2.1066 -
  2.1067 -This is useful to ensure that user-space tools are in sync 
  2.1068 -with the underlying hypervisor. 
  2.1069 -
  2.1070 -\section{Deprecated Hypercalls}
  2.1071 -
  2.1072 -Xen is under constant development and refinement; as such there 
  2.1073 -are plans to improve the way in which various pieces of functionality 
  2.1074 -are exposed to guest OSes. 
  2.1075 -
  2.1076 -\begin{quote} 
  2.1077 -\hypercall{vm\_assist(unsigned int cmd, unsigned int type)}
  2.1078 -
   2.1079 -Toggle various memory management modes (in particular writable page
  2.1080 -tables and superpage support). 
  2.1081 -
  2.1082 -\end{quote} 
  2.1083 -
  2.1084 -This is likely to be replaced with mode values in the shared 
  2.1085 -information page since this is more resilient for resumption 
  2.1086 -after migration or checkpoint. 
  2.1087 -
  2.1088 -
  2.1089 -
  2.1090 -
  2.1091 -
  2.1092 -
  2.1093 +%% chapter hypercalls moved to hypercalls.tex
  2.1094 +\include{src/interface/hypercalls}
  2.1095  
  2.1096  
  2.1097  %% 
  2.1098 @@ -1173,279 +112,9 @@ after migration or checkpoint.
  2.1099  %% new scheduler... not clear how many of them there are...
  2.1100  %%
  2.1101  
  2.1102 -\begin{comment}
  2.1103 -
  2.1104 -\chapter{Scheduling API}  
  2.1105 -
  2.1106 -The scheduling API is used by both the schedulers described above and should
  2.1107 -also be used by any new schedulers.  It provides a generic interface and also
  2.1108 -implements much of the ``boilerplate'' code.
  2.1109 -
  2.1110 -Schedulers conforming to this API are described by the following
  2.1111 -structure:
  2.1112 -
  2.1113 -\begin{verbatim}
  2.1114 -struct scheduler
  2.1115 -{
  2.1116 -    char *name;             /* full name for this scheduler      */
  2.1117 -    char *opt_name;         /* option name for this scheduler    */
  2.1118 -    unsigned int sched_id;  /* ID for this scheduler             */
  2.1119 -
  2.1120 -    int          (*init_scheduler) ();
  2.1121 -    int          (*alloc_task)     (struct task_struct *);
  2.1122 -    void         (*add_task)       (struct task_struct *);
  2.1123 -    void         (*free_task)      (struct task_struct *);
  2.1124 -    void         (*rem_task)       (struct task_struct *);
  2.1125 -    void         (*wake_up)        (struct task_struct *);
  2.1126 -    void         (*do_block)       (struct task_struct *);
  2.1127 -    task_slice_t (*do_schedule)    (s_time_t);
  2.1128 -    int          (*control)        (struct sched_ctl_cmd *);
  2.1129 -    int          (*adjdom)         (struct task_struct *,
  2.1130 -                                    struct sched_adjdom_cmd *);
  2.1131 -    s32          (*reschedule)     (struct task_struct *);
  2.1132 -    void         (*dump_settings)  (void);
  2.1133 -    void         (*dump_cpu_state) (int);
  2.1134 -    void         (*dump_runq_el)   (struct task_struct *);
  2.1135 -};
  2.1136 -\end{verbatim}
  2.1137 -
  2.1138 -The only method that {\em must} be implemented is
   2.1139 -{\tt do\_schedule()}.  However, if the {\tt wake\_up()} method is not
   2.1140 -implemented then waking tasks will not be put on the runqueue!
  2.1141 -
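A skeletal and purely illustrative registration might therefore look like the fragment below; only {\tt do\_schedule()} and {\tt wake\_up()} are filled in, the helper functions are hypothetical, and {\tt task\_slice\_t} is assumed to carry the chosen task and its time slice:

\begin{verbatim}
/* Hypothetical per-scheduler helpers.                                   */
extern void                trivial_enqueue(struct task_struct *t);
extern struct task_struct *trivial_pick_next(void);

static void trivial_wake_up(struct task_struct *t)
{
    trivial_enqueue(t);                  /* put the task on our runqueue  */
}

static task_slice_t trivial_do_schedule(s_time_t now)
{
    task_slice_t ret;
    ret.task = trivial_pick_next();      /* next runnable task            */
    ret.time = 10000000ULL;              /* 10ms, assuming s_time_t is ns */
    return ret;
}

struct scheduler sched_trivial_def = {
    .name        = "Trivial example scheduler",
    .opt_name    = "trivial",
    .sched_id    = 99,                   /* placeholder ID                */
    .wake_up     = trivial_wake_up,
    .do_schedule = trivial_do_schedule,
};
\end{verbatim}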
  2.1142 -The fields of the above structure are described in more detail below.
  2.1143 -
  2.1144 -\subsubsection{name}
  2.1145 -
  2.1146 -The name field should point to a descriptive ASCII string.
  2.1147 -
  2.1148 -\subsubsection{opt\_name}
  2.1149 -
  2.1150 -This field is the value of the {\tt sched=} boot-time option that will select
  2.1151 -this scheduler.
  2.1152 -
  2.1153 -\subsubsection{sched\_id}
  2.1154 -
  2.1155 -This is an integer that uniquely identifies this scheduler.  There should be a
  2.1156 -macro corresponding to this scheduler ID in {\tt <xen/sched-if.h>}.
  2.1157 -
  2.1158 -\subsubsection{init\_scheduler}
  2.1159 -
  2.1160 -\paragraph*{Purpose}
  2.1161 -
  2.1162 -This is a function for performing any scheduler-specific initialisation.  For
  2.1163 -instance, it might allocate memory for per-CPU scheduler data and initialise it
  2.1164 -appropriately.
  2.1165 -
  2.1166 -\paragraph*{Call environment}
  2.1167 -
  2.1168 -This function is called after the initialisation performed by the generic
  2.1169 -layer.  The function is called exactly once, for the scheduler that has been
  2.1170 -selected.
  2.1171 -
  2.1172 -\paragraph*{Return values}
  2.1173 -
  2.1174 -This should return negative on failure --- this will cause an
  2.1175 -immediate panic and the system will fail to boot.
  2.1176 -
  2.1177 -\subsubsection{alloc\_task}
  2.1178 -
  2.1179 -\paragraph*{Purpose}
  2.1180 -Called when a {\tt task\_struct} is allocated by the generic scheduler
  2.1181 -layer.  A particular scheduler implementation may use this method to
  2.1182 -allocate per-task data for this task.  It may use the {\tt
  2.1183 -sched\_priv} pointer in the {\tt task\_struct} to point to this data.
  2.1184 -
  2.1185 -\paragraph*{Call environment}
  2.1186 -The generic layer guarantees that the {\tt sched\_priv} field will
  2.1187 -remain intact from the time this method is called until the task is
  2.1188 -deallocated (so long as the scheduler implementation does not change
  2.1189 -it explicitly!).
  2.1190 -
  2.1191 -\paragraph*{Return values}
  2.1192 -Negative on failure.
  2.1193 -
  2.1194 -\subsubsection{add\_task}
  2.1195 -
  2.1196 -\paragraph*{Purpose}
  2.1197 -
  2.1198 -Called when a task is initially added by the generic layer.
  2.1199 -
  2.1200 -\paragraph*{Call environment}
  2.1201 -
  2.1202 -The fields in the {\tt task\_struct} are now filled out and available for use.
  2.1203 -Schedulers should implement appropriate initialisation of any per-task private
  2.1204 -information in this method.
  2.1205 -
  2.1206 -\subsubsection{free\_task}
  2.1207 -
  2.1208 -\paragraph*{Purpose}
  2.1209 -
  2.1210 -Schedulers should free the space used by any associated private data
  2.1211 -structures.
  2.1212 -
  2.1213 -\paragraph*{Call environment}
  2.1214 -
  2.1215 -This is called when a {\tt task\_struct} is about to be deallocated.
  2.1216 -The generic layer will have done generic task removal operations and
  2.1217 -(if implemented) called the scheduler's {\tt rem\_task} method before
  2.1218 -this method is called.
  2.1219 -
  2.1220 -\subsubsection{rem\_task}
  2.1221 -
  2.1222 -\paragraph*{Purpose}
  2.1223 -
  2.1224 -This is called when a task is being removed from scheduling (but is
  2.1225 -not yet being freed).
  2.1226 -
  2.1227 -\subsubsection{wake\_up}
  2.1228 -
  2.1229 -\paragraph*{Purpose}
  2.1230 -
  2.1231 -Called when a task is woken up, this method should put the task on the runqueue
  2.1232 -(or do the scheduler-specific equivalent action).
  2.1233 -
  2.1234 -\paragraph*{Call environment}
  2.1235 -
  2.1236 -The task is already set to state RUNNING.
  2.1237 -
  2.1238 -\subsubsection{do\_block}
  2.1239 -
  2.1240 -\paragraph*{Purpose}
  2.1241 -
  2.1242 -This function is called when a task is blocked.  This function should
  2.1243 -not remove the task from the runqueue.
  2.1244 -
  2.1245 -\paragraph*{Call environment}
  2.1246 -
  2.1247 -The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed to
  2.1248 -TASK\_INTERRUPTIBLE on entry to this method.  A call to the {\tt
  2.1249 -  do\_schedule} method will be made after this method returns, in
  2.1250 -order to select the next task to run.
  2.1251 -
  2.1252 -\subsubsection{do\_schedule}
  2.1253 -
  2.1254 -This method must be implemented.
  2.1255 -
  2.1256 -\paragraph*{Purpose}
  2.1257 -
  2.1258 -The method is called each time a new task must be chosen for scheduling on the
  2.1259 -current CPU.  The current time is passed as the single argument (the current
  2.1260 -task can be found using the {\tt current} macro).
  2.1261 -
  2.1262 -This method should select the next task to run on this CPU and set its minimum
  2.1263 -time to run as well as returning the data described below.
  2.1264 -
  2.1265 -This method should also take the appropriate action if the previous
  2.1266 -task has blocked, e.g. removing it from the runqueue.
  2.1267 -
  2.1268 -\paragraph*{Call environment}
  2.1269 -
  2.1270 -The other fields in the {\tt task\_struct} are updated by the generic layer,
  2.1271 -which also performs all Xen-specific tasks and performs the actual task switch
  2.1272 -(unless the previous task has been chosen again).
  2.1273 -
  2.1274 -This method is called with the {\tt schedule\_lock} held for the current CPU
  2.1275 -and local interrupts disabled.
  2.1276 -
  2.1277 -\paragraph*{Return values}
  2.1278 -
  2.1279 -Must return a {\tt struct task\_slice} describing what task to run and how long
  2.1280 -for (at maximum).
  2.1281 -
  2.1282 -\subsubsection{control}
  2.1283 -
  2.1284 -\paragraph*{Purpose}
  2.1285 -
  2.1286 -This method is called for global scheduler control operations.  It takes a
  2.1287 -pointer to a {\tt struct sched\_ctl\_cmd}, which it should either
  2.1288 -source data from or populate with data, depending on the value of the
  2.1289 -{\tt direction} field.
  2.1290 -
  2.1291 -\paragraph*{Call environment}
  2.1292 -
  2.1293 -The generic layer guarantees that when this method is called, the
  2.1294 -caller selected the correct scheduler ID, hence the scheduler's
  2.1295 -implementation does not need to sanity-check these parts of the call.
  2.1296 -
  2.1297 -\paragraph*{Return values}
  2.1298 -
  2.1299 -This function should return the value to be passed back to user space, hence it
  2.1300 -should either be 0 or an appropriate errno value.
  2.1301 -
  2.1302 -\subsubsection{sched\_adjdom}
  2.1303 -
  2.1304 -\paragraph*{Purpose}
  2.1305 -
  2.1306 -This method is called to adjust the scheduling parameters of a particular
  2.1307 -domain, or to query their current values.  The function should check
  2.1308 -the {\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in
  2.1309 -order to determine which of these operations is being performed.
  2.1310 -
  2.1311 -\paragraph*{Call environment}
  2.1312 -
  2.1313 -The generic layer guarantees that the caller has specified the correct
  2.1314 -control interface version and scheduler ID and that the supplied {\tt
  2.1315 -task\_struct} will not be deallocated during the call (hence it is not
  2.1316 -necessary to {\tt get\_task\_struct}).
  2.1317 -
  2.1318 -\paragraph*{Return values}
  2.1319 -
  2.1320 -This function should return the value to be passed back to user space, hence it
  2.1321 -should either be 0 or an appropriate errno value.
  2.1322 -
  2.1323 -\subsubsection{reschedule}
  2.1324 -
  2.1325 -\paragraph*{Purpose}
  2.1326 -
  2.1327 -This method is called to determine if a reschedule is required as a result of a
  2.1328 -particular task.
  2.1329 -
  2.1330 -\paragraph*{Call environment}
  2.1331 -The generic layer will cause a reschedule if the current domain is the idle
  2.1332 -task or it has exceeded its minimum time slice before a reschedule.  The
  2.1333 -generic layer guarantees that the task passed is not currently running but is
  2.1334 -on the runqueue.
  2.1335 -
  2.1336 -\paragraph*{Return values}
  2.1337 -
  2.1338 -Should return a mask of CPUs to cause a reschedule on.
  2.1339 -
  2.1340 -\subsubsection{dump\_settings}
  2.1341 -
  2.1342 -\paragraph*{Purpose}
  2.1343 -
  2.1344 -If implemented, this should dump any private global settings for this
  2.1345 -scheduler to the console.
  2.1346 -
  2.1347 -\paragraph*{Call environment}
  2.1348 -
  2.1349 -This function is called with interrupts enabled.
  2.1350 -
  2.1351 -\subsubsection{dump\_cpu\_state}
  2.1352 -
  2.1353 -\paragraph*{Purpose}
  2.1354 -
  2.1355 -This method should dump any private settings for the specified CPU.
  2.1356 -
  2.1357 -\paragraph*{Call environment}
  2.1358 -
  2.1359 -This function is called with interrupts disabled and the {\tt schedule\_lock}
  2.1360 -for the specified CPU held.
  2.1361 -
  2.1362 -\subsubsection{dump\_runq\_el}
  2.1363 -
  2.1364 -\paragraph*{Purpose}
  2.1365 -
  2.1366 -This method should dump any private settings for the specified task.
  2.1367 -
  2.1368 -\paragraph*{Call environment}
  2.1369 -
  2.1370 -This function is called with interrupts disabled and the {\tt schedule\_lock}
  2.1371 -for the task's CPU held.
  2.1372 -
  2.1373 -\end{comment} 
  2.1374 -
  2.1375 +%% \include{src/interface/scheduling}
  2.1376 +%% scheduling information moved to scheduling.tex
  2.1377 +%% still commented out
  2.1378  
  2.1379  
  2.1380  
  2.1381 @@ -1457,74 +126,9 @@ for the task's CPU held.
  2.1382  %% (and/or kip's stuff?) and write about that instead? 
  2.1383  %%
  2.1384  
  2.1385 -\begin{comment} 
  2.1386 -
  2.1387 -\chapter{Debugging}
  2.1388 -
  2.1389 -Xen provides tools for debugging both Xen and guest OSes.  Currently, the
  2.1390 -Pervasive Debugger provides a GDB stub, which provides facilities for symbolic
  2.1391 -debugging of Xen itself and of OS kernels running on top of Xen.  The Trace
  2.1392 -Buffer provides a lightweight means to log data about Xen's internal state and
  2.1393 -behaviour at runtime, for later analysis.
  2.1394 -
  2.1395 -\section{Pervasive Debugger}
  2.1396 -
  2.1397 -Information on using the pervasive debugger is available in pdb.txt.
  2.1398 -
  2.1399 -
  2.1400 -\section{Trace Buffer}
  2.1401 -
  2.1402 -The trace buffer provides a means to observe Xen's operation from domain 0.
  2.1403 -Trace events, inserted at key points in Xen's code, record data that can be
  2.1404 -read by the {\tt xentrace} tool.  Recording these events has a low overhead
  2.1405 -and hence the trace buffer may be useful for debugging timing-sensitive
  2.1406 -behaviours.
  2.1407 -
  2.1408 -\subsection{Internal API}
  2.1409 -
  2.1410 -To use the trace buffer functionality from within Xen, you must {\tt \#include
  2.1411 -<xen/trace.h>}, which contains definitions related to the trace buffer.  Trace
  2.1412 -events are inserted into the buffer using the {\tt TRACE\_xD} ({\tt x} = 0, 1,
  2.1413 -2, 3, 4 or 5) macros.  These all take an event number, plus {\tt x} additional
  2.1414 -(32-bit) data as their arguments.  For trace buffer-enabled builds of Xen these
  2.1415 -will insert the event ID and data into the trace buffer, along with the current
  2.1416 -value of the CPU cycle-counter.  For builds without the trace buffer enabled,
  2.1417 -the macros expand to no-ops and thus can be left in place without incurring
  2.1418 -overheads.
  2.1419 -
  2.1420 -\subsection{Trace-enabled builds}
  2.1421 -
  2.1422 -By default, the trace buffer is enabled only in debug builds (i.e. {\tt NDEBUG}
  2.1423 -is not defined).  It can be enabled separately by defining {\tt TRACE\_BUFFER},
  2.1424 -either in {\tt <xen/config.h>} or on the gcc command line.
  2.1425 -
  2.1426 -The size (in pages) of the per-CPU trace buffers can be specified using the
  2.1427 -{\tt tbuf\_size=n } boot parameter to Xen.  If the size is set to 0, the trace
  2.1428 -buffers will be disabled.
  2.1429 -
  2.1430 -\subsection{Dumping trace data}
  2.1431 -
  2.1432 -When running a trace buffer build of Xen, trace data are written continuously
  2.1433 -into the buffer data areas, with newer data overwriting older data.  This data
  2.1434 -can be captured using the {\tt xentrace} program in domain 0.
  2.1435 -
  2.1436 -The {\tt xentrace} tool uses {\tt /dev/mem} in domain 0 to map the trace
  2.1437 -buffers into its address space.  It then periodically polls all the buffers for
  2.1438 -new data, dumping out any new records from each buffer in turn.  As a result,
  2.1439 -for machines with multiple (logical) CPUs, the trace buffer output will not be
  2.1440 -in overall chronological order.
  2.1441 -
  2.1442 -The output from {\tt xentrace} can be post-processed using {\tt
  2.1443 -xentrace\_cpusplit} (used to split trace data out into per-cpu log files) and
  2.1444 -{\tt xentrace\_format} (used to pretty-print trace data).  For the predefined
  2.1445 -trace points, there is an example format file in {\tt tools/xentrace/formats }.
  2.1446 -
  2.1447 -For more information, see the manual pages for {\tt xentrace}, {\tt
  2.1448 -xentrace\_format} and {\tt xentrace\_cpusplit}.
  2.1449 -
  2.1450 -\end{comment} 
  2.1451 -
  2.1452 -
  2.1453 +%% \include{src/interface/debugging}
  2.1454 +%% debugging information moved to debugging.tex
  2.1455 +%% still commented out
  2.1456  
  2.1457  
  2.1458  \end{document}
     3.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     3.2 +++ b/docs/src/interface/architecture.tex	Tue Sep 20 09:17:33 2005 +0000
     3.3 @@ -0,0 +1,140 @@
     3.4 +\chapter{Virtual Architecture}
     3.5 +
     3.6 +On a Xen-based system, the hypervisor itself runs in {\it ring 0}.  It
     3.7 +has full access to the physical memory available in the system and is
     3.8 +responsible for allocating portions of it to the domains.  Guest
     3.9 +operating systems run in and use {\it rings 1}, {\it 2} and {\it 3} as
    3.10 +they see fit. Segmentation is used to prevent the guest OS from
    3.11 +accessing the portion of the address space that is reserved for Xen.
    3.12 +We expect most guest operating systems will use ring 1 for their own
    3.13 +operation and place applications in ring 3.
    3.14 +
    3.15 +In this chapter we consider the basic virtual architecture provided by
    3.16 +Xen: the basic CPU state, exception and interrupt handling, and time.
    3.17 +Other aspects such as memory and device access are discussed in later
    3.18 +chapters.
    3.19 +
    3.20 +
    3.21 +\section{CPU state}
    3.22 +
    3.23 +All privileged state must be handled by Xen.  The guest OS has no
    3.24 +direct access to CR3 and is not permitted to update privileged bits in
    3.25 +EFLAGS. Guest OSes use \emph{hypercalls} to invoke operations in Xen;
    3.26 +these are analogous to system calls but occur from ring 1 to ring 0.
    3.27 +
    3.28 +A list of all hypercalls is given in Appendix~\ref{a:hypercalls}.
    3.29 +
    3.30 +
    3.31 +\section{Exceptions}
    3.32 +
    3.33 +A virtual IDT is provided --- a domain can submit a table of trap
    3.34 +handlers to Xen via the {\tt set\_trap\_table()} hypercall.  Most trap
    3.35 +handlers are identical to native x86 handlers, although the page-fault
    3.36 +handler is somewhat different.
    3.37 +
    3.38 +
    3.39 +\section{Interrupts and events}
    3.40 +
    3.41 +Interrupts are virtualized by mapping them to \emph{events}, which are
    3.42 +delivered asynchronously to the target domain using a callback
    3.43 +supplied via the {\tt set\_callbacks()} hypercall.  A guest OS can map
    3.44 +these events onto its standard interrupt dispatch mechanisms.  Xen is
    3.45 +responsible for determining the target domain that will handle each
    3.46 +physical interrupt source. For more details on the binding of event
    3.47 +sources to events, see Chapter~\ref{c:devices}.
    3.48 +
    3.49 +
    3.50 +\section{Time}
    3.51 +
    3.52 +Guest operating systems need to be aware of the passage of both real
    3.53 +(or wallclock) time and their own `virtual time' (the time for which
    3.54 +they have been executing). Furthermore, Xen has a notion of time which
    3.55 +is used for scheduling. The following notions of time are provided:
    3.56 +
    3.57 +\begin{description}
    3.58 +\item[Cycle counter time.]
    3.59 +
    3.60 +  This provides a fine-grained time reference.  The cycle counter time
    3.61 +  is used to accurately extrapolate the other time references.  On SMP
    3.62 +  machines it is currently assumed that the cycle counter time is
    3.63 +  synchronized between CPUs.  The current x86-based implementation
    3.64 +  achieves this within inter-CPU communication latencies.
    3.65 +
    3.66 +\item[System time.]
    3.67 +
    3.68 +  This is a 64-bit counter which holds the number of nanoseconds that
    3.69 +  have elapsed since system boot.
    3.70 +
    3.71 +\item[Wall clock time.]
    3.72 +
    3.73 +  This is the time of day in a Unix-style {\tt struct timeval}
    3.74 +  (seconds and microseconds since 1 January 1970, adjusted by leap
    3.75 +  seconds).  An NTP client hosted by {\it domain 0} can keep this
    3.76 +  value accurate.
    3.77 +
    3.78 +\item[Domain virtual time.]
    3.79 +
    3.80 +  This progresses at the same pace as system time, but only while a
    3.81 +  domain is executing --- it stops while a domain is de-scheduled.
    3.82 +  Therefore the share of the CPU that a domain receives is indicated
    3.83 +  by the rate at which its virtual time increases.
    3.84 +
    3.85 +\end{description}
    3.86 +
    3.87 +
    3.88 +Xen exports timestamps for system time and wall-clock time to guest
    3.89 +operating systems through a shared page of memory.  Xen also provides
    3.90 +the cycle counter time at the instant the timestamps were calculated,
    3.91 +and the CPU frequency in Hertz.  This allows the guest to extrapolate
    3.92 +system and wall-clock times accurately based on the current cycle
    3.93 +counter time.
    3.94 +
    3.95 +Since all timestamps need to be updated and read \emph{atomically},
    3.96 +two version numbers are also stored in the shared info page. The first
    3.97 +is incremented prior to an update, while the second is only
    3.98 +incremented afterwards. Thus a guest can be sure that it read a
    3.99 +consistent state by checking the two version numbers are equal.
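As a concrete illustration, the read side of this protocol might look as
follows (a minimal sketch in C: the {\tt shared\_time} structure and its
field names are purely illustrative rather than the real shared-info
layout, and memory barriers are elided for brevity):

\begin{verbatim}
#include <stdint.h>

/* Illustrative stand-in for the time fields in the shared info page. */
struct shared_time {
    uint32_t version_pre;   /* incremented by Xen before an update */
    uint32_t version_post;  /* incremented by Xen after an update  */
    uint64_t system_time;   /* ns since boot at the last update    */
    uint64_t tsc_stamp;     /* cycle counter at the last update    */
};

/* Retry until both version numbers agree, i.e. no update was in
 * progress while the fields were being read. */
static void read_time_snapshot(volatile struct shared_time *t,
                               uint64_t *sys, uint64_t *tsc)
{
    uint32_t pre, post;

    do {
        post = t->version_post;
        *sys = t->system_time;
        *tsc = t->tsc_stamp;
        pre  = t->version_pre;
    } while (pre != post);
}
\end{verbatim}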
   3.100 +
   3.101 +Xen includes a periodic ticker which sends a timer event to the
   3.102 +currently executing domain every 10ms.  The Xen scheduler also sends a
   3.103 +timer event whenever a domain is scheduled; this allows the guest OS
   3.104 +to adjust for the time that has passed while it has been inactive.  In
   3.105 +addition, Xen allows each domain to request that they receive a timer
   3.106 +event sent at a specified system time by using the {\tt
   3.107 +  set\_timer\_op()} hypercall.  Guest OSes may use this timer to
   3.108 +implement timeout values when they block.
   3.109 +
   3.110 +
   3.111 +
   3.112 +%% % akw: demoting this to a section -- not sure if there is any point
   3.113 +%% % though, maybe just remove it.
   3.114 +
   3.115 +\section{Xen CPU Scheduling}
   3.116 +
   3.117 +Xen offers a uniform API for CPU schedulers.  It is possible to choose
   3.118 +from a number of schedulers at boot and it should be easy to add more.
   3.119 +The BVT, Atropos and Round Robin schedulers are part of the normal Xen
   3.120 +distribution.  BVT provides proportional fair shares of the CPU to the
   3.121 +running domains.  Atropos can be used to reserve absolute shares of
   3.122 +the CPU for each domain.  Round-robin is provided as an example of
   3.123 +Xen's internal scheduler API.
   3.124 +
   3.125 +\paragraph*{Note: SMP host support}
   3.126 +Xen has always supported SMP host systems.  Domains are statically
   3.127 +assigned to CPUs, either at creation time or when manually pinning to
   3.128 +a particular CPU.  The current schedulers then run locally on each CPU
   3.129 +to decide which of the assigned domains should be run there. The
   3.130 +user-level control software can be used to perform coarse-grain
   3.131 +load-balancing between CPUs.
   3.132 +
   3.133 +
   3.134 +%% More information on the characteristics and use of these schedulers
   3.135 +%% is available in {\tt Sched-HOWTO.txt}.
   3.136 +
   3.137 +
   3.138 +\section{Privileged operations}
   3.139 +
   3.140 +Xen exports an extended interface to privileged domains (viz.\ {\it
   3.141 +  Domain 0}). This allows such domains to build and boot other domains
   3.142 +on the server, and provides control interfaces for managing
   3.143 +scheduling, memory, networking, and block devices.
     4.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     4.2 +++ b/docs/src/interface/debugging.tex	Tue Sep 20 09:17:33 2005 +0000
     4.3 @@ -0,0 +1,62 @@
     4.4 +\chapter{Debugging}
     4.5 +
     4.6 +Xen provides tools for debugging both Xen and guest OSes.  Currently, the
     4.7 +Pervasive Debugger provides a GDB stub, which provides facilities for symbolic
     4.8 +debugging of Xen itself and of OS kernels running on top of Xen.  The Trace
     4.9 +Buffer provides a lightweight means to log data about Xen's internal state and
    4.10 +behaviour at runtime, for later analysis.
    4.11 +
    4.12 +\section{Pervasive Debugger}
    4.13 +
    4.14 +Information on using the pervasive debugger is available in pdb.txt.
    4.15 +
    4.16 +
    4.17 +\section{Trace Buffer}
    4.18 +
    4.19 +The trace buffer provides a means to observe Xen's operation from domain 0.
    4.20 +Trace events, inserted at key points in Xen's code, record data that can be
    4.21 +read by the {\tt xentrace} tool.  Recording these events has a low overhead
    4.22 +and hence the trace buffer may be useful for debugging timing-sensitive
    4.23 +behaviours.
    4.24 +
    4.25 +\subsection{Internal API}
    4.26 +
    4.27 +To use the trace buffer functionality from within Xen, you must {\tt \#include
    4.28 +<xen/trace.h>}, which contains definitions related to the trace buffer.  Trace
    4.29 +events are inserted into the buffer using the {\tt TRACE\_xD} ({\tt x} = 0, 1,
    4.30 +2, 3, 4 or 5) macros.  These all take an event number, plus {\tt x} additional
    4.31 +(32-bit) data as their arguments.  For trace buffer-enabled builds of Xen these
    4.32 +will insert the event ID and data into the trace buffer, along with the current
    4.33 +value of the CPU cycle-counter.  For builds without the trace buffer enabled,
    4.34 +the macros expand to no-ops and thus can be left in place without incurring
    4.35 +overheads.
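For illustration, a trace point with three words of payload might be
added like this (the event number and the surrounding function are made
up for the example):

\begin{verbatim}
#include <xen/trace.h>

#define TRC_EXAMPLE_EVENT 0x00090001   /* hypothetical event number */

void example_operation(unsigned long a, unsigned long b, unsigned long c)
{
    /* Records the event ID plus three 32-bit data words (and the cycle
     * counter); compiles to a no-op when the trace buffer is disabled. */
    TRACE_3D(TRC_EXAMPLE_EVENT, a, b, c);

    /* ... the operation being traced ... */
}
\end{verbatim}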
    4.36 +
    4.37 +\subsection{Trace-enabled builds}
    4.38 +
    4.39 +By default, the trace buffer is enabled only in debug builds (i.e. {\tt NDEBUG}
    4.40 +is not defined).  It can be enabled separately by defining {\tt TRACE\_BUFFER},
    4.41 +either in {\tt <xen/config.h>} or on the gcc command line.
    4.42 +
    4.43 +The size (in pages) of the per-CPU trace buffers can be specified using the
    4.44 +{\tt tbuf\_size=n } boot parameter to Xen.  If the size is set to 0, the trace
    4.45 +buffers will be disabled.
    4.46 +
    4.47 +\subsection{Dumping trace data}
    4.48 +
    4.49 +When running a trace buffer build of Xen, trace data are written continuously
    4.50 +into the buffer data areas, with newer data overwriting older data.  This data
    4.51 +can be captured using the {\tt xentrace} program in domain 0.
    4.52 +
    4.53 +The {\tt xentrace} tool uses {\tt /dev/mem} in domain 0 to map the trace
    4.54 +buffers into its address space.  It then periodically polls all the buffers for
    4.55 +new data, dumping out any new records from each buffer in turn.  As a result,
    4.56 +for machines with multiple (logical) CPUs, the trace buffer output will not be
    4.57 +in overall chronological order.
    4.58 +
    4.59 +The output from {\tt xentrace} can be post-processed using {\tt
    4.60 +xentrace\_cpusplit} (used to split trace data out into per-cpu log files) and
    4.61 +{\tt xentrace\_format} (used to pretty-print trace data).  For the predefined
    4.62 +trace points, there is an example format file in {\tt tools/xentrace/formats }.
    4.63 +
    4.64 +For more information, see the manual pages for {\tt xentrace}, {\tt
    4.65 +xentrace\_format} and {\tt xentrace\_cpusplit}.
     5.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     5.2 +++ b/docs/src/interface/devices.tex	Tue Sep 20 09:17:33 2005 +0000
     5.3 @@ -0,0 +1,178 @@
     5.4 +\chapter{Devices}
     5.5 +\label{c:devices}
     5.6 +
     5.7 +Devices such as network and disk are exported to guests using a split
     5.8 +device driver.  The device driver domain, which accesses the physical
     5.9 +device directly, also runs a \emph{backend} driver, serving requests to
    5.10 +that device from guests.  Each guest will use a simple \emph{frontend}
    5.11 +driver to access the backend.  Communication between these domains is
    5.12 +composed of two parts: First, data is placed onto a shared memory page
    5.13 +between the domains.  Second, an event channel between the two domains
    5.14 +is used to pass notification that data is outstanding.  This
    5.15 +separation of notification from data transfer allows message batching,
    5.16 +and results in very efficient device access.
    5.17 +
    5.18 +Event channels are used extensively in device virtualization; each
    5.19 +domain has a number of end-points or \emph{ports} each of which may be
    5.20 +bound to one of the following \emph{event sources}:
    5.21 +\begin{itemize}
    5.22 +  \item a physical interrupt from a real device, 
    5.23 +  \item a virtual interrupt (callback) from Xen, or 
    5.24 +  \item a signal from another domain 
    5.25 +\end{itemize}
    5.26 +
    5.27 +Events are lightweight and do not carry much information beyond the
    5.28 +source of the notification. Hence when performing bulk data transfer,
    5.29 +events are typically used as synchronization primitives over a shared
    5.30 +memory transport. Event channels are managed via the {\tt
    5.31 +  event\_channel\_op()} hypercall; for more details see
    5.32 +Section~\ref{s:idc}.
    5.33 +
    5.34 +This chapter focuses on some individual device interfaces available to
    5.35 +Xen guests.
    5.36 +
    5.37 +
    5.38 +\section{Network I/O}
    5.39 +
    5.40 +Virtual network device services are provided by shared memory
    5.41 +communication with a backend domain.  From the point of view of other
    5.42 +domains, the backend may be viewed as a virtual ethernet switch
    5.43 +element with each domain having one or more virtual network interfaces
    5.44 +connected to it.
    5.45 +
    5.46 +\subsection{Backend Packet Handling}
    5.47 +
    5.48 +The backend driver is responsible for a variety of actions relating to
    5.49 +the transmission and reception of packets from the physical device.
    5.50 +With regard to transmission, the backend performs these key actions:
    5.51 +
    5.52 +\begin{itemize}
    5.53 +\item {\bf Validation:} To ensure that domains do not attempt to
    5.54 +  generate invalid (e.g. spoofed) traffic, the backend driver may
    5.55 +  validate headers ensuring that source MAC and IP addresses match the
    5.56 +  interface that they have been sent from.
    5.57 +
    5.58 +  Validation functions can be configured using standard firewall rules
    5.59 +  ({\small{\tt iptables}} in the case of Linux).
    5.60 +  
    5.61 +\item {\bf Scheduling:} Since a number of domains can share a single
    5.62 +  physical network interface, the backend must mediate access when
    5.63 +  several domains each have packets queued for transmission.  This
    5.64 +  general scheduling function subsumes basic shaping or rate-limiting
    5.65 +  schemes.
    5.66 +  
    5.67 +\item {\bf Logging and Accounting:} The backend domain can be
    5.68 +  configured with classifier rules that control how packets are
    5.69 +  accounted or logged.  For example, log messages might be generated
    5.70 +  whenever a domain attempts to send a TCP packet containing a SYN.
    5.71 +\end{itemize}
    5.72 +
    5.73 +On receipt of incoming packets, the backend acts as a simple
    5.74 +demultiplexer: Packets are passed to the appropriate virtual interface
    5.75 +after any necessary logging and accounting have been carried out.
    5.76 +
    5.77 +\subsection{Data Transfer}
    5.78 +
    5.79 +Each virtual interface uses two ``descriptor rings'', one for
    5.80 +transmit, the other for receive.  Each descriptor identifies a block
    5.81 +of contiguous physical memory allocated to the domain.
    5.82 +
    5.83 +The transmit ring carries packets to transmit from the guest to the
    5.84 +backend domain.  The return path of the transmit ring carries messages
    5.85 +indicating that the contents have been physically transmitted and the
    5.86 +backend no longer requires the associated pages of memory.
    5.87 +
    5.88 +To receive packets, the guest places descriptors of unused pages on
    5.89 +the receive ring.  The backend will return received packets by
    5.90 +exchanging these pages in the domain's memory with new pages
    5.91 +containing the received data, and passing back descriptors regarding
    5.92 +the new packets on the ring.  This zero-copy approach allows the
    5.93 +backend to maintain a pool of free pages to receive packets into, and
    5.94 +then deliver them to appropriate domains after examining their
    5.95 +headers.
    5.96 +
    5.97 +% Real physical addresses are used throughout, with the domain
    5.98 +% performing translation from pseudo-physical addresses if that is
    5.99 +% necessary.
   5.100 +
   5.101 +If a domain does not keep its receive ring stocked with empty buffers
   5.102 +then packets destined to it may be dropped.  This provides some
   5.103 +defence against receive livelock problems because an overloaded domain
   5.104 +will cease to receive further data.  Similarly, on the transmit path,
   5.105 +it provides the application with feedback on the rate at which packets
   5.106 +are able to leave the system.
   5.107 +
   5.108 +Flow control on rings is achieved by including a pair of producer
   5.109 +indexes on the shared ring page.  Each side will maintain a private
   5.110 +consumer index indicating the next outstanding message.  In this
   5.111 +manner, the domains cooperate to divide the ring into two message
   5.112 +lists, one in each direction.  Notification is decoupled from the
   5.113 +immediate placement of new messages on the ring; the event channel
   5.114 +will be used to generate notification when {\em either} a certain
   5.115 +number of outstanding messages are queued, {\em or} a specified number
   5.116 +of nanoseconds have elapsed since the oldest message was placed on the
   5.117 +ring.
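In outline, one direction of such a ring might be managed as sketched
below (the structure and function names are illustrative and
deliberately simplified; they are not the actual Xen ring definitions):

\begin{verbatim}
#include <stdint.h>

#define RING_SIZE 256                    /* illustrative; a power of two */

struct ring_desc   { uint64_t addr; uint32_t len; };

/* Lives on the shared page: only the producer writes 'prod'.  The
 * consumer keeps its own index privately, as described above. */
struct ring_shared {
    volatile uint32_t prod;
    struct ring_desc  ring[RING_SIZE];
};

/* Producer: publish a descriptor, then advance the shared index.
 * (Full-ring handling is omitted in this sketch.) */
static void ring_put(struct ring_shared *r, struct ring_desc d)
{
    r->ring[r->prod % RING_SIZE] = d;
    __sync_synchronize();                /* descriptor before index */
    r->prod++;
}

/* Consumer: 'cons' is this side's private consumer index. */
static int ring_get(struct ring_shared *r, uint32_t *cons,
                    struct ring_desc *out)
{
    if (*cons == r->prod)
        return 0;                        /* no outstanding messages */
    *out = r->ring[*cons % RING_SIZE];
    (*cons)++;
    return 1;
}
\end{verbatim}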
   5.118 +
   5.119 +%% Not sure if my version is any better -- here is what was here
   5.120 +%% before: Synchronization between the backend domain and the guest is
   5.121 +%% achieved using counters held in shared memory that is accessible to
   5.122 +%% both.  Each ring has associated producer and consumer indices
   5.123 +%% indicating the area in the ring that holds descriptors that contain
   5.124 +%% data.  After receiving {\it n} packets or {\t nanoseconds} after
   5.125 +%% receiving the first packet, the hypervisor sends an event to the
   5.126 +%% domain.
   5.127 +
   5.128 +
   5.129 +\section{Block I/O}
   5.130 +
   5.131 +All guest OS disk access goes through the virtual block device VBD
   5.132 +interface.  This interface allows domains access to portions of block
   5.133 +storage devices visible to the block backend device.  The VBD
   5.134 +interface is a split driver, similar to the network interface
   5.135 +described above.  A single shared memory ring is used between the
   5.136 +frontend and backend drivers, across which read and write messages are
   5.137 +sent.
   5.138 +
   5.139 +Any block device accessible to the backend domain, including
   5.140 +network-based block (iSCSI, *NBD, etc), loopback and LVM/MD devices,
   5.141 +can be exported as a VBD.  Each VBD is mapped to a device node in the
   5.142 +guest, specified in the guest's startup configuration.
   5.143 +
   5.144 +Old (Xen 1.2) virtual disks are not supported under Xen 2.0, since
   5.145 +similar functionality can be achieved using the more complete LVM
   5.146 +system, which is already in widespread use.
   5.147 +
   5.148 +\subsection{Data Transfer}
   5.149 +
   5.150 +The single ring between the guest and the block backend supports three
   5.151 +messages:
   5.152 +
   5.153 +\begin{description}
   5.154 +\item [{\small {\tt PROBE}}:] Return a list of the VBDs available to
   5.155 +  this guest from the backend.  The request includes a descriptor of a
   5.156 +  free page into which the reply will be written by the backend.
   5.157 +
   5.158 +\item [{\small {\tt READ}}:] Read data from the specified block
   5.159 +  device.  The front end identifies the device and location to read
   5.160 +  from and attaches pages for the data to be copied to (typically via
   5.161 +  DMA from the device).  The backend acknowledges completed read
   5.162 +  requests as they finish.
   5.163 +
   5.164 +\item [{\small {\tt WRITE}}:] Write data to the specified block
   5.165 +  device.  This functions essentially as {\small {\tt READ}}, except
   5.166 +  that the data moves to the device instead of from it.
   5.167 +\end{description}
   5.168 +
   5.169 +%% um... some old text: In overview, the same style of descriptor-ring
   5.170 +%% that is used for network packets is used here.  Each domain has one
   5.171 +%% ring that carries operation requests to the hypervisor and carries
   5.172 +%% the results back again.
   5.173 +
   5.174 +%% Rather than copying data, the backend simply maps the domain's
   5.175 +%% buffers in order to enable direct DMA to them.  The act of mapping
   5.176 +%% the buffers also increases the reference counts of the underlying
   5.177 +%% pages, so that the unprivileged domain cannot try to return them to
   5.178 +%% the hypervisor, install them as page tables, or any other unsafe
   5.179 +%% behaviour.
   5.180 +%%
   5.181 +%% % block API here
     6.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     6.2 +++ b/docs/src/interface/further_info.tex	Tue Sep 20 09:17:33 2005 +0000
     6.3 @@ -0,0 +1,49 @@
     6.4 +\chapter{Further Information}
     6.5 +
     6.6 +If you have questions that are not answered by this manual, the
     6.7 +sources of information listed below may be of interest to you.  Note
     6.8 +that bug reports, suggestions and contributions related to the
     6.9 +software (or the documentation) should be sent to the Xen developers'
    6.10 +mailing list (address below).
    6.11 +
    6.12 +
    6.13 +\section{Other documentation}
    6.14 +
    6.15 +If you are mainly interested in using (rather than developing for)
    6.16 +Xen, the \emph{Xen Users' Manual} is distributed in the {\tt docs/}
    6.17 +directory of the Xen source distribution.
    6.18 +
    6.19 +% Various HOWTOs are also available in {\tt docs/HOWTOS}.
    6.20 +
    6.21 +
    6.22 +\section{Online references}
    6.23 +
    6.24 +The official Xen web site is found at:
    6.25 +\begin{quote}
    6.26 +{\tt http://www.cl.cam.ac.uk/Research/SRG/netos/xen/}
    6.27 +\end{quote}
    6.28 +
    6.29 +This contains links to the latest versions of all on-line
    6.30 +documentation.
    6.31 +
    6.32 +
    6.33 +\section{Mailing lists}
    6.34 +
    6.35 +There are currently four official Xen mailing lists:
    6.36 +
    6.37 +\begin{description}
    6.38 +\item[xen-devel@lists.xensource.com] Used for development
    6.39 +  discussions and bug reports.  Subscribe at: \\
    6.40 +  {\small {\tt http://lists.xensource.com/xen-devel}}
    6.41 +\item[xen-users@lists.xensource.com] Used for installation and usage
    6.42 +  discussions and requests for help.  Subscribe at: \\
    6.43 +  {\small {\tt http://lists.xensource.com/xen-users}}
    6.44 +\item[xen-announce@lists.xensource.com] Used for announcements only.
    6.45 +  Subscribe at: \\
    6.46 +  {\small {\tt http://lists.xensource.com/xen-announce}}
    6.47 +\item[xen-changelog@lists.xensource.com] Changelog feed
    6.48 +  from the unstable and 2.0 trees - developer oriented.  Subscribe at: \\
    6.49 +  {\small {\tt http://lists.xensource.com/xen-changelog}}
    6.50 +\end{description}
    6.51 +
    6.52 +Of these, xen-devel is the most active.
     7.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     7.2 +++ b/docs/src/interface/hypercalls.tex	Tue Sep 20 09:17:33 2005 +0000
     7.3 @@ -0,0 +1,524 @@
     7.4 +
     7.5 +\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}}
     7.6 +
     7.7 +\chapter{Xen Hypercalls}
     7.8 +\label{a:hypercalls}
     7.9 +
    7.10 +Hypercalls represent the procedural interface to Xen; this appendix 
    7.11 +categorizes and describes the current set of hypercalls. 
    7.12 +
    7.13 +\section{Invoking Hypercalls} 
    7.14 +
    7.15 +Hypercalls are invoked in a manner analogous to system calls in a
    7.16 +conventional operating system; a software interrupt is issued which
    7.17 +vectors to an entry point within Xen. On x86\_32 machines the
    7.18 +instruction required is {\tt int \$82}; the (real) IDT is setup so
    7.19 +that this may only be issued from within ring 1. The particular 
    7.20 +hypercall to be invoked is contained in {\tt EAX} --- a list 
    7.21 +mapping these values to symbolic hypercall names can be found 
    7.22 +in {\tt xen/include/public/xen.h}. 
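By way of illustration, a guest might wrap the trap instruction roughly
as follows (a sketch only: real guests use the wrapper macros shipped
with the public headers, and the convention of passing arguments in
{\tt EBX}, {\tt ECX}, \ldots\ is an assumption taken from XenLinux):

\begin{verbatim}
/* Issue a single-argument hypercall on x86_32: hypercall number in
 * EAX, first argument in EBX, result returned in EAX. */
static inline long hypercall1(unsigned long op, unsigned long arg1)
{
    long ret;
    __asm__ __volatile__ (
        "int $0x82"                /* trap from ring 1 into Xen */
        : "=a" (ret)
        : "0" (op), "b" (arg1)
        : "memory" );
    return ret;
}

/* e.g. voluntarily yield the CPU:
 *   hypercall1(__HYPERVISOR_sched_op, SCHEDOP_yield);          */
\end{verbatim}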
    7.23 +
    7.24 +On some occasions a set of hypercalls will be required to carry
    7.25 +out a higher-level function; a good example is when a guest 
    7.26 +operating wishes to context switch to a new process which 
    7.27 +requires updating various privileged CPU state. As an optimization
    7.28 +for these cases, there is a generic mechanism to issue a set of 
    7.29 +hypercalls as a batch: 
    7.30 +
    7.31 +\begin{quote}
    7.32 +\hypercall{multicall(void *call\_list, int nr\_calls)}
    7.33 +
    7.34 +Execute a series of hypervisor calls; {\tt nr\_calls} is the length of
    7.35 +the array of {\tt multicall\_entry\_t} structures pointed to by {\tt
    7.36 +call\_list}. Each entry contains the hypercall operation code followed
    7.37 +by up to 7 word-sized arguments.
    7.38 +\end{quote}
    7.39 +
    7.40 +Note that multicalls are provided purely as an optimization; there is
    7.41 +no requirement to use them when first porting a guest operating
    7.42 +system.
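As a sketch of how such a batch might be assembled (the structure shown
simply mirrors the description above; see {\tt xen/include/public/xen.h}
for the authoritative definition, and the {\tt HYPERVISOR\_} wrapper is
assumed to exist as in XenLinux):

\begin{verbatim}
/* Operation code followed by up to seven word-sized arguments, as
 * described above; check xen/include/public/xen.h for the real type. */
typedef struct multicall_entry {
    unsigned long op;
    unsigned long args[7];
} multicall_entry_t;

/* Batch a kernel stack switch and a timer update into a single trap. */
void switch_and_set_timer(unsigned long ss, unsigned long esp,
                          unsigned long timeout_lo, unsigned long timeout_hi)
{
    multicall_entry_t calls[2];

    calls[0].op      = __HYPERVISOR_stack_switch;
    calls[0].args[0] = ss;
    calls[0].args[1] = esp;

    calls[1].op      = __HYPERVISOR_set_timer_op;
    calls[1].args[0] = timeout_lo;    /* 64-bit timeout passed as a */
    calls[1].args[1] = timeout_hi;    /* pair of 32-bit values      */

    (void)HYPERVISOR_multicall(calls, 2);
}
\end{verbatim}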
    7.43 +
    7.44 +
    7.45 +\section{Virtual CPU Setup} 
    7.46 +
    7.47 +At start of day, a guest operating system needs to set up the virtual
    7.48 +CPU it is executing on. This includes installing vectors for the
    7.49 +virtual IDT so that the guest OS can handle interrupts, page faults,
    7.50 +etc. However, the very first thing a guest OS must set up is a pair 
    7.51 +of hypervisor callbacks: these are the entry points which Xen will
    7.52 +use when it wishes to notify the guest OS of an occurrence. 
    7.53 +
    7.54 +\begin{quote}
    7.55 +\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long
    7.56 +  event\_address, unsigned long failsafe\_selector, unsigned long
    7.57 +  failsafe\_address) }
    7.58 +
    7.59 +Register the normal (``event'') and failsafe callbacks for 
    7.60 +event processing. In each case the code segment selector and 
    7.61 +address within that segment are provided. The selectors must
    7.62 +have RPL 1; in XenLinux we simply use the kernel's CS for both 
    7.63 +{\tt event\_selector} and {\tt failsafe\_selector}.
    7.64 +
    7.65 +The value {\tt event\_address} specifies the address of the guest OS's
    7.66 +event handling and dispatch routine; the {\tt failsafe\_address}
    7.67 +specifies a separate entry point which is used only if a fault occurs
    7.68 +when Xen attempts to use the normal callback. 
    7.69 +\end{quote} 
    7.70 +
    7.71 +
    7.72 +After installing the hypervisor callbacks, the guest OS can 
    7.73 +install a `virtual IDT' by using the following hypercall: 
    7.74 +
    7.75 +\begin{quote} 
    7.76 +\hypercall{set\_trap\_table(trap\_info\_t *table)} 
    7.77 +
    7.78 +Install one or more entries into the per-domain 
    7.79 +trap handler table (essentially a software version of the IDT). 
    7.80 +Each entry in the array pointed to by {\tt table} includes the 
    7.81 +exception vector number with the corresponding segment selector 
    7.82 +and entry point. Most guest OSes can use the same handlers on 
    7.83 +Xen as when running on the real hardware; an exception is the 
    7.84 +page fault handler (exception vector 14) where a modified 
    7.85 +stack-frame layout is used. 
    7.86 +
    7.87 +
    7.88 +\end{quote} 
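A sketch of a guest installing a small virtual IDT follows; the
{\tt trap\_info\_t} field names, the {\tt KERNEL\_CS} selector and the
handler symbols are assumptions in the style of a XenLinux guest, so
consult {\tt xen/include/public/xen.h} for the real structure:

\begin{verbatim}
/* Hypothetical handler entry points provided by the guest kernel. */
extern void divide_error(void);
extern void page_fault(void);

/* A zeroed entry terminates the table.  Field names (.vector, .flags,
 * .cs, .address) are assumed from the public headers of this era. */
static trap_info_t trap_table[] = {
    { .vector =  0, .flags = 0, .cs = KERNEL_CS,
      .address = (unsigned long)divide_error },
    { .vector = 14, .flags = 0, .cs = KERNEL_CS,
      .address = (unsigned long)page_fault   },
    { 0 }
};

void install_virtual_idt(void)
{
    (void)HYPERVISOR_set_trap_table(trap_table);
}
\end{verbatim}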
    7.89 +
    7.90 +
    7.91 +
    7.92 +\section{Scheduling and Timer}
    7.93 +
    7.94 +Domains are preemptively scheduled by Xen according to the 
    7.95 +parameters installed by domain 0 (see Section~\ref{s:dom0ops}). 
    7.96 +In addition, however, a domain may choose to explicitly 
    7.97 +control certain behavior with the following hypercall: 
    7.98 +
    7.99 +\begin{quote} 
   7.100 +\hypercall{sched\_op(unsigned long op)} 
   7.101 +
   7.102 +Request scheduling operation from hypervisor. The options are: {\it
   7.103 +yield}, {\it block}, and {\it shutdown}.  {\it yield} keeps the
   7.104 +calling domain runnable but may cause a reschedule if other domains
   7.105 +are runnable.  {\it block} removes the calling domain from the run
   7.106 +queue and causes it to sleep until an event is delivered to it.  {\it
   7.107 +shutdown} is used to end the domain's execution; the caller can
   7.108 +additionally specify whether the domain should reboot, halt or
   7.109 +suspend.
   7.110 +\end{quote} 
   7.111 +
   7.112 +To aid the implementation of a process scheduler within a guest OS,
   7.113 +Xen provides a virtual programmable timer:
   7.114 +
   7.115 +\begin{quote}
   7.116 +\hypercall{set\_timer\_op(uint64\_t timeout)} 
   7.117 +
   7.118 +Request a timer event to be sent at the specified system time (time 
   7.119 +in nanoseconds since system boot). The hypercall actually passes the 
   7.120 +64-bit timeout value as a pair of 32-bit values. 
   7.121 +
   7.122 +\end{quote} 
   7.123 +
   7.124 +Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op} 
   7.125 +allows block-with-timeout semantics. 
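For example, a guest's idle loop might implement a bounded sleep along
these lines (a sketch; the {\tt HYPERVISOR\_} wrappers are assumed to
exist as in XenLinux):

\begin{verbatim}
#include <stdint.h>

/* Sleep until either an event arrives or the given deadline (in ns of
 * system time) passes, whichever comes first. */
void block_until(uint64_t deadline)
{
    HYPERVISOR_set_timer_op(deadline);   /* request a timer event */
    HYPERVISOR_sched_op(SCHEDOP_block);  /* sleep until any event */
}
\end{verbatim}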
   7.126 +
   7.127 +
   7.128 +\section{Page Table Management} 
   7.129 +
   7.130 +Since guest operating systems have read-only access to their page 
   7.131 +tables, Xen must be involved when making any changes. The following
   7.132 +multi-purpose hypercall can be used to modify page-table entries, 
   7.133 +update the machine-to-physical mapping table, flush the TLB, install 
   7.134 +a new page-table base pointer, and more.
   7.135 +
   7.136 +\begin{quote} 
   7.137 +\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} 
   7.138 +
   7.139 +Update the page table for the domain; a set of {\tt count} updates are
   7.140 +submitted for processing in a batch, with {\tt success\_count} being 
   7.141 +updated to report the number of successful updates.  
   7.142 +
   7.143 +Each element of {\tt req[]} contains a pointer (address) and value; 
   7.144 +the least significant 2-bits of the pointer are used to distinguish 
   7.145 +the type of update requested as follows:
   7.146 +\begin{description} 
   7.147 +
   7.148 +\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or
   7.149 +page table entry to the associated value; Xen will check that the
   7.150 +update is safe, as described in Chapter~\ref{c:memory}.
   7.151 +
   7.152 +\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the
   7.153 +  machine-to-physical table. The calling domain must own the machine
   7.154 +  page in question (or be privileged).
   7.155 +
   7.156 +\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations.
   7.157 +The set of additional MMU operations is considerable, and includes
   7.158 +updating {\tt cr3} (or just re-installing it for a TLB flush),
   7.159 +flushing the cache, installing a new LDT, or pinning \& unpinning
   7.160 +page-table pages (to ensure their reference count doesn't drop to zero
   7.161 +which would require a revalidation of all entries).
   7.162 +
   7.163 +Further extended commands are used to deal with granting and 
   7.164 +acquiring page ownership; see Section~\ref{s:idc}. 
   7.165 +
   7.166 +
   7.167 +\end{description}
   7.168 +
   7.169 +More details on the precise format of all commands can be 
   7.170 +found in {\tt xen/include/public/xen.h}. 
   7.171 +
   7.172 +
   7.173 +\end{quote}
   7.174 +
   7.175 +Explicitly updating batches of page table entries is extremely
   7.176 +efficient, but can require a number of alterations to the guest
   7.177 +OS. Using the writable page table mode (Chapter~\ref{c:memory}) is
   7.178 +recommended for new OS ports.
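A sketch of such a batched update is shown below (the
{\tt mmu\_update\_t} layout mirrors the pointer/value description above,
and the wrapper follows the hypercall signature given; both should be
checked against the public headers):

\begin{verbatim}
/* Update two page-table entries, identified by the machine addresses
 * of the PTEs, in a single batch. */
int update_two_ptes(unsigned long pte_ma0, unsigned long val0,
                    unsigned long pte_ma1, unsigned long val1)
{
    mmu_update_t req[2];
    int done = 0;

    /* The low two bits of 'ptr' select the request type. */
    req[0].ptr = pte_ma0 | MMU_NORMAL_PT_UPDATE;
    req[0].val = val0;
    req[1].ptr = pte_ma1 | MMU_NORMAL_PT_UPDATE;
    req[1].val = val1;

    if (HYPERVISOR_mmu_update(req, 2, &done) != 0 || done != 2)
        return -1;   /* at least one update failed Xen's safety checks */
    return 0;
}
\end{verbatim}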
   7.179 +
   7.180 +Regardless of which page table update mode is being used, however,
   7.181 +there are some occasions (notably handling a demand page fault) where
   7.182 +a guest OS will wish to modify exactly one PTE rather than a
   7.183 +batch. This is catered for by the following:
   7.184 +
   7.185 +\begin{quote} 
   7.186 +\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long
   7.187 +val, \\ unsigned long flags)}
   7.188 +
   7.189 +Update the currently installed PTE for the page {\tt page\_nr} to 
   7.190 +{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification 
   7.191 +is safe before applying it. The {\tt flags} determine which kind
   7.192 +of TLB flush, if any, should follow the update. 
   7.193 +
   7.194 +\end{quote} 
   7.195 +
   7.196 +Finally, sufficiently privileged domains may occasionally wish to manipulate 
   7.197 +the pages of others: 
   7.198 +\begin{quote}
   7.199 +
   7.200 +\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr,
   7.201 +unsigned long val, unsigned long flags, uint16\_t domid)}
   7.202 +
   7.203 +Identical to {\tt update\_va\_mapping()} save that the pages being
   7.204 +mapped must belong to the domain {\tt domid}. 
   7.205 +
   7.206 +\end{quote}
   7.207 +
   7.208 +This privileged operation is currently used by backend virtual device
   7.209 +drivers to safely map pages containing I/O data. 
   7.210 +
   7.211 +
   7.212 +
   7.213 +\section{Segmentation Support}
   7.214 +
   7.215 +Xen allows guest OSes to install a custom GDT if they require it; 
   7.216 +this is context switched transparently whenever a domain is 
   7.217 +[de]scheduled.  The following hypercall is effectively a 
   7.218 +`safe' version of {\tt lgdt}: 
   7.219 +
   7.220 +\begin{quote}
   7.221 +\hypercall{set\_gdt(unsigned long *frame\_list, int entries)} 
   7.222 +
   7.223 +Install a global descriptor table for a domain; {\tt frame\_list} is
   7.224 +an array of up to 16 machine page frames within which the GDT resides,
   7.225 +with {\tt entries} being the actual number of descriptor-entry
   7.226 +slots. All page frames must be mapped read-only within the guest's
   7.227 +address space, and the table must be large enough to contain Xen's
   7.228 +reserved entries (see {\tt xen/include/public/arch-x86\_32.h}).
   7.229 +
   7.230 +\end{quote}
   7.231 +
   7.232 +Many guest OSes will also wish to install LDTs; this is achieved by
   7.233 +using {\tt mmu\_update()} with an extended command, passing the
   7.234 +linear address of the LDT base along with the number of entries. No
   7.235 +special safety checks are required; Xen needs to perform this task
   7.236 +simply since {\tt lldt} requires CPL 0.
   7.237 +
   7.238 +
   7.239 +Xen also allows guest operating systems to update just an 
   7.240 +individual segment descriptor in the GDT or LDT:  
   7.241 +
   7.242 +\begin{quote}
   7.243 +\hypercall{update\_descriptor(unsigned long ma, unsigned long word1,
   7.244 +unsigned long word2)}
   7.245 +
   7.246 +Update the GDT/LDT entry at machine address {\tt ma}; the new
   7.247 +8-byte descriptor is stored in {\tt word1} and {\tt word2}.
   7.248 +Xen performs a number of checks to ensure the descriptor is 
   7.249 +valid. 
   7.250 +
   7.251 +\end{quote}
   7.252 +
   7.253 +Guest OSes can use the above in place of context switching entire 
   7.254 +LDTs (or the GDT) when the number of changing descriptors is small. 
   7.255 +
   7.256 +\section{Context Switching} 
   7.257 +
   7.258 +When a guest OS wishes to context switch between two processes, 
   7.259 +it can use the page table and segmentation hypercalls described
   7.260 +above to perform the the bulk of the privileged work. In addition, 
   7.261 +however, it will need to invoke Xen to switch the kernel (ring 1) 
   7.262 +stack pointer: 
   7.263 +
   7.264 +\begin{quote} 
   7.265 +\hypercall{stack\_switch(unsigned long ss, unsigned long esp)} 
   7.266 +
   7.267 +Request kernel stack switch from hypervisor; {\tt ss} is the new 
   7.268 +stack segment, while {\tt esp} is the new stack pointer. 
   7.269 +
   7.270 +\end{quote} 
   7.271 +
   7.272 +A final useful hypercall for context switching allows ``lazy'' 
   7.273 +save and restore of floating point state: 
   7.274 +
   7.275 +\begin{quote}
   7.276 +\hypercall{fpu\_taskswitch(void)} 
   7.277 +
   7.278 +This call instructs Xen to set the {\tt TS} bit in the {\tt cr0}
   7.279 +control register; this means that the next attempt to use floating
   7.280 +point will cause a trap which the guest OS can catch. Typically it will
   7.281 +then save/restore the FP state, and clear the {\tt TS} bit. 
   7.282 +\end{quote} 
   7.283 +
   7.284 +This is provided as an optimization only; guest OSes can also choose
   7.285 +to save and restore FP state on all context switches for simplicity. 
   7.286 +
   7.287 +
   7.288 +\section{Physical Memory Management}
   7.289 +
   7.290 +As mentioned previously, each domain has a maximum and current 
   7.291 +memory allocation. The maximum allocation, set at domain creation 
   7.292 +time, cannot be modified. However a domain can choose to reduce 
   7.293 +and subsequently grow its current allocation by using the
   7.294 +following call: 
   7.295 +
   7.296 +\begin{quote} 
   7.297 +\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list,
   7.298 +  unsigned long nr\_extents, unsigned int extent\_order)}
   7.299 +
   7.300 +Increase or decrease current memory allocation (as determined by 
   7.301 +the value of {\tt op}). Each invocation provides a list of 
   7.302 +extents each of which is $2^s$ pages in size, 
   7.303 +where $s$ is the value of {\tt extent\_order}. 
   7.304 +
   7.305 +\end{quote} 
   7.306 +
   7.307 +In addition to simply reducing or increasing the current memory
   7.308 +allocation via a `balloon driver', this call is also useful for 
   7.309 +obtaining contiguous regions of machine memory when required (e.g. 
   7.310 +for certain PCI devices, or if using superpages).  
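A guest-side balloon driver might release pages roughly as follows (a
sketch; the {\tt MEMOP\_decrease\_reservation} constant and the wrapper
name are assumptions based on the public headers and XenLinux of this
era):

\begin{verbatim}
/* Hand a batch of machine page frames back to Xen.  With extent_order
 * set to 0, each element of mfn_list names a single (2^0-page) extent. */
int balloon_release_pages(unsigned long *mfn_list, unsigned long nr_pages)
{
    return HYPERVISOR_dom_mem_op(MEMOP_decrease_reservation,
                                 mfn_list, nr_pages, 0);
}
\end{verbatim}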
   7.311 +
   7.312 +
   7.313 +\section{Inter-Domain Communication}
   7.314 +\label{s:idc} 
   7.315 +
   7.316 +Xen provides a simple asynchronous notification mechanism via
   7.317 +\emph{event channels}. Each domain has a set of end-points (or
   7.318 +\emph{ports}) which may be bound to an event source (e.g. a physical
   7.319 +IRQ, a virtual IRQ, or a port in another domain). When a pair of
   7.320 +end-points in two different domains are bound together, then a `send'
   7.321 +operation on one will cause an event to be received by the destination
   7.322 +domain.
   7.323 +
   7.324 +The control and use of event channels involves the following hypercall: 
   7.325 +
   7.326 +\begin{quote}
   7.327 +\hypercall{event\_channel\_op(evtchn\_op\_t *op)} 
   7.328 +
   7.329 +Inter-domain event-channel management; {\tt op} is a discriminated 
   7.330 +union which allows the following 7 operations: 
   7.331 +
   7.332 +\begin{description} 
   7.333 +
   7.334 +\item[\it alloc\_unbound:] allocate a free (unbound) local
   7.335 +  port and prepare for connection from a specified domain. 
   7.336 +\item[\it bind\_virq:] bind a local port to a virtual 
   7.337 +IRQ; any particular VIRQ can be bound to at most one port per domain. 
   7.338 +\item[\it bind\_pirq:] bind a local port to a physical IRQ;
   7.339 +once more, a given pIRQ can be bound to at most one port per
   7.340 +domain. Furthermore the calling domain must be sufficiently
   7.341 +privileged.
   7.342 +\item[\it bind\_interdomain:] construct an interdomain event 
   7.343 +channel; in general, the target domain must have previously allocated 
   7.344 +an unbound port for this channel, although this can be bypassed by 
   7.345 +privileged domains during domain setup. 
   7.346 +\item[\it close:] close an interdomain event channel. 
   7.347 +\item[\it send:] send an event to the remote end of an 
   7.348 +interdomain event channel. 
   7.349 +\item[\it status:] determine the current status of a local port. 
   7.350 +\end{description} 
   7.351 +
   7.352 +For more details see
   7.353 +{\tt xen/include/public/event\_channel.h}. 
   7.354 +
   7.355 +\end{quote} 
   7.356 +
   7.357 +Event channels are the fundamental communication primitive between 
   7.358 +Xen domains and seamlessly support SMP. However they provide little
   7.359 +bandwidth for communication {\sl per se}, and hence are typically 
   7.360 +married with a piece of shared memory to produce effective and 
   7.361 +high-performance inter-domain communication. 
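For instance, once an interdomain channel has been bound, notifying the
remote end might look like this (a sketch; the field names inside the
{\tt evtchn\_op\_t} union are illustrative, so see
{\tt xen/include/public/event\_channel.h} for the real layout):

\begin{verbatim}
/* Signal the remote end of an already-bound interdomain channel. */
void notify_remote(unsigned int local_port)
{
    evtchn_op_t op;

    op.cmd               = EVTCHNOP_send;  /* the 'send' operation   */
    op.u.send.local_port = local_port;     /* our end of the channel */

    (void)HYPERVISOR_event_channel_op(&op);
}
\end{verbatim}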
   7.362 +
   7.363 +Safe sharing of memory pages between guest OSes is carried out by
   7.364 +granting access on a per page basis to individual domains. This is
   7.365 +achieved by using the {\tt grant\_table\_op()} hypercall.
   7.366 +
   7.367 +\begin{quote}
   7.368 +\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
   7.369 +
   7.370 +Grant or remove access to a particular page to a particular domain. 
   7.371 +
   7.372 +\end{quote} 
   7.373 +
   7.374 +This is not currently widely in use by guest operating systems, but 
   7.375 +we intend to integrate support more fully in the near future. 
   7.376 +
   7.377 +\section{PCI Configuration} 
   7.378 +
   7.379 +Domains with physical device access (i.e.\ driver domains) receive
   7.380 +limited access to certain PCI devices (bus address space and
   7.381 +interrupts). However, many guest operating systems attempt to 
   7.382 +determine the PCI configuration by directly accessing the PCI BIOS, 
   7.383 +which cannot be allowed for safety. 
   7.384 +
   7.385 +Instead, Xen provides the following hypercall: 
   7.386 +
   7.387 +\begin{quote}
   7.388 +\hypercall{physdev\_op(void *physdev\_op)}
   7.389 +
   7.390 +Perform a PCI configuration operation; depending on the value 
   7.391 +of {\tt physdev\_op} this can be a PCI config read, a PCI config 
   7.392 +write, or a small number of other queries. 
   7.393 +
   7.394 +\end{quote} 
   7.395 +
   7.396 +
   7.397 +For examples of using {\tt physdev\_op()}, see the 
   7.398 +Xen-specific PCI code in the Linux sparse tree. 
   7.399 +
   7.400 +\section{Administrative Operations}
   7.401 +\label{s:dom0ops}
   7.402 +
   7.403 +A large number of control operations are available to a sufficiently
   7.404 +privileged domain (typically domain 0). These allow the creation and
   7.405 +management of new domains, for example. A complete list is given 
   7.406 +below: for more details on any or all of these, please see 
   7.407 +{\tt xen/include/public/dom0\_ops.h} 
   7.408 +
   7.409 +
   7.410 +\begin{quote}
   7.411 +\hypercall{dom0\_op(dom0\_op\_t *op)} 
   7.412 +
   7.413 +Administrative domain operations for domain management. The options are:
   7.414 +
   7.415 +\begin{description} 
   7.416 +\item [\it DOM0\_CREATEDOMAIN:] create a new domain
   7.417 +
   7.418 +\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run 
   7.419 +queue. 
   7.420 +
   7.421 +\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable
   7.422 +  once again. 
   7.423 +
   7.424 +\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated
   7.425 +with a domain
   7.426 +
   7.427 +\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain
   7.428 +
   7.429 +\item [\it DOM0\_SCHEDCTL:]
   7.430 +
   7.431 +\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain
   7.432 +
   7.433 +\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain
   7.434 +
   7.435 +\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain
   7.436 +
    7.437 +\item [\it DOM0\_GETPAGEFRAMEINFO:] get information about a given machine page frame
    7.438 +
    7.439 +\item [\it DOM0\_GETPAGEFRAMEINFO2:] as above, for a batch of machine page frames
   7.440 +
   7.441 +\item [\it DOM0\_IOPL:] set I/O privilege level
   7.442 +
   7.443 +\item [\it DOM0\_MSR:] read or write model specific registers
   7.444 +
   7.445 +\item [\it DOM0\_DEBUG:] interactively invoke the debugger
   7.446 +
   7.447 +\item [\it DOM0\_SETTIME:] set system time
   7.448 +
   7.449 +\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring
   7.450 +
   7.451 +\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU
   7.452 +
   7.453 +\item [\it DOM0\_GETTBUFS:] get information about the size and location of
   7.454 +                      the trace buffers (only on trace-buffer enabled builds)
   7.455 +
   7.456 +\item [\it DOM0\_PHYSINFO:] get information about the host machine
   7.457 +
   7.458 +\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions
   7.459 +
   7.460 +\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler
   7.461 +
   7.462 +\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes
   7.463 +
   7.464 +\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain
   7.465 +
   7.466 +\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain
   7.467 +
   7.468 +\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options
   7.469 +\end{description} 
   7.470 +\end{quote} 
   7.471 +
   7.472 +Most of the above are best understood by looking at the code 
    7.473 +implementing them (in {\tt xen/common/dom0\_ops.c}) and at 
   7.474 +the user-space tools that use them (mostly in {\tt tools/libxc}). 
   7.475 +
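          +For instance, the sketch below pauses a domain through the
          +{\tt libxc} wrappers, which construct and issue the corresponding
          +{\tt dom0\_op()} ({\it DOM0\_PAUSEDOMAIN} in this case) on the
          +caller's behalf. The header name and prototypes are assumptions
          +here and should be checked against {\tt tools/libxc}.
          +
          +\begin{verbatim}
          +/* Sketch only: header name and prototypes should be checked
          + * against tools/libxc.                                           */
          +#include "xc.h"                /* libxc control interface          */
          +
          +int pause_domain(unsigned int domid)
          +{
          +    int xc_handle = xc_interface_open();   /* privileged interface */
          +    int rc;
          +
          +    if (xc_handle < 0)
          +        return -1;
          +
          +    rc = xc_domain_pause(xc_handle, domid);  /* DOM0_PAUSEDOMAIN   */
          +    xc_interface_close(xc_handle);
          +    return rc;
          +}
          +\end{verbatim}
          +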
   7.476 +\section{Debugging Hypercalls} 
   7.477 +
   7.478 +A few additional hypercalls are mainly useful for debugging: 
   7.479 +
   7.480 +\begin{quote} 
   7.481 +\hypercall{console\_io(int cmd, int count, char *str)}
   7.482 +
   7.483 +Use Xen to interact with the console; operations are:
   7.484 +
    7.485 +{\it CONSOLEIO\_write}: Output {\tt count} characters from buffer {\tt str}.
    7.486 +
    7.487 +{\it CONSOLEIO\_read}: Input at most {\tt count} characters into buffer {\tt str}.
   7.488 +\end{quote} 
   7.489 +
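          +For example, a guest can emit early boot messages with a small
          +wrapper along the following lines; the {\tt HYPERVISOR\_console\_io()}
          +wrapper name is an assumption borrowed from the sparse-tree
          +convention, while the command and arguments are as described above.
          +
          +\begin{verbatim}
          +/* Sketch only: the HYPERVISOR_console_io() wrapper name is assumed. */
          +#include <string.h>            /* strlen()                          */
          +
          +static void xen_puts(const char *msg)
          +{
          +    (void)HYPERVISOR_console_io(CONSOLEIO_write, strlen(msg),
          +                                (char *)msg);
          +}
          +\end{verbatim}
          +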
   7.490 +A pair of hypercalls allows access to the underlying debug registers: 
   7.491 +\begin{quote}
   7.492 +\hypercall{set\_debugreg(int reg, unsigned long value)}
   7.493 +
   7.494 +Set debug register {\tt reg} to {\tt value} 
   7.495 +
   7.496 +\hypercall{get\_debugreg(int reg)}
   7.497 +
   7.498 +Return the contents of the debug register {\tt reg}
   7.499 +\end{quote}
   7.500 +
   7.501 +And finally: 
   7.502 +\begin{quote}
   7.503 +\hypercall{xen\_version(int cmd)}
   7.504 +
   7.505 +Request Xen version number.
   7.506 +\end{quote} 
   7.507 +
   7.508 +This is useful to ensure that user-space tools are in sync 
   7.509 +with the underlying hypervisor. 
   7.510 +
   7.511 +\section{Deprecated Hypercalls}
   7.512 +
   7.513 +Xen is under constant development and refinement; as such there 
   7.514 +are plans to improve the way in which various pieces of functionality 
   7.515 +are exposed to guest OSes. 
   7.516 +
   7.517 +\begin{quote} 
   7.518 +\hypercall{vm\_assist(unsigned int cmd, unsigned int type)}
   7.519 +
    7.520 +Toggle various memory management modes (in particular writable page
   7.521 +tables and superpage support). 
   7.522 +
   7.523 +\end{quote} 
   7.524 +
   7.525 +This is likely to be replaced with mode values in the shared 
    7.526 +information page, since that approach is more resilient across 
    7.527 +resumption after migration or checkpointing. 
     8.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     8.2 +++ b/docs/src/interface/memory.tex	Tue Sep 20 09:17:33 2005 +0000
     8.3 @@ -0,0 +1,162 @@
     8.4 +\chapter{Memory}
     8.5 +\label{c:memory} 
     8.6 +
     8.7 +Xen is responsible for managing the allocation of physical memory to
     8.8 +domains, and for ensuring safe use of the paging and segmentation
     8.9 +hardware.
    8.10 +
    8.11 +
    8.12 +\section{Memory Allocation}
    8.13 +
    8.14 +Xen resides within a small fixed portion of physical memory; it also
    8.15 +reserves the top 64MB of every virtual address space. The remaining
    8.16 +physical memory is available for allocation to domains at a page
    8.17 +granularity.  Xen tracks the ownership and use of each page, which
    8.18 +allows it to enforce secure partitioning between domains.
    8.19 +
    8.20 +Each domain has a maximum and current physical memory allocation.  A
    8.21 +guest OS may run a `balloon driver' to dynamically adjust its current
    8.22 +memory allocation up to its limit.
    8.23 +
    8.24 +
    8.25 +%% XXX SMH: I use machine and physical in the next section (which is
    8.26 +%% kinda required for consistency with code); wonder if this section
    8.27 +%% should use same terms?
    8.28 +%%
    8.29 +%% Probably. 
    8.30 +%%
    8.31 +%% Merging this and below section at some point prob makes sense.
    8.32 +
    8.33 +\section{Pseudo-Physical Memory}
    8.34 +
    8.35 +Since physical memory is allocated and freed on a page granularity,
    8.36 +there is no guarantee that a domain will receive a contiguous stretch
    8.37 +of physical memory. However most operating systems do not have good
    8.38 +support for operating in a fragmented physical address space. To aid
    8.39 +porting such operating systems to run on top of Xen, we make a
    8.40 +distinction between \emph{machine memory} and \emph{pseudo-physical
    8.41 +  memory}.
    8.42 +
    8.43 +Put simply, machine memory refers to the entire amount of memory
    8.44 +installed in the machine, including that reserved by Xen, in use by
    8.45 +various domains, or currently unallocated. We consider machine memory
    8.46 +to comprise a set of 4K \emph{machine page frames} numbered
    8.47 +consecutively starting from 0. Machine frame numbers mean the same
    8.48 +within Xen or any domain.
    8.49 +
    8.50 +Pseudo-physical memory, on the other hand, is a per-domain
    8.51 +abstraction. It allows a guest operating system to consider its memory
    8.52 +allocation to consist of a contiguous range of physical page frames
    8.53 +starting at physical frame 0, despite the fact that the underlying
    8.54 +machine page frames may be sparsely allocated and in any order.
    8.55 +
    8.56 +To achieve this, Xen maintains a globally readable {\it
    8.57 +  machine-to-physical} table which records the mapping from machine
    8.58 +page frames to pseudo-physical ones. In addition, each domain is
    8.59 +supplied with a {\it physical-to-machine} table which performs the
    8.60 +inverse mapping. Clearly the machine-to-physical table has size
    8.61 +proportional to the amount of RAM installed in the machine, while each
    8.62 +physical-to-machine table has size proportional to the memory
    8.63 +allocation of the given domain.
    8.64 +
    8.65 +Architecture dependent code in guest operating systems can then use
    8.66 +the two tables to provide the abstraction of pseudo-physical memory.
    8.67 +In general, only certain specialized parts of the operating system
     8.68 +(such as page table management) need to understand the difference
    8.69 +between machine and pseudo-physical addresses.
    8.70 +
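          +For illustration, such architecture-dependent code commonly wraps the
          +two tables in small helpers along the following lines; the array
          +names follow the XenLinux convention and are assumptions here.
          +
          +\begin{verbatim}
          +/* Sketch only: array names follow XenLinux and are assumptions.    */
          +extern unsigned long *machine_to_phys_mapping;  /* global, readable */
          +extern unsigned long *phys_to_machine_mapping;  /* per-domain       */
          +
          +static inline unsigned long pfn_to_mfn(unsigned long pfn)
          +{
          +    return phys_to_machine_mapping[pfn];  /* pseudo-phys -> machine */
          +}
          +
          +static inline unsigned long mfn_to_pfn(unsigned long mfn)
          +{
          +    return machine_to_phys_mapping[mfn];  /* machine -> pseudo-phys */
          +}
          +\end{verbatim}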
    8.71 +
    8.72 +\section{Page Table Updates}
    8.73 +
    8.74 +In the default mode of operation, Xen enforces read-only access to
    8.75 +page tables and requires guest operating systems to explicitly request
    8.76 +any modifications.  Xen validates all such requests and only applies
    8.77 +updates that it deems safe.  This is necessary to prevent domains from
    8.78 +adding arbitrary mappings to their page tables.
    8.79 +
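          +As a hedged example, a guest might install a single new PTE with a
          +request along the following lines; the wrapper name and exact
          +argument list are assumptions here, and the authoritative definition
          +of the {\tt mmu\_update()} interface is in the public headers.
          +
          +\begin{verbatim}
          +/* Sketch only: wrapper name and argument list are assumptions;
          + * check the hypercall documentation and public headers.           */
          +static int set_pte_via_xen(unsigned long pte_machine_addr,
          +                           unsigned long new_pte_val)
          +{
          +    mmu_update_t req;
          +    int done = 0;
          +
          +    /* Low bits of 'ptr' select an ordinary page-table update.     */
          +    req.ptr = pte_machine_addr | MMU_NORMAL_PT_UPDATE;
          +    req.val = new_pte_val;
          +
          +    /* One request; Xen validates it before applying the update.   */
          +    return HYPERVISOR_mmu_update(&req, 1, &done);
          +}
          +\end{verbatim}
          +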
    8.80 +To aid validation, Xen associates a type and reference count with each
    8.81 +memory page. A page has one of the following mutually-exclusive types
    8.82 +at any point in time: page directory ({\sf PD}), page table ({\sf
    8.83 +  PT}), local descriptor table ({\sf LDT}), global descriptor table
    8.84 +({\sf GDT}), or writable ({\sf RW}). Note that a guest OS may always
    8.85 +create readable mappings of its own memory regardless of its current
    8.86 +type.
    8.87 +
    8.88 +%%% XXX: possibly explain more about ref count 'lifecyle' here?
    8.89 +This mechanism is used to maintain the invariants required for safety;
    8.90 +for example, a domain cannot have a writable mapping to any part of a
    8.91 +page table as this would require the page concerned to simultaneously
    8.92 +be of types {\sf PT} and {\sf RW}.
    8.93 +
    8.94 +
    8.95 +% \section{Writable Page Tables}
    8.96 +
     8.97 +Xen also provides an alternative mode of operation in which guests
     8.98 +have the illusion that their page tables are directly writable.  Of
    8.99 +course this is not really the case, since Xen must still validate
   8.100 +modifications to ensure secure partitioning. To this end, Xen traps
   8.101 +any write attempt to a memory page of type {\sf PT} (i.e., that is
   8.102 +currently part of a page table).  If such an access occurs, Xen
   8.103 +temporarily allows write access to that page while at the same time
   8.104 +\emph{disconnecting} it from the page table that is currently in use.
   8.105 +This allows the guest to safely make updates to the page because the
   8.106 +newly-updated entries cannot be used by the MMU until Xen revalidates
   8.107 +and reconnects the page.  Reconnection occurs automatically in a
   8.108 +number of situations: for example, when the guest modifies a different
   8.109 +page-table page, when the domain is preempted, or whenever the guest
   8.110 +uses Xen's explicit page-table update interfaces.
   8.111 +
   8.112 +Finally, Xen also supports a form of \emph{shadow page tables} in
    8.113 +which the guest OS uses an independent copy of page tables which are
   8.114 +unknown to the hardware (i.e.\ which are never pointed to by {\tt
   8.115 +  cr3}). Instead Xen propagates changes made to the guest's tables to
   8.116 +the real ones, and vice versa. This is useful for logging page writes
   8.117 +(e.g.\ for live migration or checkpoint). A full version of the shadow
   8.118 +page tables also allows guest OS porting with less effort.
   8.119 +
   8.120 +
   8.121 +\section{Segment Descriptor Tables}
   8.122 +
   8.123 +On boot a guest is supplied with a default GDT, which does not reside
   8.124 +within its own memory allocation.  If the guest wishes to use other
   8.125 +than the default `flat' ring-1 and ring-3 segments that this GDT
   8.126 +provides, it must register a custom GDT and/or LDT with Xen, allocated
   8.127 +from its own memory. Note that a number of GDT entries are reserved by
   8.128 +Xen -- any custom GDT must also include sufficient space for these
   8.129 +entries.
   8.130 +
   8.131 +For example, the following hypercall is used to specify a new GDT:
   8.132 +
   8.133 +\begin{quote}
   8.134 +  int {\bf set\_gdt}(unsigned long *{\em frame\_list}, int {\em
   8.135 +    entries})
   8.136 +
   8.137 +  \emph{frame\_list}: An array of up to 16 machine page frames within
   8.138 +  which the GDT resides.  Any frame registered as a GDT frame may only
   8.139 +  be mapped read-only within the guest's address space (e.g., no
   8.140 +  writable mappings, no use as a page-table page, and so on).
   8.141 +
   8.142 +  \emph{entries}: The number of descriptor-entry slots in the GDT.
   8.143 +  Note that the table must be large enough to contain Xen's reserved
   8.144 +  entries; thus we must have `{\em entries $>$
   8.145 +    LAST\_RESERVED\_GDT\_ENTRY}\ '.  Note also that, after registering
   8.146 +  the GDT, slots \emph{FIRST\_} through
   8.147 +  \emph{LAST\_RESERVED\_GDT\_ENTRY} are no longer usable by the guest
   8.148 +  and may be overwritten by Xen.
   8.149 +\end{quote}
   8.150 +
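          +As a hedged sketch, registering a single-page custom GDT might look
          +as follows; {\tt virt\_to\_mfn()} and the {\tt HYPERVISOR\_set\_gdt()}
          +wrapper name are assumptions standing in for whatever the guest uses
          +to find machine frame numbers and to issue the hypercall.
          +
          +\begin{verbatim}
          +/* Sketch only: virt_to_mfn() and the wrapper name are assumptions. */
          +static int register_custom_gdt(void *gdt_page, int nr_entries)
          +{
          +    unsigned long frames[16];            /* up to 16 GDT frames     */
          +
          +    frames[0] = virt_to_mfn(gdt_page);   /* this GDT fits in one    */
          +
          +    /* nr_entries must leave room for Xen's reserved descriptors.   */
          +    return HYPERVISOR_set_gdt(frames, nr_entries);
          +}
          +\end{verbatim}
          +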
   8.151 +The LDT is updated via the generic MMU update mechanism (i.e., via the
    8.152 +{\tt mmu\_update()} hypercall).
   8.153 +
   8.154 +\section{Start of Day}
   8.155 +
   8.156 +The start-of-day environment for guest operating systems is rather
   8.157 +different to that provided by the underlying hardware. In particular,
   8.158 +the processor is already executing in protected mode with paging
   8.159 +enabled.
   8.160 +
   8.161 +{\it Domain 0} is created and booted by Xen itself. For all subsequent
   8.162 +domains, the analogue of the boot-loader is the {\it domain builder},
   8.163 +user-space software running in {\it domain 0}. The domain builder is
   8.164 +responsible for building the initial page tables for a domain and
   8.165 +loading its kernel image at the appropriate virtual address.
     9.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     9.2 +++ b/docs/src/interface/scheduling.tex	Tue Sep 20 09:17:33 2005 +0000
     9.3 @@ -0,0 +1,268 @@
     9.4 +\chapter{Scheduling API}  
     9.5 +
     9.6 +The scheduling API is used by both the schedulers described above and should
     9.7 +also be used by any new schedulers.  It provides a generic interface and also
     9.8 +implements much of the ``boilerplate'' code.
     9.9 +
    9.10 +Schedulers conforming to this API are described by the following
    9.11 +structure:
    9.12 +
    9.13 +\begin{verbatim}
    9.14 +struct scheduler
    9.15 +{
    9.16 +    char *name;             /* full name for this scheduler      */
    9.17 +    char *opt_name;         /* option name for this scheduler    */
    9.18 +    unsigned int sched_id;  /* ID for this scheduler             */
    9.19 +
    9.20 +    int          (*init_scheduler) ();
    9.21 +    int          (*alloc_task)     (struct task_struct *);
    9.22 +    void         (*add_task)       (struct task_struct *);
    9.23 +    void         (*free_task)      (struct task_struct *);
    9.24 +    void         (*rem_task)       (struct task_struct *);
    9.25 +    void         (*wake_up)        (struct task_struct *);
    9.26 +    void         (*do_block)       (struct task_struct *);
    9.27 +    task_slice_t (*do_schedule)    (s_time_t);
    9.28 +    int          (*control)        (struct sched_ctl_cmd *);
    9.29 +    int          (*adjdom)         (struct task_struct *,
    9.30 +                                    struct sched_adjdom_cmd *);
    9.31 +    s32          (*reschedule)     (struct task_struct *);
    9.32 +    void         (*dump_settings)  (void);
    9.33 +    void         (*dump_cpu_state) (int);
    9.34 +    void         (*dump_runq_el)   (struct task_struct *);
    9.35 +};
    9.36 +\end{verbatim}
    9.37 +
     9.38 +The only method that {\em must} be implemented is
     9.39 +{\tt do\_schedule()}.  Note, however, that if the {\tt wake\_up()} method is
     9.40 +not implemented then waking tasks will never be placed on the runqueue!
    9.41 +
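          +Before looking at each field in turn, the following hedged sketch
          +shows the shape of the one mandatory method: it simply keeps running
          +the current task for a fixed 10ms slice. The {\tt task\_slice\_t}
          +field names and the {\tt MILLISECS()} helper are assumptions for
          +illustration; a real scheduler would consult its runqueue here.
          +
          +\begin{verbatim}
          +/* Sketch only: task_slice_t field names and MILLISECS() are
          + * assumptions; a real implementation would consult its runqueue.  */
          +static task_slice_t trivial_do_schedule(s_time_t now)
          +{
          +    task_slice_t ret;
          +
          +    ret.task = current;          /* keep running the current task   */
          +    ret.time = MILLISECS(10);    /* ... for at most 10ms            */
          +    return ret;
          +}
          +\end{verbatim}
          +
          +Such a function would be installed in the {\tt do\_schedule} slot of
          +the structure above.
          +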
    9.42 +The fields of the above structure are described in more detail below.
    9.43 +
    9.44 +\subsubsection{name}
    9.45 +
    9.46 +The name field should point to a descriptive ASCII string.
    9.47 +
    9.48 +\subsubsection{opt\_name}
    9.49 +
    9.50 +This field is the value of the {\tt sched=} boot-time option that will select
    9.51 +this scheduler.
    9.52 +
    9.53 +\subsubsection{sched\_id}
    9.54 +
    9.55 +This is an integer that uniquely identifies this scheduler.  There should be a
     9.56 +macro corresponding to this scheduler ID in {\tt <xen/sched-if.h>}.
    9.57 +
    9.58 +\subsubsection{init\_scheduler}
    9.59 +
    9.60 +\paragraph*{Purpose}
    9.61 +
    9.62 +This is a function for performing any scheduler-specific initialisation.  For
    9.63 +instance, it might allocate memory for per-CPU scheduler data and initialise it
    9.64 +appropriately.
    9.65 +
    9.66 +\paragraph*{Call environment}
    9.67 +
    9.68 +This function is called after the initialisation performed by the generic
    9.69 +layer.  The function is called exactly once, for the scheduler that has been
    9.70 +selected.
    9.71 +
    9.72 +\paragraph*{Return values}
    9.73 +
    9.74 +This should return negative on failure --- this will cause an
    9.75 +immediate panic and the system will fail to boot.
    9.76 +
    9.77 +\subsubsection{alloc\_task}
    9.78 +
    9.79 +\paragraph*{Purpose}
    9.80 +Called when a {\tt task\_struct} is allocated by the generic scheduler
    9.81 +layer.  A particular scheduler implementation may use this method to
    9.82 +allocate per-task data for this task.  It may use the {\tt
    9.83 +sched\_priv} pointer in the {\tt task\_struct} to point to this data.
    9.84 +
    9.85 +\paragraph*{Call environment}
    9.86 +The generic layer guarantees that the {\tt sched\_priv} field will
    9.87 +remain intact from the time this method is called until the task is
    9.88 +deallocated (so long as the scheduler implementation does not change
    9.89 +it explicitly!).
    9.90 +
    9.91 +\paragraph*{Return values}
    9.92 +Negative on failure.
    9.93 +
    9.94 +\subsubsection{add\_task}
    9.95 +
    9.96 +\paragraph*{Purpose}
    9.97 +
    9.98 +Called when a task is initially added by the generic layer.
    9.99 +
   9.100 +\paragraph*{Call environment}
   9.101 +
   9.102 +The fields in the {\tt task\_struct} are now filled out and available for use.
   9.103 +Schedulers should implement appropriate initialisation of any per-task private
   9.104 +information in this method.
   9.105 +
   9.106 +\subsubsection{free\_task}
   9.107 +
   9.108 +\paragraph*{Purpose}
   9.109 +
   9.110 +Schedulers should free the space used by any associated private data
   9.111 +structures.
   9.112 +
   9.113 +\paragraph*{Call environment}
   9.114 +
   9.115 +This is called when a {\tt task\_struct} is about to be deallocated.
   9.116 +The generic layer will have done generic task removal operations and
   9.117 +(if implemented) called the scheduler's {\tt rem\_task} method before
   9.118 +this method is called.
   9.119 +
   9.120 +\subsubsection{rem\_task}
   9.121 +
   9.122 +\paragraph*{Purpose}
   9.123 +
   9.124 +This is called when a task is being removed from scheduling (but is
   9.125 +not yet being freed).
   9.126 +
   9.127 +\subsubsection{wake\_up}
   9.128 +
   9.129 +\paragraph*{Purpose}
   9.130 +
   9.131 +Called when a task is woken up, this method should put the task on the runqueue
   9.132 +(or do the scheduler-specific equivalent action).
   9.133 +
   9.134 +\paragraph*{Call environment}
   9.135 +
   9.136 +The task is already set to state RUNNING.
   9.137 +
   9.138 +\subsubsection{do\_block}
   9.139 +
   9.140 +\paragraph*{Purpose}
   9.141 +
   9.142 +This function is called when a task is blocked.  This function should
   9.143 +not remove the task from the runqueue.
   9.144 +
   9.145 +\paragraph*{Call environment}
   9.146 +
   9.147 +The EVENTS\_MASTER\_ENABLE\_BIT is already set and the task state changed to
   9.148 +TASK\_INTERRUPTIBLE on entry to this method.  A call to the {\tt
   9.149 +  do\_schedule} method will be made after this method returns, in
   9.150 +order to select the next task to run.
   9.151 +
   9.152 +\subsubsection{do\_schedule}
   9.153 +
   9.154 +This method must be implemented.
   9.155 +
   9.156 +\paragraph*{Purpose}
   9.157 +
    9.158 +The method is called each time a new task must be chosen for scheduling on the
    9.159 +current CPU.  The current time is passed as the single argument (the current
    9.160 +task can be found using the {\tt current} macro).
    9.161 +
    9.162 +This method should select the next task to run on this CPU and set its minimum
    9.163 +time to run, as well as return the data described below.
   9.164 +
   9.165 +This method should also take the appropriate action if the previous
   9.166 +task has blocked, e.g. removing it from the runqueue.
   9.167 +
   9.168 +\paragraph*{Call environment}
   9.169 +
   9.170 +The other fields in the {\tt task\_struct} are updated by the generic layer,
   9.171 +which also performs all Xen-specific tasks and performs the actual task switch
   9.172 +(unless the previous task has been chosen again).
   9.173 +
   9.174 +This method is called with the {\tt schedule\_lock} held for the current CPU
   9.175 +and local interrupts disabled.
   9.176 +
   9.177 +\paragraph*{Return values}
   9.178 +
   9.179 +Must return a {\tt struct task\_slice} describing what task to run and how long
   9.180 +for (at maximum).
   9.181 +
   9.182 +\subsubsection{control}
   9.183 +
   9.184 +\paragraph*{Purpose}
   9.185 +
   9.186 +This method is called for global scheduler control operations.  It takes a
   9.187 +pointer to a {\tt struct sched\_ctl\_cmd}, which it should either
   9.188 +source data from or populate with data, depending on the value of the
   9.189 +{\tt direction} field.
   9.190 +
   9.191 +\paragraph*{Call environment}
   9.192 +
   9.193 +The generic layer guarantees that when this method is called, the
   9.194 +caller selected the correct scheduler ID, hence the scheduler's
   9.195 +implementation does not need to sanity-check these parts of the call.
   9.196 +
   9.197 +\paragraph*{Return values}
   9.198 +
   9.199 +This function should return the value to be passed back to user space, hence it
   9.200 +should either be 0 or an appropriate errno value.
   9.201 +
   9.202 +\subsubsection{sched\_adjdom}
   9.203 +
   9.204 +\paragraph*{Purpose}
   9.205 +
   9.206 +This method is called to adjust the scheduling parameters of a particular
   9.207 +domain, or to query their current values.  The function should check
   9.208 +the {\tt direction} field of the {\tt sched\_adjdom\_cmd} it receives in
   9.209 +order to determine which of these operations is being performed.
   9.210 +
   9.211 +\paragraph*{Call environment}
   9.212 +
   9.213 +The generic layer guarantees that the caller has specified the correct
   9.214 +control interface version and scheduler ID and that the supplied {\tt
   9.215 +task\_struct} will not be deallocated during the call (hence it is not
   9.216 +necessary to {\tt get\_task\_struct}).
   9.217 +
   9.218 +\paragraph*{Return values}
   9.219 +
   9.220 +This function should return the value to be passed back to user space, hence it
   9.221 +should either be 0 or an appropriate errno value.
   9.222 +
   9.223 +\subsubsection{reschedule}
   9.224 +
   9.225 +\paragraph*{Purpose}
   9.226 +
   9.227 +This method is called to determine if a reschedule is required as a result of a
   9.228 +particular task.
   9.229 +
   9.230 +\paragraph*{Call environment}
   9.231 +The generic layer will cause a reschedule if the current domain is the idle
   9.232 +task or it has exceeded its minimum time slice before a reschedule.  The
   9.233 +generic layer guarantees that the task passed is not currently running but is
   9.234 +on the runqueue.
   9.235 +
   9.236 +\paragraph*{Return values}
   9.237 +
   9.238 +Should return a mask of CPUs to cause a reschedule on.
   9.239 +
   9.240 +\subsubsection{dump\_settings}
   9.241 +
   9.242 +\paragraph*{Purpose}
   9.243 +
   9.244 +If implemented, this should dump any private global settings for this
   9.245 +scheduler to the console.
   9.246 +
   9.247 +\paragraph*{Call environment}
   9.248 +
   9.249 +This function is called with interrupts enabled.
   9.250 +
   9.251 +\subsubsection{dump\_cpu\_state}
   9.252 +
   9.253 +\paragraph*{Purpose}
   9.254 +
   9.255 +This method should dump any private settings for the specified CPU.
   9.256 +
   9.257 +\paragraph*{Call environment}
   9.258 +
   9.259 +This function is called with interrupts disabled and the {\tt schedule\_lock}
   9.260 +for the specified CPU held.
   9.261 +
   9.262 +\subsubsection{dump\_runq\_el}
   9.263 +
   9.264 +\paragraph*{Purpose}
   9.265 +
   9.266 +This method should dump any private settings for the specified task.
   9.267 +
   9.268 +\paragraph*{Call environment}
   9.269 +
   9.270 +This function is called with interrupts disabled and the {\tt schedule\_lock}
   9.271 +for the task's CPU held.
    10.1 --- a/docs/src/user.tex	Tue Sep 20 09:08:26 2005 +0000
    10.2 +++ b/docs/src/user.tex	Tue Sep 20 09:17:33 2005 +0000
    10.3 @@ -59,1803 +59,36 @@ Contributions of material, suggestions a
    10.4  \renewcommand{\floatpagefraction}{.8}
    10.5  \setstretch{1.1}
    10.6  
    10.7 -\part{Introduction and Tutorial}
    10.8 -\chapter{Introduction}
    10.9 -
   10.10 -Xen is a {\em paravirtualising} virtual machine monitor (VMM), or
   10.11 -`hypervisor', for the x86 processor architecture.  Xen can securely
   10.12 -execute multiple virtual machines on a single physical system with
   10.13 -close-to-native performance.  The virtual machine technology
   10.14 -facilitates enterprise-grade functionality, including:
   10.15 -
   10.16 -\begin{itemize}
   10.17 -\item Virtual machines with performance close to native
   10.18 -  hardware.
   10.19 -\item Live migration of running virtual machines between physical hosts.
   10.20 -\item Excellent hardware support (supports most Linux device drivers).
   10.21 -\item Sandboxed, restartable device drivers.
   10.22 -\end{itemize}
   10.23 -
   10.24 -Paravirtualisation permits very high performance virtualisation,
   10.25 -even on architectures like x86 that are traditionally
   10.26 -very hard to virtualise.
   10.27 -The drawback of this approach is that it requires operating systems to
   10.28 -be {\em ported} to run on Xen.  Porting an OS to run on Xen is similar
    10.29 -to supporting a new hardware platform; however, the process
   10.30 -is simplified because the paravirtual machine architecture is very
   10.31 -similar to the underlying native hardware. Even though operating system
   10.32 -kernels must explicitly support Xen, a key feature is that user space
   10.33 -applications and libraries {\em do not} require modification.
   10.34 -
   10.35 -Xen support is available for increasingly many operating systems:
   10.36 -right now, Linux 2.4, Linux 2.6 and NetBSD are available for Xen 2.0.
   10.37 -A FreeBSD port is undergoing testing and will be incorporated into the
   10.38 -release soon. Other OS ports, including Plan 9, are in progress.  We
    10.39 -hope that the arch-xen patches will be incorporated into the
   10.40 -mainstream releases of these operating systems in due course (as has
   10.41 -already happened for NetBSD).
   10.42 -
   10.43 -Possible usage scenarios for Xen include:
   10.44 -\begin{description}
   10.45 -\item [Kernel development.] Test and debug kernel modifications in a
   10.46 -      sandboxed virtual machine --- no need for a separate test
   10.47 -      machine.
   10.48 -\item [Multiple OS configurations.] Run multiple operating systems
   10.49 -      simultaneously, for instance for compatibility or QA purposes.
   10.50 -\item [Server consolidation.] Move multiple servers onto a single
   10.51 -      physical host with performance and fault isolation provided at
   10.52 -      virtual machine boundaries. 
   10.53 -\item [Cluster computing.] Management at VM granularity provides more
   10.54 -      flexibility than separately managing each physical host, but
   10.55 -      better control and isolation than single-system image solutions, 
   10.56 -      particularly by using live migration for load balancing. 
   10.57 -\item [Hardware support for custom OSes.] Allow development of new OSes
   10.58 -      while benefiting from the wide-ranging hardware support of
   10.59 -      existing OSes such as Linux.
   10.60 -\end{description}
   10.61 -
   10.62 -\section{Structure of a Xen-Based System}
   10.63 -
   10.64 -A Xen system has multiple layers, the lowest and most privileged of
   10.65 -which is Xen itself. 
   10.66 -Xen in turn may host multiple {\em guest} operating systems, each of
   10.67 -which is executed within a secure virtual machine (in Xen terminology,
   10.68 -a {\em domain}). Domains are scheduled by Xen to make effective use of
   10.69 -the available physical CPUs.  Each guest OS manages its own
   10.70 -applications, which includes responsibility for scheduling each
   10.71 -application within the time allotted to the VM by Xen.
   10.72 -
   10.73 -The first domain, {\em domain 0}, is created automatically when the
   10.74 -system boots and has special management privileges. Domain 0 builds
   10.75 -other domains and manages their virtual devices. It also performs
   10.76 -administrative tasks such as suspending, resuming and migrating other
   10.77 -virtual machines.
   10.78 -
   10.79 -Within domain 0, a process called \emph{xend} runs to manage the system.
   10.80 -\Xend is responsible for managing virtual machines and providing access
   10.81 -to their consoles.  Commands are issued to \xend over an HTTP
   10.82 -interface, either from a command-line tool or from a web browser.
   10.83 -
   10.84 -\section{Hardware Support}
   10.85 -
   10.86 -Xen currently runs only on the x86 architecture, requiring a `P6' or
   10.87 -newer processor (e.g. Pentium Pro, Celeron, Pentium II, Pentium III,
   10.88 -Pentium IV, Xeon, AMD Athlon, AMD Duron).  Multiprocessor machines are
   10.89 -supported, and we also have basic support for HyperThreading (SMT),
   10.90 -although this remains a topic for ongoing research. A port
   10.91 -specifically for x86/64 is in progress, although Xen already runs on
   10.92 -such systems in 32-bit legacy mode. In addition a port to the IA64
   10.93 -architecture is approaching completion. We hope to add other
   10.94 -architectures such as PPC and ARM in due course.
   10.95 -
   10.96 -
   10.97 -Xen can currently use up to 4GB of memory.  It is possible for x86
   10.98 -machines to address up to 64GB of physical memory but there are no
   10.99 -current plans to support these systems: The x86/64 port is the
  10.100 -planned route to supporting larger memory sizes.
  10.101 -
  10.102 -Xen offloads most of the hardware support issues to the guest OS
  10.103 -running in Domain~0.  Xen itself contains only the code required to
  10.104 -detect and start secondary processors, set up interrupt routing, and
  10.105 -perform PCI bus enumeration.  Device drivers run within a privileged
  10.106 -guest OS rather than within Xen itself. This approach provides
  10.107 -compatibility with the majority of device hardware supported by Linux.
  10.108 -The default XenLinux build contains support for relatively modern
  10.109 -server-class network and disk hardware, but you can add support for
  10.110 -other hardware by configuring your XenLinux kernel in the normal way.
  10.111 -
  10.112 -\section{History}
  10.113 -
  10.114 -Xen was originally developed by the Systems Research Group at the
  10.115 -University of Cambridge Computer Laboratory as part of the XenoServers
  10.116 -project, funded by the UK-EPSRC.
  10.117 -XenoServers aim to provide a `public infrastructure for
  10.118 -global distributed computing', and Xen plays a key part in that,
  10.119 -allowing us to efficiently partition a single machine to enable
  10.120 -multiple independent clients to run their operating systems and
  10.121 -applications in an environment providing protection, resource
  10.122 -isolation and accounting.  The project web page contains further
  10.123 -information along with pointers to papers and technical reports:
  10.124 -\path{http://www.cl.cam.ac.uk/xeno} 
  10.125 -
  10.126 -Xen has since grown into a fully-fledged project in its own right,
  10.127 -enabling us to investigate interesting research issues regarding the
  10.128 -best techniques for virtualising resources such as the CPU, memory,
  10.129 -disk and network.  The project has been bolstered by support from
  10.130 -Intel Research Cambridge, and HP Labs, who are now working closely
  10.131 -with us.
  10.132 -
  10.133 -Xen was first described in a paper presented at SOSP in
  10.134 -2003\footnote{\tt
  10.135 -http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf}, and the first
  10.136 -public release (1.0) was made that October.  Since then, Xen has
  10.137 -significantly matured and is now used in production scenarios on
  10.138 -many sites.
  10.139 -
  10.140 -Xen 2.0 features greatly enhanced hardware support, configuration
  10.141 -flexibility, usability and a larger complement of supported operating
  10.142 -systems. This latest release takes Xen a step closer to becoming the 
  10.143 -definitive open source solution for virtualisation.
  10.144 -
  10.145 -\chapter{Installation}
  10.146 -
  10.147 -The Xen distribution includes three main components: Xen itself, ports
  10.148 -of Linux 2.4 and 2.6 and NetBSD to run on Xen, and the user-space
  10.149 -tools required to manage a Xen-based system.  This chapter describes
  10.150 -how to install the Xen 2.0 distribution from source.  Alternatively,
  10.151 -there may be pre-built packages available as part of your operating
  10.152 -system distribution.
  10.153 -
  10.154 -\section{Prerequisites}
  10.155 -\label{sec:prerequisites}
  10.156 -
  10.157 -The following is a full list of prerequisites.  Items marked `$\dag$'
  10.158 -are required by the \xend control tools, and hence required if you
  10.159 -want to run more than one virtual machine; items marked `$*$' are only
  10.160 -required if you wish to build from source.
  10.161 -\begin{itemize}
  10.162 -\item A working Linux distribution using the GRUB bootloader and
  10.163 -running on a P6-class (or newer) CPU.
  10.164 -\item [$\dag$] The \path{iproute2} package. 
  10.165 -\item [$\dag$] The Linux bridge-utils\footnote{Available from 
  10.166 -{\tt http://bridge.sourceforge.net}} (e.g., \path{/sbin/brctl})
  10.167 -\item [$\dag$] An installation of Twisted v1.3 or
  10.168 -above\footnote{Available from {\tt
  10.169 -http://www.twistedmatrix.com}}. There may be a binary package
  10.170 -available for your distribution; alternatively it can be installed by
  10.171 -running `{\sl make install-twisted}' in the root of the Xen source
  10.172 -tree.
  10.173 -\item [$*$] Build tools (gcc v3.2.x or v3.3.x, binutils, GNU make).
  10.174 -\item [$*$] Development installation of libcurl (e.g., libcurl-devel) 
  10.175 -\item [$*$] Development installation of zlib (e.g., zlib-dev).
  10.176 -\item [$*$] Development installation of Python v2.2 or later (e.g., python-dev).
  10.177 -\item [$*$] \LaTeX and transfig are required to build the documentation.
  10.178 -\end{itemize}
  10.179 -
  10.180 -Once you have satisfied the relevant prerequisites, you can 
  10.181 -now install either a binary or source distribution of Xen. 
  10.182 -
  10.183 -\section{Installing from Binary Tarball} 
  10.184 -
  10.185 -Pre-built tarballs are available for download from the Xen 
  10.186 -download page
  10.187 -\begin{quote} 
  10.188 -{\tt http://xen.sf.net}
  10.189 -\end{quote} 
  10.190 -
  10.191 -Once you've downloaded the tarball, simply unpack and install: 
  10.192 -\begin{verbatim}
  10.193 -# tar zxvf xen-2.0-install.tgz
  10.194 -# cd xen-2.0-install
  10.195 -# sh ./install.sh 
  10.196 -\end{verbatim} 
  10.197 -
  10.198 -Once you've installed the binaries you need to configure
  10.199 -your system as described in Section~\ref{s:configure}. 
  10.200 -
  10.201 -\section{Installing from Source} 
  10.202 -
  10.203 -This section describes how to obtain, build, and install 
  10.204 -Xen from source. 
  10.205 -
  10.206 -\subsection{Obtaining the Source} 
  10.207 -
  10.208 -The Xen source tree is available as either a compressed source tar
  10.209 -ball or as a clone of our master BitKeeper repository.
  10.210 -
  10.211 -\begin{description} 
  10.212 -\item[Obtaining the Source Tarball]\mbox{} \\  
  10.213 -Stable versions (and daily snapshots) of the Xen source tree are
  10.214 -available as compressed tarballs from the Xen download page
  10.215 -\begin{quote} 
  10.216 -{\tt http://xen.sf.net}
  10.217 -\end{quote} 
  10.218 -
  10.219 -\item[Using BitKeeper]\mbox{} \\  
  10.220 -If you wish to install Xen from a clone of our latest BitKeeper
  10.221 -repository then you will need to install the BitKeeper tools.
  10.222 -Download instructions for BitKeeper can be obtained by filling out the
  10.223 -form at:
  10.224 -
  10.225 -\begin{quote} 
  10.226 -{\tt http://www.bitmover.com/cgi-bin/download.cgi}
  10.227 -\end{quote}
  10.228 -The public master BK repository for the 2.0 release lives at: 
  10.229 -\begin{quote}
  10.230 -{\tt bk://xen.bkbits.net/xen-2.0.bk}  
  10.231 -\end{quote} 
  10.232 -You can use BitKeeper to
  10.233 -download it and keep it updated with the latest features and fixes.
  10.234 -
  10.235 -Change to the directory in which you want to put the source code, then
  10.236 -run:
  10.237 -\begin{verbatim}
  10.238 -# bk clone bk://xen.bkbits.net/xen-2.0.bk
  10.239 -\end{verbatim}
  10.240 -
  10.241 -Under your current directory, a new directory named \path{xen-2.0.bk}
  10.242 -has been created, which contains all the source code for Xen, the OS
  10.243 -ports, and the control tools. You can update your repository with the
  10.244 -latest changes at any time by running:
  10.245 -\begin{verbatim}
  10.246 -# cd xen-2.0.bk # to change into the local repository
  10.247 -# bk pull       # to update the repository
  10.248 -\end{verbatim}
  10.249 -\end{description} 
  10.250 -
  10.251 -%\section{The distribution}
  10.252 -%
  10.253 -%The Xen source code repository is structured as follows:
  10.254 -%
  10.255 -%\begin{description}
  10.256 -%\item[\path{tools/}] Xen node controller daemon (Xend), command line tools, 
  10.257 -%  control libraries
  10.258 -%\item[\path{xen/}] The Xen VMM.
  10.259 -%\item[\path{linux-*-xen-sparse/}] Xen support for Linux.
  10.260 -%\item[\path{linux-*-patches/}] Experimental patches for Linux.
  10.261 -%\item[\path{netbsd-*-xen-sparse/}] Xen support for NetBSD.
  10.262 -%\item[\path{docs/}] Various documentation files for users and developers.
  10.263 -%\item[\path{extras/}] Bonus extras.
  10.264 -%\end{description}
  10.265 -
  10.266 -\subsection{Building from Source} 
  10.267 -
  10.268 -The top-level Xen Makefile includes a target `world' that will do the
  10.269 -following:
  10.270 -
  10.271 -\begin{itemize}
  10.272 -\item Build Xen
  10.273 -\item Build the control tools, including \xend
  10.274 -\item Download (if necessary) and unpack the Linux 2.6 source code,
  10.275 -      and patch it for use with Xen
  10.276 -\item Build a Linux kernel to use in domain 0 and a smaller
  10.277 -      unprivileged kernel, which can optionally be used for
  10.278 -      unprivileged virtual machines.
  10.279 -\end{itemize}
  10.280 -
  10.281 -
  10.282 -After the build has completed you should have a top-level 
  10.283 -directory called \path{dist/} in which all resulting targets 
   10.284 -will be placed; of particular interest are the two 
   10.285 -XenLinux kernel images, one with a `-xen0' extension
  10.286 -which contains hardware device drivers and drivers for Xen's virtual
  10.287 -devices, and one with a `-xenU' extension that just contains the
  10.288 -virtual ones. These are found in \path{dist/install/boot/} along
  10.289 -with the image for Xen itself and the configuration files used
  10.290 -during the build. 
  10.291 -
  10.292 -The NetBSD port can be built using: 
  10.293 -\begin{quote}
  10.294 -\begin{verbatim}
  10.295 -# make netbsd20
  10.296 -\end{verbatim} 
  10.297 -\end{quote} 
   10.298 -The NetBSD port is built using a snapshot of the netbsd-2-0 CVS branch.
  10.299 -The snapshot is downloaded as part of the build process, if it is not
  10.300 -yet present in the \path{NETBSD\_SRC\_PATH} search path.  The build
  10.301 -process also downloads a toolchain which includes all the tools
  10.302 -necessary to build the NetBSD kernel under Linux.
  10.303 -
  10.304 -To customize further the set of kernels built you need to edit
  10.305 -the top-level Makefile. Look for the line: 
  10.306 -
  10.307 -\begin{quote}
  10.308 -\begin{verbatim}
  10.309 -KERNELS ?= mk.linux-2.6-xen0 mk.linux-2.6-xenU
  10.310 -\end{verbatim} 
  10.311 -\end{quote} 
  10.312 -
  10.313 -You can edit this line to include any set of operating system kernels
  10.314 -which have configurations in the top-level \path{buildconfigs/}
  10.315 -directory, for example \path{mk.linux-2.4-xenU} to build a Linux 2.4
  10.316 -kernel containing only virtual device drivers.
  10.317 -
  10.318 -%% Inspect the Makefile if you want to see what goes on during a build.
  10.319 -%% Building Xen and the tools is straightforward, but XenLinux is more
  10.320 -%% complicated.  The makefile needs a `pristine' Linux kernel tree to which
  10.321 -%% it will then add the Xen architecture files.  You can tell the
  10.322 -%% makefile the location of the appropriate Linux compressed tar file by
  10.323 -%% setting the LINUX\_SRC environment variable, e.g. \\
  10.324 -%% \verb!# LINUX_SRC=/tmp/linux-2.6.11.tar.bz2 make world! \\ or by
  10.325 -%% placing the tar file somewhere in the search path of {\tt
  10.326 -%% LINUX\_SRC\_PATH} which defaults to `{\tt .:..}'.  If the makefile
  10.327 -%% can't find a suitable kernel tar file it attempts to download it from
  10.328 -%% kernel.org (this won't work if you're behind a firewall).
  10.329 -
  10.330 -%% After untaring the pristine kernel tree, the makefile uses the {\tt
  10.331 -%% mkbuildtree} script to add the Xen patches to the kernel. 
  10.332 -
  10.333 -
  10.334 -%% The procedure is similar to build the Linux 2.4 port: \\
  10.335 -%% \verb!# LINUX_SRC=/path/to/linux2.4/source make linux24!
  10.336 -
  10.337 -
  10.338 -%% \framebox{\parbox{5in}{
  10.339 -%% {\bf Distro specific:} \\
  10.340 -%% {\it Gentoo} --- if not using udev (most installations, currently), you'll need
  10.341 -%% to enable devfs and devfs mount at boot time in the xen0 config.
  10.342 -%% }}
  10.343 -
  10.344 -\subsection{Custom XenLinux Builds}
  10.345 -
  10.346 -% If you have an SMP machine you may wish to give the {\tt '-j4'}
  10.347 -% argument to make to get a parallel build.
  10.348 -
  10.349 -If you wish to build a customized XenLinux kernel (e.g. to support
  10.350 -additional devices or enable distribution-required features), you can
  10.351 -use the standard Linux configuration mechanisms, specifying that the
  10.352 -architecture being built for is \path{xen}, e.g:
  10.353 -\begin{quote}
  10.354 -\begin{verbatim} 
  10.355 -# cd linux-2.6.11-xen0 
  10.356 -# make ARCH=xen xconfig 
  10.357 -# cd ..
  10.358 -# make
  10.359 -\end{verbatim} 
  10.360 -\end{quote} 
  10.361 -
  10.362 -You can also copy an existing Linux configuration (\path{.config}) 
  10.363 -into \path{linux-2.6.11-xen0} and execute:  
  10.364 -\begin{quote}
  10.365 -\begin{verbatim} 
  10.366 -# make ARCH=xen oldconfig 
  10.367 -\end{verbatim} 
  10.368 -\end{quote} 
  10.369 -
  10.370 -You may be prompted with some Xen-specific options; we 
  10.371 -advise accepting the defaults for these options.
  10.372 -
  10.373 -Note that the only difference between the two types of Linux kernel
   10.374 -that are built is the configuration file used for each.  The `U'
  10.375 -suffixed (unprivileged) versions don't contain any of the physical
  10.376 -hardware device drivers, leading to a 30\% reduction in size; hence
  10.377 -you may prefer these for your non-privileged domains.  The `0'
  10.378 -suffixed privileged versions can be used to boot the system, as well
  10.379 -as in driver domains and unprivileged domains.
  10.380 -
  10.381 -
  10.382 -\subsection{Installing the Binaries}
  10.383 -
  10.384 -
  10.385 -The files produced by the build process are stored under the
  10.386 -\path{dist/install/} directory. To install them in their default
  10.387 -locations, do:
  10.388 -\begin{quote}
  10.389 -\begin{verbatim}
  10.390 -# make install
  10.391 -\end{verbatim} 
  10.392 -\end{quote}
  10.393 -
  10.394 -
  10.395 -Alternatively, users with special installation requirements may wish
  10.396 -to install them manually by copying the files to their appropriate
  10.397 -destinations.
  10.398 -
  10.399 -%% Files in \path{install/boot/} include:
  10.400 -%% \begin{itemize}
  10.401 -%% \item \path{install/boot/xen-2.0.gz} Link to the Xen 'kernel'
  10.402 -%% \item \path{install/boot/vmlinuz-2.6-xen0}  Link to domain 0 XenLinux kernel
  10.403 -%% \item \path{install/boot/vmlinuz-2.6-xenU}  Link to unprivileged XenLinux kernel
  10.404 -%% \end{itemize}
  10.405 -
  10.406 -The \path{dist/install/boot} directory will also contain the config files
  10.407 -used for building the XenLinux kernels, and also versions of Xen and
  10.408 -XenLinux kernels that contain debug symbols (\path{xen-syms-2.0.6} and
  10.409 -\path{vmlinux-syms-2.6.11.11-xen0}) which are essential for interpreting crash
  10.410 -dumps.  Retain these files as the developers may wish to see them if
  10.411 -you post on the mailing list.
  10.412 -
  10.413 -
  10.414 -
  10.415 -
  10.416 -
  10.417 -\section{Configuration}
  10.418 -\label{s:configure}
  10.419 -Once you have built and installed the Xen distribution, it is 
  10.420 -simple to prepare the machine for booting and running Xen. 
  10.421 -
  10.422 -\subsection{GRUB Configuration}
  10.423 -
  10.424 -An entry should be added to \path{grub.conf} (often found under
  10.425 -\path{/boot/} or \path{/boot/grub/}) to allow Xen / XenLinux to boot.
  10.426 -This file is sometimes called \path{menu.lst}, depending on your
  10.427 -distribution.  The entry should look something like the following:
  10.428 -
  10.429 -{\small
  10.430 -\begin{verbatim}
  10.431 -title Xen 2.0 / XenLinux 2.6
  10.432 -  kernel /boot/xen-2.0.gz dom0_mem=131072
  10.433 -  module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0
  10.434 -\end{verbatim}
  10.435 -}
  10.436 -
  10.437 -The kernel line tells GRUB where to find Xen itself and what boot
   10.438 -parameters should be passed to it (in this case, setting domain 0's
   10.439 -memory allocation in kilobytes). For more
  10.440 -details on the various Xen boot parameters see Section~\ref{s:xboot}. 
  10.441 -
  10.442 -The module line of the configuration describes the location of the
  10.443 -XenLinux kernel that Xen should start and the parameters that should
  10.444 -be passed to it (these are standard Linux parameters, identifying the
  10.445 -root device and specifying it be initially mounted read only and
  10.446 -instructing that console output be sent to the screen).  Some
  10.447 -distributions such as SuSE do not require the \path{ro} parameter.
  10.448 -
  10.449 -%% \framebox{\parbox{5in}{
  10.450 -%% {\bf Distro specific:} \\
  10.451 -%% {\it SuSE} --- Omit the {\tt ro} option from the XenLinux kernel
  10.452 -%% command line, since the partition won't be remounted rw during boot.
  10.453 -%% }}
  10.454 -
  10.455 -
  10.456 -If you want to use an initrd, just add another \path{module} line to
  10.457 -the configuration, as usual:
  10.458 -{\small
  10.459 -\begin{verbatim}
  10.460 -  module /boot/my_initrd.gz
  10.461 -\end{verbatim}
  10.462 -}
  10.463 -
  10.464 -As always when installing a new kernel, it is recommended that you do
  10.465 -not delete existing menu options from \path{menu.lst} --- you may want
  10.466 -to boot your old Linux kernel in future, particularly if you
  10.467 -have problems.
  10.468 -
  10.469 -
  10.470 -\subsection{Serial Console (optional)}
  10.471 -
  10.472 -%%   kernel /boot/xen-2.0.gz dom0_mem=131072 com1=115200,8n1
  10.473 -%%   module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro 
  10.474 -
  10.475 -
  10.476 -In order to configure Xen serial console output, it is necessary to add 
   10.477 -a boot option to your GRUB config; e.g.\ replace the above kernel line 
  10.478 -with: 
  10.479 -\begin{quote}
  10.480 -{\small
  10.481 -\begin{verbatim}
  10.482 -   kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1
  10.483 -\end{verbatim}}
  10.484 -\end{quote}
  10.485 -
  10.486 -This configures Xen to output on COM1 at 115,200 baud, 8 data bits, 
  10.487 -1 stop bit and no parity. Modify these parameters for your set up. 
  10.488 -
  10.489 -One can also configure XenLinux to share the serial console; to 
  10.490 -achieve this append ``\path{console=ttyS0}'' to your 
  10.491 -module line. 
  10.492 -
  10.493 -
  10.494 -If you wish to be able to log in over the XenLinux serial console it
  10.495 -is necessary to add a line into \path{/etc/inittab}, just as per 
  10.496 -regular Linux. Simply add the line:
  10.497 -\begin{quote}
  10.498 -{\small 
  10.499 -{\tt c:2345:respawn:/sbin/mingetty ttyS0}
  10.500 -}
  10.501 -\end{quote} 
  10.502 -
  10.503 -and you should be able to log in. Note that to successfully log in 
  10.504 -as root over the serial line will require adding \path{ttyS0} to
  10.505 -\path{/etc/securetty} in most modern distributions. 
  10.506 -
  10.507 -\subsection{TLS Libraries}
  10.508 -
  10.509 -Users of the XenLinux 2.6 kernel should disable Thread Local Storage
  10.510 -(e.g.\ by doing a \path{mv /lib/tls /lib/tls.disabled}) before
  10.511 -attempting to run with a XenLinux kernel\footnote{If you boot without first
  10.512 -disabling TLS, you will get a warning message during the boot
  10.513 -process. In this case, simply perform the rename after the machine is
  10.514 -up and then run \texttt{/sbin/ldconfig} to make it take effect.}.  You can
  10.515 -always reenable it by restoring the directory to its original location
  10.516 -(i.e.\ \path{mv /lib/tls.disabled /lib/tls}).
  10.517 -
  10.518 -The reason for this is that the current TLS implementation uses
  10.519 -segmentation in a way that is not permissible under Xen.  If TLS is
  10.520 -not disabled, an emulation mode is used within Xen which reduces
  10.521 -performance substantially.
  10.522 -
  10.523 -We hope that this issue can be resolved by working with Linux
  10.524 -distribution vendors to implement a minor backward-compatible change
  10.525 -to the TLS library.
  10.526 -
  10.527 -\section{Booting Xen} 
  10.528 -
  10.529 -It should now be possible to restart the system and use Xen.  Reboot
  10.530 -as usual but choose the new Xen option when the Grub screen appears.
  10.531 -
  10.532 -What follows should look much like a conventional Linux boot.  The
  10.533 -first portion of the output comes from Xen itself, supplying low level
  10.534 -information about itself and the machine it is running on.  The
  10.535 -following portion of the output comes from XenLinux.
  10.536 -
  10.537 -You may see some errors during the XenLinux boot.  These are not
  10.538 -necessarily anything to worry about --- they may result from kernel
  10.539 -configuration differences between your XenLinux kernel and the one you
  10.540 -usually use.
  10.541 -
  10.542 -When the boot completes, you should be able to log into your system as
  10.543 -usual.  If you are unable to log in to your system running Xen, you
  10.544 -should still be able to reboot with your normal Linux kernel.
  10.545 -
  10.546 -
  10.547 -\chapter{Starting Additional Domains}
  10.548 -
  10.549 -The first step in creating a new domain is to prepare a root
  10.550 -filesystem for it to boot off.  Typically, this might be stored in a
  10.551 -normal partition, an LVM or other volume manager partition, a disk
  10.552 -file or on an NFS server.  A simple way to do this is simply to boot
  10.553 -from your standard OS install CD and install the distribution into
  10.554 -another partition on your hard drive.
  10.555 -
  10.556 -To start the \xend control daemon, type
  10.557 -\begin{quote}
  10.558 -\verb!# xend start!
  10.559 -\end{quote}
  10.560 -If you
  10.561 -wish the daemon to start automatically, see the instructions in
  10.562 -Section~\ref{s:xend}. Once the daemon is running, you can use the
  10.563 -\path{xm} tool to monitor and maintain the domains running on your
  10.564 -system. This chapter provides only a brief tutorial: we provide full
  10.565 -details of the \path{xm} tool in the next chapter. 
  10.566 -
  10.567 -%\section{From the web interface}
  10.568 -%
  10.569 -%Boot the Xen machine and start Xensv (see Chapter~\ref{cha:xensv} for
  10.570 -%more details) using the command: \\
  10.571 -%\verb_# xensv start_ \\
  10.572 -%This will also start Xend (see Chapter~\ref{cha:xend} for more information).
  10.573 -%
  10.574 -%The domain management interface will then be available at {\tt
  10.575 -%http://your\_machine:8080/}.  This provides a user friendly wizard for
  10.576 -%starting domains and functions for managing running domains.
  10.577 -%
  10.578 -%\section{From the command line}
  10.579 -
  10.580 -
  10.581 -\section{Creating a Domain Configuration File} 
  10.582  
  10.583 -Before you can start an additional domain, you must create a
  10.584 -configuration file. We provide two example files which you 
  10.585 -can use as a starting point: 
  10.586 -\begin{itemize} 
  10.587 -  \item \path{/etc/xen/xmexample1} is a simple template configuration file
  10.588 -    for describing a single VM.
  10.589 -
  10.590 -  \item \path{/etc/xen/xmexample2} file is a template description that
  10.591 -    is intended to be reused for multiple virtual machines.  Setting
  10.592 -    the value of the \path{vmid} variable on the \path{xm} command line
  10.593 -    fills in parts of this template.
  10.594 -\end{itemize} 
  10.595 -
  10.596 -Copy one of these files and edit it as appropriate.
  10.597 -Typical values you may wish to edit include: 
  10.598 -
  10.599 -\begin{quote}
  10.600 -\begin{description}
  10.601 -\item[kernel] Set this to the path of the kernel you compiled for use
  10.602 -              with Xen (e.g.\  \path{kernel = '/boot/vmlinuz-2.6-xenU'})
  10.603 -\item[memory] Set this to the size of the domain's memory in
  10.604 -megabytes (e.g.\ \path{memory = 64})
  10.605 -\item[disk] Set the first entry in this list to calculate the offset
  10.606 -of the domain's root partition, based on the domain ID.  Set the
  10.607 -second to the location of \path{/usr} if you are sharing it between
  10.608 -domains (e.g.\ \path{disk = ['phy:your\_hard\_drive\%d,sda1,w' \%
   10.609 -(base\_partition\_number + vmid), 'phy:your\_usr\_partition,sda6,r' ]})
  10.610 -\item[dhcp] Uncomment the dhcp variable, so that the domain will
  10.611 -receive its IP address from a DHCP server (e.g.\ \path{dhcp='dhcp'})
  10.612 -\end{description}
  10.613 -\end{quote}
  10.614 -
  10.615 -You may also want to edit the {\bf vif} variable in order to choose
  10.616 -the MAC address of the virtual ethernet interface yourself.  For
  10.617 -example: 
  10.618 -\begin{quote}
  10.619 -\verb_vif = ['mac=00:06:AA:F6:BB:B3']_
  10.620 -\end{quote}
  10.621 -If you do not set this variable, \xend will automatically generate a
  10.622 -random MAC address from an unused range.
  10.623 -
  10.624 -
  10.625 -\section{Booting the Domain}
  10.626 -
  10.627 -The \path{xm} tool provides a variety of commands for managing domains.
  10.628 -Use the \path{create} command to start new domains. Assuming you've 
  10.629 -created a configuration file \path{myvmconf} based around
  10.630 -\path{/etc/xen/xmexample2}, to start a domain with virtual 
  10.631 -machine ID~1 you should type: 
  10.632 -
  10.633 -\begin{quote}
  10.634 -\begin{verbatim}
  10.635 -# xm create -c myvmconf vmid=1
  10.636 -\end{verbatim}
  10.637 -\end{quote}
  10.638 -
  10.639 -
  10.640 -The \path{-c} switch causes \path{xm} to turn into the domain's
  10.641 -console after creation.  The \path{vmid=1} sets the \path{vmid}
  10.642 -variable used in the \path{myvmconf} file. 
  10.643 -
  10.644 -
  10.645 -You should see the console boot messages from the new domain 
  10.646 -appearing in the terminal in which you typed the command, 
  10.647 -culminating in a login prompt. 
  10.648 -
  10.649 -
  10.650 -\section{Example: ttylinux}
  10.651 -
  10.652 -Ttylinux is a very small Linux distribution, designed to require very
  10.653 -few resources.  We will use it as a concrete example of how to start a
  10.654 -Xen domain.  Most users will probably want to install a full-featured
  10.655 -distribution once they have mastered the basics\footnote{ttylinux is
  10.656 -maintained by Pascal Schmidt. You can download source packages from
  10.657 -the distribution's home page: {\tt http://www.minimalinux.org/ttylinux/}}.
  10.658 -
  10.659 -\begin{enumerate}
  10.660 -\item Download and extract the ttylinux disk image from the Files
  10.661 -section of the project's SourceForge site (see 
  10.662 -\path{http://sf.net/projects/xen/}).
  10.663 -\item Create a configuration file like the following:
  10.664 -\begin{verbatim}
  10.665 -kernel = "/boot/vmlinuz-2.6-xenU"
  10.666 -memory = 64
  10.667 -name = "ttylinux"
  10.668 -nics = 1
  10.669 -ip = "1.2.3.4"
  10.670 -disk = ['file:/path/to/ttylinux/rootfs,sda1,w']
  10.671 -root = "/dev/sda1 ro"
  10.672 -\end{verbatim}
  10.673 -\item Now start the domain and connect to its console:
  10.674 -\begin{verbatim}
  10.675 -xm create configfile -c
  10.676 -\end{verbatim}
  10.677 -\item Login as root, password root.
  10.678 -\end{enumerate}
  10.679 -
  10.680 -
  10.681 -\section{Starting / Stopping Domains Automatically}
  10.682 -
  10.683 -It is possible to have certain domains start automatically at boot
   10.684 -time and to have dom0 wait for all running domains to shut down before
  10.685 -it shuts down the system.
  10.686 -
   10.687 -To specify that a domain should start at boot time, place its
   10.688 -configuration file (or a link to it) under \path{/etc/xen/auto/}.
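For example, assuming the \path{myvmconf} configuration file from the
previous section lives in \path{/etc/xen} (the path is illustrative), a
symbolic link is sufficient:
\begin{quote}
\begin{verbatim}
# ln -s /etc/xen/myvmconf /etc/xen/auto/myvmconf
\end{verbatim}
\end{quote}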
  10.689 -
   10.690 -A SysV-style init script for RedHat and LSB-compliant systems is
  10.691 -provided and will be automatically copied to \path{/etc/init.d/}
  10.692 -during install.  You can then enable it in the appropriate way for
  10.693 -your distribution.
  10.694 -
  10.695 -For instance, on RedHat:
  10.696 -
  10.697 -\begin{quote}
  10.698 -\verb_# chkconfig --add xendomains_
  10.699 -\end{quote}
  10.700 -
  10.701 -By default, this will start the boot-time domains in runlevels 3, 4
  10.702 -and 5.
  10.703 -
  10.704 -You can also use the \path{service} command to run this script
   10.705 -manually, e.g.:
  10.706 -
  10.707 -\begin{quote}
  10.708 -\verb_# service xendomains start_
  10.709 -
  10.710 -Starts all the domains with config files under /etc/xen/auto/.
  10.711 -\end{quote}
  10.712 -
  10.713 -
  10.714 -\begin{quote}
  10.715 -\verb_# service xendomains stop_
  10.716 -
  10.717 -Shuts down ALL running Xen domains.
  10.718 -\end{quote}
  10.719 -
  10.720 -\chapter{Domain Management Tools}
  10.721 -
  10.722 -The previous chapter described a simple example of how to configure
  10.723 -and start a domain.  This chapter summarises the tools available to
  10.724 -manage running domains.
  10.725 -
  10.726 -\section{Command-line Management}
  10.727 -
  10.728 -Command line management tasks are also performed using the \path{xm}
  10.729 -tool.  For online help for the commands available, type:
  10.730 -\begin{quote}
  10.731 -\verb_# xm help_
  10.732 -\end{quote}
  10.733 -
  10.734 -You can also type \path{xm help $<$command$>$} for more information 
  10.735 -on a given command. 
  10.736 -
  10.737 -\subsection{Basic Management Commands}
  10.738 -
  10.739 -The most important \path{xm} commands are: 
  10.740 -\begin{quote}
  10.741 -\verb_# xm list_: Lists all domains running.\\
   10.742 -\verb_# xm consoles_: Gives information about the domain consoles.\\
   10.743 -\verb_# xm console_: Opens a console to a domain (e.g.\
   10.744 -  \verb_# xm console myVM_)
  10.745 -\end{quote}
  10.746 -
  10.747 -\subsection{\tt xm list}
  10.748 -
  10.749 -The output of \path{xm list} is in rows of the following format:
  10.750 -\begin{center}
  10.751 -{\tt name domid memory cpu state cputime console}
  10.752 -\end{center}
  10.753 -
  10.754 -\begin{quote}
  10.755 -\begin{description}
  10.756 -\item[name]  The descriptive name of the virtual machine.
  10.757 -\item[domid] The number of the domain ID this virtual machine is running in.
  10.758 -\item[memory] Memory size in megabytes.
  10.759 -\item[cpu]   The CPU this domain is running on.
  10.760 -\item[state] Domain state consists of 5 fields:
  10.761 -  \begin{description}
  10.762 -  \item[r] running
  10.763 -  \item[b] blocked
  10.764 -  \item[p] paused
  10.765 -  \item[s] shutdown
  10.766 -  \item[c] crashed
  10.767 -  \end{description}
  10.768 -\item[cputime] How much CPU time (in seconds) the domain has used so far.
  10.769 -\item[console] TCP port accepting connections to the domain's console.
  10.770 -\end{description}
  10.771 -\end{quote}
  10.772 -
  10.773 -The \path{xm list} command also supports a long output format when the
   10.774 -\path{-l} switch is used.  This outputs the full details of the
  10.775 -running domains in \xend's SXP configuration format.
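For instance, to dump the full SXP records of all running domains
(the output can be quite long, so you may want to pipe it through a
pager):
\begin{quote}
\begin{verbatim}
# xm list -l
\end{verbatim}
\end{quote}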
  10.776 -
  10.777 -For example, suppose the system is running the ttylinux domain as
  10.778 -described earlier.  The list command should produce output somewhat
  10.779 -like the following:
  10.780 -\begin{verbatim}
  10.781 -# xm list
  10.782 -Name              Id  Mem(MB)  CPU  State  Time(s)  Console
  10.783 -Domain-0           0      251    0  r----    172.2        
  10.784 -ttylinux           5       63    0  -b---      3.0    9605
  10.785 -\end{verbatim}
  10.786 -
  10.787 -Here we can see the details for the ttylinux domain, as well as for
  10.788 -domain 0 (which, of course, is always running).  Note that the console
  10.789 -port for the ttylinux domain is 9605.  This can be connected to by TCP
  10.790 -using a terminal program (e.g. \path{telnet} or, better, 
  10.791 -\path{xencons}).  The simplest way to connect is to use the \path{xm console}
  10.792 -command, specifying the domain name or ID.  To connect to the console
  10.793 -of the ttylinux domain, we could use any of the following: 
  10.794 -\begin{verbatim}
  10.795 -# xm console ttylinux
  10.796 -# xm console 5
  10.797 -# xencons localhost 9605
  10.798 -\end{verbatim}
  10.799 -
  10.800 -\section{Domain Save and Restore}
  10.801 -
  10.802 -The administrator of a Xen system may suspend a virtual machine's
  10.803 -current state into a disk file in domain 0, allowing it to be resumed
  10.804 -at a later time.
  10.805 -
  10.806 -The ttylinux domain described earlier can be suspended to disk using
  10.807 -the command:
  10.808 -\begin{verbatim}
  10.809 -# xm save ttylinux ttylinux.xen
  10.810 -\end{verbatim}
  10.811 -
  10.812 -This will stop the domain named `ttylinux' and save its current state
  10.813 -into a file called \path{ttylinux.xen}.
  10.814 -
  10.815 -To resume execution of this domain, use the \path{xm restore} command:
  10.816 -\begin{verbatim}
  10.817 -# xm restore ttylinux.xen
  10.818 -\end{verbatim}
  10.819 -
  10.820 -This will restore the state of the domain and restart it.  The domain
  10.821 -will carry on as before and the console may be reconnected using the
  10.822 -\path{xm console} command, as above.
  10.823 -
  10.824 -\section{Live Migration}
  10.825 -
  10.826 -Live migration is used to transfer a domain between physical hosts
  10.827 -whilst that domain continues to perform its usual activities --- from
  10.828 -the user's perspective, the migration should be imperceptible.
  10.829 -
  10.830 -To perform a live migration, both hosts must be running Xen / \xend and
  10.831 -the destination host must have sufficient resources (e.g. memory
  10.832 -capacity) to accommodate the domain after the move. Furthermore we
  10.833 -currently require both source and destination machines to be on the 
  10.834 -same L2 subnet. 
  10.835 -
  10.836 -Currently, there is no support for providing automatic remote access
  10.837 -to filesystems stored on local disk when a domain is migrated.
  10.838 -Administrators should choose an appropriate storage solution
  10.839 -(i.e. SAN, NAS, etc.) to ensure that domain filesystems are also
  10.840 -available on their destination node. GNBD is a good method for
  10.841 -exporting a volume from one machine to another. iSCSI can do a similar
  10.842 -job, but is more complex to set up.
  10.843 -
   10.844 -When a domain migrates, its MAC and IP addresses move with it, so it
   10.845 -is only possible to migrate VMs within the same layer-2 network and IP
  10.846 -subnet. If the destination node is on a different subnet, the
  10.847 -administrator would need to manually configure a suitable etherip or
  10.848 -IP tunnel in the domain 0 of the remote node. 
  10.849 -
  10.850 -A domain may be migrated using the \path{xm migrate} command.  To
  10.851 -live migrate a domain to another machine, we would use
  10.852 -the command:
  10.853 -
  10.854 -\begin{verbatim}
  10.855 -# xm migrate --live mydomain destination.ournetwork.com
  10.856 -\end{verbatim}
  10.857 -
  10.858 -Without the \path{--live} flag, \xend simply stops the domain and
  10.859 -copies the memory image over to the new node and restarts it. Since
   10.860 -domains can have large allocations, this can be quite time-consuming,
  10.861 -even on a Gigabit network. With the \path{--live} flag \xend attempts
  10.862 -to keep the domain running while the migration is in progress,
  10.863 -resulting in typical `downtimes' of just 60--300ms.
  10.864 -
   10.865 -For now it will be necessary to reconnect to the domain's console on
   10.866 -the new machine using the \path{xm console} command.  Any open network
   10.867 -connections a migrated domain has are preserved, however, so existing
   10.868 -SSH sessions do not suffer from this limitation.
  10.869 -
  10.870 -\section{Managing Domain Memory}
  10.871 -
  10.872 -XenLinux domains have the ability to relinquish / reclaim machine
  10.873 -memory at the request of the administrator or the user of the domain.
  10.874 +\part{Introduction and Tutorial}
  10.875  
  10.876 -\subsection{Setting memory footprints from dom0}
  10.877 -
  10.878 -The machine administrator can request that a domain alter its memory
  10.879 -footprint using the \path{xm set-mem} command.  For instance, we can
  10.880 -request that our example ttylinux domain reduce its memory footprint
  10.881 -to 32 megabytes.
  10.882 -
  10.883 -\begin{verbatim}
  10.884 -# xm set-mem ttylinux 32
  10.885 -\end{verbatim}
  10.886 -
  10.887 -We can now see the result of this in the output of \path{xm list}:
  10.888 -
  10.889 -\begin{verbatim}
  10.890 -# xm list
  10.891 -Name              Id  Mem(MB)  CPU  State  Time(s)  Console
  10.892 -Domain-0           0      251    0  r----    172.2        
  10.893 -ttylinux           5       31    0  -b---      4.3    9605
  10.894 -\end{verbatim}
  10.895 -
  10.896 -The domain has responded to the request by returning memory to Xen. We
  10.897 -can restore the domain to its original size using the command line:
  10.898 -
  10.899 -\begin{verbatim}
  10.900 -# xm set-mem ttylinux 64
  10.901 -\end{verbatim}
  10.902 -
  10.903 -\subsection{Setting memory footprints from within a domain}
  10.904 -
  10.905 -The virtual file \path{/proc/xen/balloon} allows the owner of a
  10.906 -domain to adjust their own memory footprint.  Reading the file
  10.907 -(e.g. \path{cat /proc/xen/balloon}) prints out the current
  10.908 -memory footprint of the domain.  Writing the file
  10.909 -(e.g. \path{echo new\_target > /proc/xen/balloon}) requests
  10.910 -that the kernel adjust the domain's memory footprint to a new value.
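As a concrete sketch (the target value and its units are illustrative
assumptions; the read-out from the balloon file indicates the units the
driver actually expects):
\begin{quote}
\begin{verbatim}
# cat /proc/xen/balloon
# echo 32768 > /proc/xen/balloon
\end{verbatim}
\end{quote}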
  10.911 -
  10.912 -\subsection{Setting memory limits}
  10.913 -
  10.914 -Xen associates a memory size limit with each domain.  By default, this
  10.915 -is the amount of memory the domain is originally started with,
  10.916 -preventing the domain from ever growing beyond this size.  To permit a
  10.917 -domain to grow beyond its original allocation or to prevent a domain
  10.918 -you've shrunk from reclaiming the memory it relinquished, use the 
  10.919 -\path{xm maxmem} command.
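A sketch of the usage, by analogy with \path{xm set-mem} above (the
argument order and megabyte units are an assumption --- see
\path{xm help maxmem} for the authoritative syntax):
\begin{quote}
\begin{verbatim}
# xm maxmem ttylinux 128
\end{verbatim}
\end{quote}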
  10.920 -
  10.921 -\chapter{Domain Filesystem Storage}
  10.922 -
  10.923 -It is possible to directly export any Linux block device in dom0 to
  10.924 -another domain, or to export filesystems / devices to virtual machines
  10.925 -using standard network protocols (e.g. NBD, iSCSI, NFS, etc).  This
  10.926 -chapter covers some of the possibilities.
  10.927 -
  10.928 -
  10.929 -\section{Exporting Physical Devices as VBDs} 
  10.930 -\label{s:exporting-physical-devices-as-vbds}
  10.931 -
  10.932 -One of the simplest configurations is to directly export 
  10.933 -individual partitions from domain 0 to other domains. To 
  10.934 -achieve this use the \path{phy:} specifier in your domain 
   10.935 -configuration file. For example, a line like
  10.936 -\begin{quote}
  10.937 -\verb_disk = ['phy:hda3,sda1,w']_
  10.938 -\end{quote}
  10.939 -specifies that the partition \path{/dev/hda3} in domain 0 
  10.940 -should be exported read-write to the new domain as \path{/dev/sda1}; 
  10.941 -one could equally well export it as \path{/dev/hda} or 
  10.942 -\path{/dev/sdb5} should one wish. 
  10.943 -
  10.944 -In addition to local disks and partitions, it is possible to export
  10.945 -any device that Linux considers to be ``a disk'' in the same manner.
  10.946 -For example, if you have iSCSI disks or GNBD volumes imported into
  10.947 -domain 0 you can export these to other domains using the \path{phy:}
  10.948 -disk syntax. E.g.:
  10.949 -\begin{quote}
  10.950 -\verb_disk = ['phy:vg/lvm1,sda2,w']_
  10.951 -\end{quote}
  10.952 -
  10.953 -
  10.954 -
  10.955 -\begin{center}
  10.956 -\framebox{\bf Warning: Block device sharing}
  10.957 -\end{center}
  10.958 -\begin{quote}
   10.959 -Block devices should typically only be shared between domains in a
   10.960 -read-only fashion; otherwise the Linux kernel's file systems will get
   10.961 -very confused, as the file system structure may change underneath them
   10.962 -(having the same ext3 partition mounted rw twice is a sure-fire way to
   10.963 -cause irreparable damage)!  \Xend will attempt to prevent you from
  10.964 -doing this by checking that the device is not mounted read-write in
  10.965 -domain 0, and hasn't already been exported read-write to another
  10.966 -domain.
  10.967 -If you want read-write sharing, export the directory to other domains
   10.968 -via NFS from domain 0 (or use a cluster file system such as GFS or
  10.969 -ocfs2).
  10.970 -
  10.971 -\end{quote}
  10.972 -
  10.973 -
  10.974 -\section{Using File-backed VBDs}
  10.975 -
  10.976 -It is also possible to use a file in Domain 0 as the primary storage
  10.977 -for a virtual machine.  As well as being convenient, this also has the
  10.978 -advantage that the virtual block device will be {\em sparse} --- space
  10.979 -will only really be allocated as parts of the file are used.  So if a
   10.980 -virtual machine uses only half of its disk space, then the file really
   10.981 -only takes up half of the space allocated.
  10.982 -
  10.983 -For example, to create a 2GB sparse file-backed virtual block device
  10.984 -(actually only consumes 1KB of disk):
  10.985 -\begin{quote}
  10.986 -\verb_# dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1_
  10.987 -\end{quote}
  10.988 -
  10.989 -Make a file system in the disk file: 
  10.990 -\begin{quote}
  10.991 -\verb_# mkfs -t ext3 vm1disk_
  10.992 -\end{quote}
  10.993 -
  10.994 -(when the tool asks for confirmation, answer `y')
  10.995 -
  10.996 -Populate the file system e.g. by copying from the current root:
  10.997 -\begin{quote}
  10.998 -\begin{verbatim}
  10.999 -# mount -o loop vm1disk /mnt
 10.1000 -# cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt
 10.1001 -# mkdir /mnt/{proc,sys,home,tmp}
 10.1002 -\end{verbatim}
 10.1003 -\end{quote}
 10.1004 -
  10.1005 -Tailor the file system by editing \path{/etc/fstab},
  10.1006 -\path{/etc/hostname}, etc.\ (don't forget to edit the files in the
  10.1007 -mounted file system, instead of your domain 0 filesystem; e.g.\ you
  10.1008 -would edit \path{/mnt/etc/fstab} instead of \path{/etc/fstab}).  For
  10.1009 -this example, use \path{/dev/sda1} as the root device in fstab.
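A minimal \path{/mnt/etc/fstab} for this example might look like the
following (the mount options are illustrative):
\begin{quote}
\begin{verbatim}
/dev/sda1   /      ext3   defaults   1  1
proc        /proc  proc   defaults   0  0
\end{verbatim}
\end{quote}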
 10.1010 -
 10.1011 -Now unmount (this is important!):
 10.1012 -\begin{quote}
 10.1013 -\verb_# umount /mnt_
 10.1014 -\end{quote}
 10.1015 -
 10.1016 -In the configuration file set:
 10.1017 -\begin{quote}
 10.1018 -\verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_
 10.1019 -\end{quote}
 10.1020 +%% Chapter Introduction moved to introduction.tex
 10.1021 +\include{src/user/introduction}
 10.1022  
 10.1023 -As the virtual machine writes to its `disk', the sparse file will be
 10.1024 -filled in and consume more space up to the original 2GB.
 10.1025 -
 10.1026 -{\bf Note that file-backed VBDs may not be appropriate for backing
 10.1027 -I/O-intensive domains.}  File-backed VBDs are known to experience
 10.1028 -substantial slowdowns under heavy I/O workloads, due to the I/O handling
 10.1029 -by the loopback block device used to support file-backed VBDs in dom0.
 10.1030 -Better I/O performance can be achieved by using either LVM-backed VBDs
 10.1031 -(Section~\ref{s:using-lvm-backed-vbds}) or physical devices as VBDs
 10.1032 -(Section~\ref{s:exporting-physical-devices-as-vbds}).
 10.1033 -
 10.1034 -Linux supports a maximum of eight file-backed VBDs across all domains by
 10.1035 -default.  This limit can be statically increased by using the {\em
 10.1036 -max\_loop} module parameter if CONFIG\_BLK\_DEV\_LOOP is compiled as a
 10.1037 -module in the dom0 kernel, or by using the {\em max\_loop=n} boot option
 10.1038 -if CONFIG\_BLK\_DEV\_LOOP is compiled directly into the dom0 kernel.
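For example, if the loopback driver is built as a module it could be
loaded with a larger limit as follows (the value 64 is illustrative);
for a built-in driver, append \path{max\_loop=64} to the dom0 kernel
command line instead:
\begin{quote}
\begin{verbatim}
# modprobe loop max_loop=64
\end{verbatim}
\end{quote}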
 10.1039 -
 10.1040 -
 10.1041 -\section{Using LVM-backed VBDs}
 10.1042 -\label{s:using-lvm-backed-vbds}
 10.1043 -
 10.1044 -A particularly appealing solution is to use LVM volumes 
 10.1045 -as backing for domain file-systems since this allows dynamic
 10.1046 -growing/shrinking of volumes as well as snapshot and other 
 10.1047 -features. 
 10.1048 -
 10.1049 -To initialise a partition to support LVM volumes:
 10.1050 -\begin{quote}
 10.1051 -\begin{verbatim} 
 10.1052 -# pvcreate /dev/sda10           
 10.1053 -\end{verbatim} 
 10.1054 -\end{quote}
 10.1055 -
 10.1056 -Create a volume group named `vg' on the physical partition:
 10.1057 -\begin{quote}
 10.1058 -\begin{verbatim} 
 10.1059 -# vgcreate vg /dev/sda10
 10.1060 -\end{verbatim} 
 10.1061 -\end{quote}
 10.1062 -
 10.1063 -Create a logical volume of size 4GB named `myvmdisk1':
 10.1064 -\begin{quote}
 10.1065 -\begin{verbatim} 
 10.1066 -# lvcreate -L4096M -n myvmdisk1 vg
 10.1067 -\end{verbatim} 
 10.1068 -\end{quote}
 10.1069 -
  10.1070 -You should now see that you have a \path{/dev/vg/myvmdisk1} device.
  10.1071 -Make a filesystem, mount it and populate it, e.g.:
 10.1072 -\begin{quote}
 10.1073 -\begin{verbatim} 
 10.1074 -# mkfs -t ext3 /dev/vg/myvmdisk1
 10.1075 -# mount /dev/vg/myvmdisk1 /mnt
 10.1076 -# cp -ax / /mnt
 10.1077 -# umount /mnt
 10.1078 -\end{verbatim} 
 10.1079 -\end{quote}
 10.1080 -
 10.1081 -Now configure your VM with the following disk configuration:
 10.1082 -\begin{quote}
 10.1083 -\begin{verbatim} 
 10.1084 - disk = [ 'phy:vg/myvmdisk1,sda1,w' ]
 10.1085 -\end{verbatim} 
 10.1086 -\end{quote}
 10.1087 -
 10.1088 -LVM enables you to grow the size of logical volumes, but you'll need
 10.1089 -to resize the corresponding file system to make use of the new
 10.1090 -space. Some file systems (e.g. ext3) now support on-line resize.  See
 10.1091 -the LVM manuals for more details.
 10.1092 +%% Chapter Installation moved to installation.tex
 10.1093 +\include{src/user/installation}
 10.1094  
 10.1095 -You can also use LVM for creating copy-on-write clones of LVM
 10.1096 -volumes (known as writable persistent snapshots in LVM
 10.1097 -terminology). This facility is new in Linux 2.6.8, so isn't as
 10.1098 -stable as one might hope. In particular, using lots of CoW LVM
 10.1099 -disks consumes a lot of dom0 memory, and error conditions such as
 10.1100 -running out of disk space are not handled well. Hopefully this
 10.1101 -will improve in future.
 10.1102 -
  10.1103 -To create two copy-on-write clones of the above file system, you
  10.1104 -would use the following commands:
 10.1105 -
 10.1106 -\begin{quote}
 10.1107 -\begin{verbatim} 
 10.1108 -# lvcreate -s -L1024M -n myclonedisk1 /dev/vg/myvmdisk1
 10.1109 -# lvcreate -s -L1024M -n myclonedisk2 /dev/vg/myvmdisk1
 10.1110 -\end{verbatim} 
 10.1111 -\end{quote}
 10.1112 -
 10.1113 -Each of these can grow to have 1GB of differences from the master
 10.1114 -volume. You can grow the amount of space for storing the
 10.1115 -differences using the lvextend command, e.g.:
 10.1116 -\begin{quote}
 10.1117 -\begin{verbatim} 
  10.1118 -# lvextend -L+100M /dev/vg/myclonedisk1
 10.1119 -\end{verbatim} 
 10.1120 -\end{quote}
 10.1121 -
  10.1122 -Don't let the `differences volume' ever fill up, otherwise LVM gets
 10.1123 -rather confused. It may be possible to automate the growing
 10.1124 -process by using \path{dmsetup wait} to spot the volume getting full
 10.1125 -and then issue an \path{lvextend}.
 10.1126 -
 10.1127 -In principle, it is possible to continue writing to the volume
 10.1128 -that has been cloned (the changes will not be visible to the
 10.1129 -clones), but we wouldn't recommend this: have the cloned volume
 10.1130 -as a `pristine' file system install that isn't mounted directly
 10.1131 -by any of the virtual machines.
 10.1132 -
 10.1133 +%% Chapter Starting Additional Domains  moved to start_addl_dom.tex
 10.1134 +\include{src/user/start_addl_dom}
 10.1135  
 10.1136 -\section{Using NFS Root}
 10.1137 -
 10.1138 -First, populate a root filesystem in a directory on the server
 10.1139 -machine. This can be on a distinct physical machine, or simply 
 10.1140 -run within a virtual machine on the same node.
 10.1141 -
 10.1142 -Now configure the NFS server to export this filesystem over the
 10.1143 -network by adding a line to \path{/etc/exports}, for instance:
 10.1144 -
 10.1145 -\begin{quote}
 10.1146 -\begin{small}
 10.1147 -\begin{verbatim}
  10.1148 -/export/vm1root      1.2.3.4/24(rw,sync,no_root_squash)
 10.1149 -\end{verbatim}
 10.1150 -\end{small}
 10.1151 -\end{quote}
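After editing \path{/etc/exports} you will normally need to ask the NFS
server to re-read it; assuming the standard Linux NFS utilities, this
can be done with:
\begin{quote}
\begin{small}
\begin{verbatim}
# exportfs -ra
\end{verbatim}
\end{small}
\end{quote}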
 10.1152 +%% Chapter Domain Management Tools moved to domain_mgmt.tex
 10.1153 +\include{src/user/domain_mgmt}
 10.1154  
 10.1155 -Finally, configure the domain to use NFS root.  In addition to the
 10.1156 -normal variables, you should make sure to set the following values in
 10.1157 -the domain's configuration file:
 10.1158 +%% Chapter Domain Filesystem Storage moved to domain_filesystem.tex
 10.1159 +\include{src/user/domain_filesystem}
 10.1160  
 10.1161 -\begin{quote}
 10.1162 -\begin{small}
 10.1163 -\begin{verbatim}
 10.1164 -root       = '/dev/nfs'
 10.1165 -nfs_server = '2.3.4.5'       # substitute IP address of server 
 10.1166 -nfs_root   = '/path/to/root' # path to root FS on the server
 10.1167 -\end{verbatim}
 10.1168 -\end{small}
 10.1169 -\end{quote}
 10.1170 -
 10.1171 -The domain will need network access at boot time, so either statically
  10.1172 -configure an IP address (using the config variables \path{ip},
  10.1173 -\path{netmask}, \path{gateway} and \path{hostname}) or enable DHCP
  10.1174 -(\path{dhcp='dhcp'}).
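A statically configured sketch might look like this (the addresses and
hostname are placeholders):
\begin{quote}
\begin{small}
\begin{verbatim}
ip       = '2.3.4.6'
netmask  = '255.255.255.0'
gateway  = '2.3.4.1'
hostname = 'vm1'
\end{verbatim}
\end{small}
\end{quote}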
 10.1175 -
 10.1176 -Note that the Linux NFS root implementation is known to have stability
 10.1177 -problems under high load (this is not a Xen-specific problem), so this
 10.1178 -configuration may not be appropriate for critical servers.
 10.1179  
 10.1180  
 10.1181  \part{User Reference Documentation}
 10.1182  
 10.1183 -\chapter{Control Software} 
 10.1184 -
 10.1185 -The Xen control software includes the \xend node control daemon (which 
 10.1186 -must be running), the xm command line tools, and the prototype 
 10.1187 -xensv web interface. 
 10.1188 -
 10.1189 -\section{\Xend (node control daemon)}
 10.1190 -\label{s:xend}
 10.1191 -
 10.1192 -The Xen Daemon (\Xend) performs system management functions related to
 10.1193 -virtual machines.  It forms a central point of control for a machine
 10.1194 -and can be controlled using an HTTP-based protocol.  \Xend must be
 10.1195 -running in order to start and manage virtual machines.
 10.1196 -
 10.1197 -\Xend must be run as root because it needs access to privileged system
 10.1198 -management functions.  A small set of commands may be issued on the
 10.1199 -\xend command line:
 10.1200 -
 10.1201 -\begin{tabular}{ll}
 10.1202 -\verb!# xend start! & start \xend, if not already running \\
 10.1203 -\verb!# xend stop!  & stop \xend if already running       \\
 10.1204 -\verb!# xend restart! & restart \xend if running, otherwise start it \\
 10.1205 -% \verb!# xend trace_start! & start \xend, with very detailed debug logging \\
 10.1206 -\verb!# xend status! & indicates \xend status by its return code
 10.1207 -\end{tabular}
 10.1208 -
 10.1209 -A SysV init script called {\tt xend} is provided to start \xend at boot
  10.1210 -time.  {\tt make install} installs this script in \path{/etc/init.d}.
 10.1211 -To enable it, you have to make symbolic links in the appropriate
 10.1212 -runlevel directories or use the {\tt chkconfig} tool, where available.
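For instance, on a distribution that provides {\tt chkconfig}, this
would mirror the \path{xendomains} example earlier in this manual:
\begin{quote}
\begin{verbatim}
# chkconfig --add xend
# chkconfig xend on
\end{verbatim}
\end{quote}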
 10.1213 -
 10.1214 -Once \xend is running, more sophisticated administration can be done
 10.1215 -using the xm tool (see Section~\ref{s:xm}) and the experimental
 10.1216 -Xensv web interface (see Section~\ref{s:xensv}).
 10.1217 -
 10.1218 -As \xend runs, events will be logged to \path{/var/log/xend.log} and, 
 10.1219 -if the migration assistant daemon (\path{xfrd}) has been started, 
  10.1220 -to \path{/var/log/xfrd.log}. These may be of use for troubleshooting
 10.1221 -problems.
 10.1222 -
 10.1223 -\section{Xm (command line interface)}
 10.1224 -\label{s:xm}
 10.1225 -
 10.1226 -The xm tool is the primary tool for managing Xen from the console.
 10.1227 -The general format of an xm command line is:
 10.1228 -
 10.1229 -\begin{verbatim}
 10.1230 -# xm command [switches] [arguments] [variables]
 10.1231 -\end{verbatim}
 10.1232 -
 10.1233 -The available {\em switches} and {\em arguments} are dependent on the
 10.1234 -{\em command} chosen.  The {\em variables} may be set using
 10.1235 -declarations of the form {\tt variable=value} and command line
 10.1236 -declarations override any of the values in the configuration file
 10.1237 -being used, including the standard variables described above and any
 10.1238 -custom variables (for instance, the \path{xmdefconfig} file uses a
 10.1239 -{\tt vmid} variable).
 10.1240 -
 10.1241 -The available commands are as follows:
 10.1242 -
 10.1243 -\begin{description}
 10.1244 -\item[set-mem] Request a domain to adjust its memory footprint.
 10.1245 -\item[create] Create a new domain.
 10.1246 -\item[destroy] Kill a domain immediately.
 10.1247 -\item[list] List running domains.
 10.1248 -\item[shutdown] Ask a domain to shutdown.
 10.1249 -\item[dmesg] Fetch the Xen (not Linux!) boot output.
 10.1250 -\item[consoles] Lists the available consoles.
 10.1251 -\item[console] Connect to the console for a domain.
 10.1252 -\item[help] Get help on xm commands.
 10.1253 -\item[save] Suspend a domain to disk.
 10.1254 -\item[restore] Restore a domain from disk.
 10.1255 -\item[pause] Pause a domain's execution.
 10.1256 -\item[unpause] Unpause a domain.
 10.1257 -\item[pincpu] Pin a domain to a CPU.
 10.1258 -\item[bvt] Set BVT scheduler parameters for a domain.
 10.1259 -\item[bvt\_ctxallow] Set the BVT context switching allowance for the system.
 10.1260 -\item[atropos] Set the atropos parameters for a domain.
 10.1261 -\item[rrobin] Set the round robin time slice for the system.
 10.1262 -\item[info] Get information about the Xen host.
 10.1263 -\item[call] Call a \xend HTTP API function directly.
 10.1264 -\end{description}
 10.1265 -
 10.1266 -For a detailed overview of switches, arguments and variables to each command
 10.1267 -try
 10.1268 -\begin{quote}
 10.1269 -\begin{verbatim}
 10.1270 -# xm help command
 10.1271 -\end{verbatim}
 10.1272 -\end{quote}
 10.1273 -
 10.1274 -\section{Xensv (web control interface)}
 10.1275 -\label{s:xensv}
 10.1276 -
 10.1277 -Xensv is the experimental web control interface for managing a Xen
 10.1278 -machine.  It can be used to perform some (but not yet all) of the
 10.1279 -management tasks that can be done using the xm tool.
 10.1280 -
 10.1281 -It can be started using:
 10.1282 -\begin{quote}
 10.1283 -\verb_# xensv start_
 10.1284 -\end{quote}
 10.1285 -and stopped using: 
 10.1286 -\begin{quote}
 10.1287 -\verb_# xensv stop_
 10.1288 -\end{quote}
 10.1289 -
 10.1290 -By default, Xensv will serve out the web interface on port 8080.  This
 10.1291 -can be changed by editing 
 10.1292 -\path{/usr/lib/python2.3/site-packages/xen/sv/params.py}.
 10.1293 -
 10.1294 -Once Xensv is running, the web interface can be used to create and
 10.1295 -manage running domains.
 10.1296 -
 10.1297 -
 10.1298 -
 10.1299 -
 10.1300 -\chapter{Domain Configuration}
 10.1301 -\label{cha:config}
 10.1302 -
 10.1303 -The following contains the syntax of the domain configuration 
  10.1304 -files and a description of how to further specify networking,
 10.1305 -driver domain and general scheduling behaviour. 
 10.1306 -
 10.1307 -\section{Configuration Files}
 10.1308 -\label{s:cfiles}
 10.1309 -
 10.1310 -Xen configuration files contain the following standard variables.
 10.1311 -Unless otherwise stated, configuration items should be enclosed in
 10.1312 -quotes: see \path{/etc/xen/xmexample1} and \path{/etc/xen/xmexample2} 
 10.1313 -for concrete examples of the syntax.
 10.1314 -
 10.1315 -\begin{description}
 10.1316 -\item[kernel] Path to the kernel image 
 10.1317 -\item[ramdisk] Path to a ramdisk image (optional).
 10.1318 -% \item[builder] The name of the domain build function (e.g. {\tt'linux'} or {\tt'netbsd'}.
 10.1319 -\item[memory] Memory size in megabytes.
 10.1320 -\item[cpu] CPU to run this domain on, or {\tt -1} for
 10.1321 -  auto-allocation. 
 10.1322 -\item[console] Port to export the domain console on (default 9600 + domain ID).
 10.1323 -\item[nics] Number of virtual network interfaces.
 10.1324 -\item[vif] List of MAC addresses (random addresses are assigned if not
 10.1325 -  given) and bridges to use for the domain's network interfaces, e.g.
 10.1326 -\begin{verbatim}
 10.1327 -vif = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0',
 10.1328 -        'bridge=xen-br1' ]
 10.1329 -\end{verbatim}
 10.1330 -  to assign a MAC address and bridge to the first interface and assign
 10.1331 -  a different bridge to the second interface, leaving \xend to choose
 10.1332 -  the MAC address.
 10.1333 -\item[disk] List of block devices to export to the domain,  e.g. \\
 10.1334 -  \verb_disk = [ 'phy:hda1,sda1,r' ]_ \\
 10.1335 -  exports physical device \path{/dev/hda1} to the domain 
 10.1336 -  as \path{/dev/sda1} with read-only access. Exporting a disk read-write 
 10.1337 -  which is currently mounted is dangerous -- if you are \emph{certain}
 10.1338 -  you wish to do this, you can specify \path{w!} as the mode. 
 10.1339 -\item[dhcp] Set to {\tt 'dhcp'} if you want to use DHCP to configure
 10.1340 -  networking. 
 10.1341 -\item[netmask] Manually configured IP netmask.
 10.1342 -\item[gateway] Manually configured IP gateway. 
 10.1343 -\item[hostname] Set the hostname for the virtual machine.
 10.1344 -\item[root] Specify the root device parameter on the kernel command
 10.1345 -  line. 
 10.1346 -\item[nfs\_server] IP address for the NFS server (if any). 
 10.1347 -\item[nfs\_root] Path of the root filesystem on the NFS server (if any).
 10.1348 -\item[extra] Extra string to append to the kernel command line (if
 10.1349 -  any) 
 10.1350 -\item[restart] Three possible options:
 10.1351 -  \begin{description}
 10.1352 -  \item[always] Always restart the domain, no matter what
 10.1353 -                its exit code is.
 10.1354 -  \item[never]  Never restart the domain.
 10.1355 -  \item[onreboot] Restart the domain iff it requests reboot.
 10.1356 -  \end{description}
 10.1357 -\end{description}
 10.1358 -
 10.1359 -For additional flexibility, it is also possible to include Python
 10.1360 -scripting commands in configuration files.  An example of this is the
 10.1361 -\path{xmexample2} file, which uses Python code to handle the 
 10.1362 -\path{vmid} variable.
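As a sketch only (the values and partition numbering below are
illustrative, not the actual contents of \path{xmexample2}), such a
file might compute its settings like this:
\begin{quote}
\begin{verbatim}
vmid = 1                    # overridden by 'xm create ... vmid=N'
name = "VM%d" % vmid
disk = [ 'phy:sda%d,sda1,w' % (7 + vmid) ]
\end{verbatim}
\end{quote}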
 10.1363 -
 10.1364 -
 10.1365 -%\part{Advanced Topics}
 10.1366 -
 10.1367 -\section{Network Configuration}
 10.1368 -
 10.1369 -For many users, the default installation should work `out of the box'.
 10.1370 -More complicated network setups, for instance with multiple ethernet
  10.1371 -interfaces and/or existing bridging setups, will require some
 10.1372 -special configuration.
 10.1373 -
 10.1374 -The purpose of this section is to describe the mechanisms provided by
 10.1375 -\xend to allow a flexible configuration for Xen's virtual networking.
 10.1376 -
 10.1377 -\subsection{Xen virtual network topology}
 10.1378 -
 10.1379 -Each domain network interface is connected to a virtual network
 10.1380 -interface in dom0 by a point to point link (effectively a `virtual
 10.1381 -crossover cable').  These devices are named {\tt
 10.1382 -vif$<$domid$>$.$<$vifid$>$} (e.g. {\tt vif1.0} for the first interface
 10.1383 -in domain 1, {\tt vif3.1} for the second interface in domain 3).
 10.1384 -
 10.1385 -Traffic on these virtual interfaces is handled in domain 0 using
 10.1386 -standard Linux mechanisms for bridging, routing, rate limiting, etc.
 10.1387 -Xend calls on two shell scripts to perform initial configuration of
 10.1388 -the network and configuration of new virtual interfaces.  By default,
 10.1389 -these scripts configure a single bridge for all the virtual
 10.1390 -interfaces.  Arbitrary routing / bridging configurations can be
 10.1391 -configured by customising the scripts, as described in the following
 10.1392 -section.
 10.1393 -
 10.1394 -\subsection{Xen networking scripts}
 10.1395 -
 10.1396 -Xen's virtual networking is configured by two shell scripts (by
 10.1397 -default \path{network} and \path{vif-bridge}).  These are
 10.1398 -called automatically by \xend when certain events occur, with
 10.1399 -arguments to the scripts providing further contextual information.
 10.1400 -These scripts are found by default in \path{/etc/xen/scripts}.  The
 10.1401 -names and locations of the scripts can be configured in
 10.1402 -\path{/etc/xen/xend-config.sxp}.
 10.1403 -
 10.1404 -\begin{description} 
 10.1405 -
 10.1406 -\item[network:] This script is called whenever \xend is started or
 10.1407 -stopped to respectively initialise or tear down the Xen virtual
 10.1408 -network. In the default configuration initialisation creates the
 10.1409 -bridge `xen-br0' and moves eth0 onto that bridge, modifying the
 10.1410 -routing accordingly. When \xend exits, it deletes the Xen bridge and
 10.1411 -removes eth0, restoring the normal IP and routing configuration.
 10.1412 -
 10.1413 -%% In configurations where the bridge already exists, this script could
 10.1414 -%% be replaced with a link to \path{/bin/true} (for instance).
 10.1415 -
 10.1416 -\item[vif-bridge:] This script is called for every domain virtual
 10.1417 -interface and can configure firewalling rules and add the vif 
 10.1418 -to the appropriate bridge. By default, this adds and removes 
 10.1419 -VIFs on the default Xen bridge.
 10.1420 -
 10.1421 -\end{description} 
 10.1422 -
  10.1423 -For more complex network setups (e.g. where routing is required or
  10.1424 -integration with existing bridges is needed), these scripts may be replaced with
 10.1425 -customised variants for your site's preferred configuration.
 10.1426 -
 10.1427 -%% There are two possible types of privileges:  IO privileges and
 10.1428 -%% administration privileges.
 10.1429 -
 10.1430 -\section{Driver Domain Configuration} 
 10.1431 -
 10.1432 -I/O privileges can be assigned to allow a domain to directly access
 10.1433 -PCI devices itself.  This is used to support driver domains.
 10.1434 -
 10.1435 -Setting backend privileges is currently only supported in SXP format
 10.1436 -config files.  To allow a domain to function as a backend for others,
 10.1437 -somewhere within the {\tt vm} element of its configuration file must
 10.1438 -be a {\tt backend} element of the form {\tt (backend ({\em type}))}
 10.1439 -where {\tt \em type} may be either {\tt netif} or {\tt blkif},
 10.1440 -according to the type of virtual device this domain will service.
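For example, a network driver domain's configuration would contain,
somewhere inside its {\tt vm} element, the fragment (a sketch; the rest
of the vm element is omitted):
\begin{quote}
\begin{verbatim}
(backend (netif))
\end{verbatim}
\end{quote}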
 10.1441 -%% After this domain has been built, \xend will connect all new and
 10.1442 -%% existing {\em virtual} devices (of the appropriate type) to that
 10.1443 -%% backend.
 10.1444 -
 10.1445 -Note that a block backend cannot currently import virtual block
 10.1446 -devices from other domains, and a network backend cannot import
 10.1447 -virtual network devices from other domains.  Thus (particularly in the
 10.1448 -case of block backends, which cannot import a virtual block device as
 10.1449 -their root filesystem), you may need to boot a backend domain from a
 10.1450 -ramdisk or a network device.
 10.1451 -
 10.1452 -Access to PCI devices may be configured on a per-device basis.  Xen
 10.1453 -will assign the minimal set of hardware privileges to a domain that
 10.1454 -are required to control its devices.  This can be configured in either
  10.1455 -format of configuration file (a concrete sketch of each format follows the list):
 10.1456 -
 10.1457 -\begin{itemize}
 10.1458 -\item SXP Format: Include device elements of the form: \\
 10.1459 -\centerline{  {\tt (device (pci (bus {\em x}) (dev {\em y}) (func {\em z})))}} \\
 10.1460 -  inside the top-level {\tt vm} element.  Each one specifies the address
 10.1461 -  of a device this domain is allowed to access ---
 10.1462 -  the numbers {\em x},{\em y} and {\em z} may be in either decimal or
 10.1463 -  hexadecimal format.
 10.1464 -\item Flat Format: Include a list of PCI device addresses of the
 10.1465 -  format: \\ 
 10.1466 -\centerline{{\tt pci = ['x,y,z', ...]}} \\ 
 10.1467 -where each element in the
 10.1468 -  list is a string specifying the components of the PCI device
 10.1469 -  address, separated by commas.  The components ({\tt \em x}, {\tt \em
 10.1470 -  y} and {\tt \em z}) of the list may be formatted as either decimal
 10.1471 -  or hexadecimal.
 10.1472 -\end{itemize}
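For instance, to grant access to a (hypothetical) device at bus 0x1,
slot 0x5, function 0x0, the two formats would read:
\begin{quote}
\begin{verbatim}
(device (pci (bus 0x1) (dev 0x5) (func 0x0)))

pci = [ '0x1,0x5,0x0' ]
\end{verbatim}
\end{quote}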
 10.1473 -
 10.1474 -%% \section{Administration Domains}
 10.1475 -
 10.1476 -%% Administration privileges allow a domain to use the `dom0
 10.1477 -%% operations' (so called because they are usually available only to
 10.1478 -%% domain 0).  A privileged domain can build other domains, set scheduling
 10.1479 -%% parameters, etc.
 10.1480 -
 10.1481 -% Support for other administrative domains is not yet available...  perhaps
 10.1482 -% we should plumb it in some time
 10.1483 -
 10.1484 -
 10.1485 -
 10.1486 -
 10.1487 -
 10.1488 -\section{Scheduler Configuration}
 10.1489 -\label{s:sched} 
 10.1490 -
 10.1491 -
 10.1492 -Xen offers a boot time choice between multiple schedulers.  To select
 10.1493 -a scheduler, pass the boot parameter {\em sched=sched\_name} to Xen,
 10.1494 -substituting the appropriate scheduler name.  Details of the schedulers
  10.1495 -and their parameters are included below; future versions of the tools
  10.1496 -will provide a higher-level interface.
 10.1497 +%% Chapter Control Software moved to control_software.tex
 10.1498 +\include{src/user/control_software}
 10.1499  
 10.1500 -It is expected that system administrators configure their system to
 10.1501 -use the scheduler most appropriate to their needs.  Currently, the BVT
 10.1502 -scheduler is the recommended choice. 
 10.1503 -
 10.1504 -\subsection{Borrowed Virtual Time}
 10.1505 -
 10.1506 -{\tt sched=bvt} (the default) \\ 
 10.1507 -
 10.1508 -BVT provides proportional fair shares of the CPU time.  It has been
 10.1509 -observed to penalise domains that block frequently (e.g. I/O intensive
 10.1510 -domains), but this can be compensated for by using warping. 
 10.1511 -
 10.1512 -\subsubsection{Global Parameters}
 10.1513 -
 10.1514 -\begin{description}
 10.1515 -\item[ctx\_allow]
 10.1516 -  the context switch allowance is similar to the `quantum'
 10.1517 -  in traditional schedulers.  It is the minimum time that
 10.1518 -  a scheduled domain will be allowed to run before being
 10.1519 -  pre-empted. 
 10.1520 -\end{description}
 10.1521 -
 10.1522 -\subsubsection{Per-domain parameters}
 10.1523 -
 10.1524 -\begin{description}
 10.1525 -\item[mcuadv]
 10.1526 -  the MCU (Minimum Charging Unit) advance determines the
 10.1527 -  proportional share of the CPU that a domain receives.  It
  10.1528 -  is set in inverse proportion to a domain's sharing weight.
 10.1529 -\item[warp]
 10.1530 -  the amount of `virtual time' the domain is allowed to warp
 10.1531 -  backwards
 10.1532 -\item[warpl]
 10.1533 -  the warp limit is the maximum time a domain can run warped for
 10.1534 -\item[warpu]
 10.1535 -  the unwarp requirement is the minimum time a domain must
 10.1536 -  run unwarped for before it can warp again
 10.1537 -\end{description}
 10.1538 -
 10.1539 -\subsection{Atropos}
 10.1540 -
 10.1541 -{\tt sched=atropos} \\
 10.1542 -
 10.1543 -Atropos is a soft real time scheduler.  It provides guarantees about
 10.1544 -absolute shares of the CPU, with a facility for sharing
 10.1545 -slack CPU time on a best-effort basis. It can provide timeliness
 10.1546 -guarantees for latency-sensitive domains.
 10.1547 -
 10.1548 -Every domain has an associated period and slice.  The domain should
 10.1549 -receive `slice' nanoseconds every `period' nanoseconds.  This allows
 10.1550 -the administrator to configure both the absolute share of the CPU a
 10.1551 -domain receives and the frequency with which it is scheduled. 
 10.1552 -
 10.1553 -%%  When
 10.1554 -%% domains unblock, their period is reduced to the value of the latency
 10.1555 -%% hint (the slice is scaled accordingly so that they still get the same
 10.1556 -%% proportion of the CPU).  For each subsequent period, the slice and
 10.1557 -%% period times are doubled until they reach their original values.
 10.1558 -
 10.1559 -Note: don't overcommit the CPU when using Atropos (i.e. don't reserve
 10.1560 -more CPU than is available --- the utilisation should be kept to
 10.1561 -slightly less than 100\% in order to ensure predictable behaviour).
 10.1562 -
 10.1563 -\subsubsection{Per-domain parameters}
 10.1564 -
 10.1565 -\begin{description}
 10.1566 -\item[period] The regular time interval during which a domain is
 10.1567 -  guaranteed to receive its allocation of CPU time.
 10.1568 -\item[slice]
 10.1569 -  The length of time per period that a domain is guaranteed to run
 10.1570 -  for (in the absence of voluntary yielding of the CPU). 
 10.1571 -\item[latency]
 10.1572 -  The latency hint is used to control how soon after
 10.1573 -  waking up a domain it should be scheduled.
 10.1574 -\item[xtratime] This is a boolean flag that specifies whether a domain
 10.1575 -  should be allowed a share of the system slack time.
 10.1576 -\end{description}
 10.1577 -
 10.1578 -\subsection{Round Robin}
 10.1579 -
 10.1580 -{\tt sched=rrobin} \\
 10.1581 -
 10.1582 -The round robin scheduler is included as a simple demonstration of
 10.1583 -Xen's internal scheduler API.  It is not intended for production use. 
 10.1584 -
 10.1585 -\subsubsection{Global Parameters}
 10.1586 -
 10.1587 -\begin{description}
 10.1588 -\item[rr\_slice]
 10.1589 -  The maximum time each domain runs before the next
 10.1590 -  scheduling decision is made.
 10.1591 -\end{description}
 10.1592 -
 10.1593 -
 10.1594 -
 10.1595 -
 10.1596 -
 10.1597 -
 10.1598 -
 10.1599 -
 10.1600 -
 10.1601 -
 10.1602 -
 10.1603 -
 10.1604 -\chapter{Build, Boot and Debug options} 
 10.1605 -
 10.1606 -This chapter describes the build- and boot-time options 
 10.1607 -which may be used to tailor your Xen system. 
 10.1608 -
 10.1609 -\section{Xen Build Options}
 10.1610 -
 10.1611 -Xen provides a number of build-time options which should be 
 10.1612 -set as environment variables or passed on make's command-line.  
 10.1613 -
 10.1614 -\begin{description} 
 10.1615 -\item[verbose=y] Enable debugging messages when Xen detects an unexpected condition.
 10.1616 -Also enables console output from all domains.
 10.1617 -\item[debug=y] 
 10.1618 -Enable debug assertions.  Implies {\bf verbose=y}.
 10.1619 -(Primarily useful for tracing bugs in Xen).       
 10.1620 -\item[debugger=y] 
 10.1621 -Enable the in-Xen debugger. This can be used to debug 
 10.1622 -Xen, guest OSes, and applications.
 10.1623 -\item[perfc=y] 
 10.1624 -Enable performance counters for significant events
 10.1625 -within Xen. The counts can be reset or displayed
 10.1626 -on Xen's console via console control keys.
 10.1627 -\item[trace=y] 
 10.1628 -Enable per-cpu trace buffers which log a range of
 10.1629 -events within Xen for collection by control
 10.1630 -software. 
 10.1631 -\end{description} 
 10.1632 -
 10.1633 -\section{Xen Boot Options}
 10.1634 -\label{s:xboot}
 10.1635 -
 10.1636 -These options are used to configure Xen's behaviour at runtime.  They
 10.1637 -should be appended to Xen's command line, either manually or by
 10.1638 -editing \path{grub.conf}.
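As a sketch, a \path{grub.conf} entry might look like the following
(the GRUB root, image names, memory size and root device are
illustrative and should be adapted to your installation):
\begin{quote}
\begin{small}
\begin{verbatim}
title Xen
        root (hd0,0)
        kernel /boot/xen.gz dom0_mem=262144 noreboot
        module /boot/vmlinuz-2.6-xen0 root=/dev/sda1 ro
\end{verbatim}
\end{small}
\end{quote}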
 10.1639 -
 10.1640 -\begin{description}
 10.1641 -\item [noreboot ] 
 10.1642 - Don't reboot the machine automatically on errors.  This is
 10.1643 - useful to catch debug output if you aren't catching console messages
 10.1644 - via the serial line. 
 10.1645 -
 10.1646 -\item [nosmp ] 
 10.1647 - Disable SMP support.
 10.1648 - This option is implied by `ignorebiostables'. 
 10.1649 -
 10.1650 -\item [watchdog ] 
 10.1651 - Enable NMI watchdog which can report certain failures. 
 10.1652 -
 10.1653 -\item [noirqbalance ] 
 10.1654 - Disable software IRQ balancing and affinity. This can be used on
 10.1655 - systems such as Dell 1850/2850 that have workarounds in hardware for
 10.1656 - IRQ-routing issues.
 10.1657 +%% Chapter Domain Configuration moved to domain_configuration.tex
 10.1658 +\include{src/user/domain_configuration}
 10.1659  
 10.1660 -\item [badpage=$<$page number$>$,$<$page number$>$, \ldots ] 
 10.1661 - Specify a list of pages not to be allocated for use 
 10.1662 - because they contain bad bytes. For example, if your
 10.1663 - memory tester says that byte 0x12345678 is bad, you would
 10.1664 - place `badpage=0x12345' on Xen's command line. 
 10.1665 -
 10.1666 -\item [com1=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$
 10.1667 - com2=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ ] \mbox{}\\ 
 10.1668 - Xen supports up to two 16550-compatible serial ports.
 10.1669 - For example: `com1=9600, 8n1, 0x408, 5' maps COM1 to a
 10.1670 - 9600-baud port, 8 data bits, no parity, 1 stop bit,
 10.1671 - I/O port base 0x408, IRQ 5.
 10.1672 - If some configuration options are standard (e.g., I/O base and IRQ),
 10.1673 - then only a prefix of the full configuration string need be
 10.1674 - specified. If the baud rate is pre-configured (e.g., by the
 10.1675 - bootloader) then you can specify `auto' in place of a numeric baud
 10.1676 - rate. 
 10.1677 -
 10.1678 -\item [console=$<$specifier list$>$ ] 
 10.1679 - Specify the destination for Xen console I/O.
 10.1680 - This is a comma-separated list of, for example:
 10.1681 -\begin{description}
 10.1682 - \item[vga]  use VGA console and allow keyboard input
 10.1683 - \item[com1] use serial port com1
 10.1684 - \item[com2H] use serial port com2. Transmitted chars will
 10.1685 -   have the MSB set. Received chars must have
 10.1686 -   MSB set.
 10.1687 - \item[com2L] use serial port com2. Transmitted chars will
 10.1688 -   have the MSB cleared. Received chars must
 10.1689 -   have MSB cleared.
 10.1690 -\end{description}
 10.1691 - The latter two examples allow a single port to be
 10.1692 - shared by two subsystems (e.g. console and
 10.1693 - debugger). Sharing is controlled by MSB of each
 10.1694 - transmitted/received character.
 10.1695 - [NB. Default for this option is `com1,vga'] 
 10.1696 -
 10.1697 -\item [sync\_console ]
  10.1698 - Force synchronous console output. This is useful if your system fails
 10.1699 - unexpectedly before it has sent all available output to the
 10.1700 - console. In most cases Xen will automatically enter synchronous mode
 10.1701 - when an exceptional event occurs, but this option provides a manual
 10.1702 - fallback.
 10.1703 -
 10.1704 -\item [conswitch=$<$switch-char$><$auto-switch-char$>$ ] 
 10.1705 - Specify how to switch serial-console input between
 10.1706 - Xen and DOM0. The required sequence is CTRL-$<$switch-char$>$
 10.1707 - pressed three times. Specifying the backtick character 
 10.1708 - disables switching.
 10.1709 - The $<$auto-switch-char$>$ specifies whether Xen should
 10.1710 - auto-switch input to DOM0 when it boots --- if it is `x'
 10.1711 - then auto-switching is disabled.  Any other value, or
 10.1712 - omitting the character, enables auto-switching.
 10.1713 - [NB. default switch-char is `a'] 
 10.1714 -
 10.1715 -\item [nmi=xxx ] 
 10.1716 - Specify what to do with an NMI parity or I/O error. \\
 10.1717 - `nmi=fatal':  Xen prints a diagnostic and then hangs. \\
 10.1718 - `nmi=dom0':   Inform DOM0 of the NMI. \\
 10.1719 - `nmi=ignore': Ignore the NMI. 
 10.1720 -
 10.1721 -\item [mem=xxx ]
 10.1722 - Set the physical RAM address limit. Any RAM appearing beyond this
 10.1723 - physical address in the memory map will be ignored. This parameter
 10.1724 - may be specified with a B, K, M or G suffix, representing bytes,
 10.1725 - kilobytes, megabytes and gigabytes respectively. The
 10.1726 - default unit, if no suffix is specified, is kilobytes.
 10.1727 -
 10.1728 -\item [dom0\_mem=xxx ] 
 10.1729 - Set the amount of memory to be allocated to domain0. In Xen 3.x the parameter
 10.1730 - may be specified with a B, K, M or G suffix, representing bytes,
 10.1731 - kilobytes, megabytes and gigabytes respectively; if no suffix is specified, 
 10.1732 - the parameter defaults to kilobytes. In previous versions of Xen, suffixes
 10.1733 - were not supported and the value is always interpreted as kilobytes. 
 10.1734 -
 10.1735 -\item [tbuf\_size=xxx ] 
 10.1736 - Set the size of the per-cpu trace buffers, in pages
 10.1737 - (default 1).  Note that the trace buffers are only
 10.1738 - enabled in debug builds.  Most users can ignore
 10.1739 - this feature completely. 
 10.1740 -
 10.1741 -\item [sched=xxx ] 
 10.1742 - Select the CPU scheduler Xen should use.  The current
 10.1743 - possibilities are `bvt' (default), `atropos' and `rrobin'. 
 10.1744 - For more information see Section~\ref{s:sched}. 
 10.1745 -
 10.1746 -\item [apic\_verbosity=debug,verbose ]
 10.1747 - Print more detailed information about local APIC and IOAPIC configuration.
 10.1748 -
 10.1749 -\item [lapic ]
 10.1750 - Force use of local APIC even when left disabled by uniprocessor BIOS.
 10.1751 -
 10.1752 -\item [nolapic ]
 10.1753 - Ignore local APIC in a uniprocessor system, even if enabled by the BIOS.
 10.1754 -
 10.1755 -\item [apic=bigsmp,default,es7000,summit ]
 10.1756 - Specify NUMA platform. This can usually be probed automatically.
 10.1757 -
 10.1758 -\end{description} 
 10.1759 -
 10.1760 -In addition, the following options may be specified on the Xen command
 10.1761 -line. Since domain 0 shares responsibility for booting the platform,
 10.1762 -Xen will automatically propagate these options to its command
 10.1763 -line. These options are taken from Linux's command-line syntax with
 10.1764 -unchanged semantics.
 10.1765 -
 10.1766 -\begin{description}
 10.1767 -\item [acpi=off,force,strict,ht,noirq,\ldots ] 
 10.1768 - Modify how Xen (and domain 0) parses the BIOS ACPI tables.
 10.1769 -
 10.1770 -\item [acpi\_skip\_timer\_override ]
 10.1771 - Instruct Xen (and domain 0) to ignore timer-interrupt override
 10.1772 - instructions specified by the BIOS ACPI tables.
 10.1773 -
 10.1774 -\item [noapic ]
 10.1775 - Instruct Xen (and domain 0) to ignore any IOAPICs that are present in
 10.1776 - the system, and instead continue to use the legacy PIC.
 10.1777 -
 10.1778 -\end{description} 
 10.1779 -
 10.1780 -\section{XenLinux Boot Options}
 10.1781 -
 10.1782 -In addition to the standard Linux kernel boot options, we support: 
 10.1783 -\begin{description} 
 10.1784 -\item[xencons=xxx ] Specify the device node to which the Xen virtual
 10.1785 -console driver is attached. The following options are supported:
 10.1786 -\begin{center}
 10.1787 -\begin{tabular}{l}
 10.1788 -`xencons=off': disable virtual console \\ 
 10.1789 -`xencons=tty': attach console to /dev/tty1 (tty0 at boot-time) \\
 10.1790 -`xencons=ttyS': attach console to /dev/ttyS0
 10.1791 -\end{tabular}
 10.1792 -\end{center}
 10.1793 -The default is ttyS for dom0 and tty for all other domains.
 10.1794 -\end{description} 
 10.1795 -
 10.1796 -
 10.1797 -
 10.1798 -\section{Debugging}
 10.1799 -\label{s:keys} 
 10.1800 -
  10.1801 -Xen has a set of debugging features that can be useful when trying to
  10.1802 -figure out what's going on. Hit 'h' on the serial line (if you
 10.1803 -specified a baud rate on the Xen command line) or ScrollLock-h on the
 10.1804 -keyboard to get a list of supported commands.
 10.1805 -
 10.1806 -If you have a crash you'll likely get a crash dump containing an EIP
 10.1807 -(PC) which, along with an \path{objdump -d image}, can be useful in
 10.1808 -figuring out what's happened.  Debug a Xenlinux image just as you
 10.1809 -would any other Linux kernel.
 10.1810 -
 10.1811 -%% We supply a handy debug terminal program which you can find in
 10.1812 -%% \path{/usr/local/src/xen-2.0.bk/tools/misc/miniterm/}
 10.1813 -%% This should be built and executed on another machine that is connected
 10.1814 -%% via a null modem cable. Documentation is included.
 10.1815 -%% Alternatively, if the Xen machine is connected to a serial-port server
 10.1816 -%% then we supply a dumb TCP terminal client, {\tt xencons}.
 10.1817 -
 10.1818 -
 10.1819 +%% Chapter Build, Boot and Debug Options moved to build.tex
 10.1820 +\include{src/user/build}
 10.1821  
 10.1822  
 10.1823  \chapter{Further Support}
 10.1824 @@ -1875,6 +108,7 @@ directory of the Xen source distribution
 10.1825  %Various HOWTOs are available in \path{docs/HOWTOS} but this content is
 10.1826  %being integrated into this manual.
 10.1827  
 10.1828 +
 10.1829  \section{Online References}
 10.1830  
 10.1831  The official Xen web site is found at:
 10.1832 @@ -1885,6 +119,7 @@ The official Xen web site is found at:
 10.1833  This contains links to the latest versions of all on-line 
  10.1834  documentation (including the latest version of the FAQ). 
 10.1835  
 10.1836 +
 10.1837  \section{Mailing Lists}
 10.1838  
 10.1839  There are currently four official Xen mailing lists:
 10.1840 @@ -1905,326 +140,18 @@ from the unstable and 2.0 trees - develo
 10.1841  \end{description}
 10.1842  
 10.1843  
 10.1844 +
 10.1845  \appendix
 10.1846  
 10.1847 +%% Chapter Installing Xen / XenLinux on Debian moved to debian.tex
 10.1848 +\include{src/user/debian}
 10.1849 +
 10.1850 +%% Chapter Installing Xen on Red Hat moved to redhat.tex
 10.1851 +\include{src/user/redhat}
 10.1852 +
 10.1853  
 10.1854 -\chapter{Installing Xen / XenLinux on Debian}
 10.1855 -
 10.1856 -The Debian project provides a tool called \path{debootstrap} which
 10.1857 -allows a base Debian system to be installed into a filesystem without
 10.1858 -requiring the host system to have any Debian-specific software (such
  10.1859 -as \path{apt}).
 10.1860 -
  10.1861 -Here is how to install Debian 3.1 (Sarge) for an unprivileged
 10.1862 -Xen domain:
 10.1863 -
 10.1864 -\begin{enumerate}
 10.1865 -\item Set up Xen 2.0 and test that it's working, as described earlier in
 10.1866 -      this manual.
 10.1867 -
 10.1868 -\item Create disk images for root-fs and swap (alternatively, you
 10.1869 -      might create dedicated partitions, LVM logical volumes, etc. if
 10.1870 -      that suits your setup).
 10.1871 -\begin{small}\begin{verbatim}  
 10.1872 -dd if=/dev/zero of=/path/diskimage bs=1024k count=size_in_mbytes
 10.1873 -dd if=/dev/zero of=/path/swapimage bs=1024k count=size_in_mbytes
 10.1874 -\end{verbatim}\end{small}
 10.1875 -      If you're going to use this filesystem / disk image only as a
  10.1876 -      `template' for other VM disk images, something like 300 MB should
  10.1877 -      be enough (of course, it depends on what packages you are
  10.1878 -      planning to install in the template).
 10.1879 -
 10.1880 -\item Create the filesystem and initialise the swap image
 10.1881 -\begin{small}\begin{verbatim}
 10.1882 -mkfs.ext3 /path/diskimage
 10.1883 -mkswap /path/swapimage
 10.1884 -\end{verbatim}\end{small}
 10.1885 -
 10.1886 -\item Mount the disk image for installation
 10.1887 -\begin{small}\begin{verbatim}
 10.1888 -mount -o loop /path/diskimage /mnt/disk
 10.1889 -\end{verbatim}\end{small}
 10.1890 -
 10.1891 -\item Install \path{debootstrap}
 10.1892 -
 10.1893 -Make sure you have debootstrap installed on the host.  If you are
 10.1894 -running Debian sarge (3.1 / testing) or unstable you can install it by
 10.1895 -running \path{apt-get install debootstrap}.  Otherwise, it can be
 10.1896 -downloaded from the Debian project website.
 10.1897 -
 10.1898 -\item Install Debian base to the disk image:
 10.1899 -\begin{small}\begin{verbatim}
 10.1900 -debootstrap --arch i386 sarge /mnt/disk  \
 10.1901 -            http://ftp.<countrycode>.debian.org/debian
 10.1902 -\end{verbatim}\end{small}
 10.1903 -
 10.1904 -You can use any other Debian http/ftp mirror you want.
 10.1905 -
 10.1906 -\item When debootstrap completes successfully, modify settings:
 10.1907 -\begin{small}\begin{verbatim}
 10.1908 -chroot /mnt/disk /bin/bash
 10.1909 -\end{verbatim}\end{small}
 10.1910 -
 10.1911 -Edit the following files using vi or nano and make needed changes:
 10.1912 -\begin{small}\begin{verbatim}
 10.1913 -/etc/hostname
 10.1914 -/etc/hosts
 10.1915 -/etc/resolv.conf
 10.1916 -/etc/network/interfaces
 10.1917 -/etc/networks
 10.1918 -\end{verbatim}\end{small}
 10.1919 -
 10.1920 -Set up access to the services, edit:
 10.1921 -\begin{small}\begin{verbatim}
 10.1922 -/etc/hosts.deny
 10.1923 -/etc/hosts.allow
 10.1924 -/etc/inetd.conf
 10.1925 -\end{verbatim}\end{small}
 10.1926 -
 10.1927 -Add Debian mirror to:   
 10.1928 -\begin{small}\begin{verbatim}
 10.1929 -/etc/apt/sources.list
 10.1930 -\end{verbatim}\end{small}
 10.1931 -
 10.1932 -Create fstab like this:
 10.1933 -\begin{small}\begin{verbatim}
 10.1934 -/dev/sda1       /       ext3    errors=remount-ro       0       1
 10.1935 -/dev/sda2       none    swap    sw                      0       0
 10.1936 -proc            /proc   proc    defaults                0       0
 10.1937 -\end{verbatim}\end{small}
 10.1938 -
 10.1939 -Logout
 10.1940 -
 10.1941 -\item      Unmount the disk image
 10.1942 -\begin{small}\begin{verbatim}
 10.1943 -umount /mnt/disk
 10.1944 -\end{verbatim}\end{small}
 10.1945 -
 10.1946 -\item Create Xen 2.0 configuration file for the new domain. You can
 10.1947 -        use the example-configurations coming with Xen as a template.
 10.1948 -
 10.1949 -        Make sure you have the following set up:
 10.1950 -\begin{small}\begin{verbatim}
 10.1951 -disk = [ 'file:/path/diskimage,sda1,w', 'file:/path/swapimage,sda2,w' ]
 10.1952 -root = "/dev/sda1 ro"
 10.1953 -\end{verbatim}\end{small}
 10.1954 -
 10.1955 -\item Start the new domain
 10.1956 -\begin{small}\begin{verbatim}
 10.1957 -xm create -f domain_config_file
 10.1958 -\end{verbatim}\end{small}
 10.1959 -
 10.1960 -Check that the new domain is running:
 10.1961 -\begin{small}\begin{verbatim}
 10.1962 -xm list
 10.1963 -\end{verbatim}\end{small}
 10.1964 -
 10.1965 -\item   Attach to the console of the new domain.
 10.1966 -        You should see something like this when starting the new domain:
 10.1967 -
 10.1968 -\begin{small}\begin{verbatim}
 10.1969 -Started domain testdomain2, console on port 9626
 10.1970 -\end{verbatim}\end{small}
 10.1971 -        
 10.1972 -        There you can see the ID of the console: 26. You can also list
 10.1973 -        the consoles with \path{xm consoles} (ID is the last two
 10.1974 -        digits of the port number.)
 10.1975 -
 10.1976 -        Attach to the console:
 10.1977 -
 10.1978 -\begin{small}\begin{verbatim}
 10.1979 -xm console 26
 10.1980 -\end{verbatim}\end{small}
 10.1981 -
 10.1982 -        or by telnetting to the port 9626 of localhost (the xm console
 10.1983 -        program works better).
 10.1984 -
 10.1985 -\item   Log in and run base-config
 10.1986 -
 10.1987 -        As a default there's no password for the root.
 10.1988 -
 10.1989 -        Check that everything looks OK, and the system started without
 10.1990 -        errors.  Check that the swap is active, and the network settings are
 10.1991 -        correct.
 10.1992 -
 10.1993 -        Run \path{/usr/sbin/base-config} to set up the Debian settings.
 10.1994 -
 10.1995 -        Set up the password for root using passwd.
 10.1996 -
 10.1997 -\item     Done. You can exit the console by pressing \path{Ctrl + ]}
 10.1998 -
 10.1999 -\end{enumerate}
 10.2000 -
 10.2001 -If you need to create new domains, you can just copy the contents of
 10.2002 -the `template'-image to the new disk images, either by mounting the
 10.2003 -template and the new image, and using \path{cp -a} or \path{tar} or by
 10.2004 -simply copying the image file.  Once this is done, modify the
 10.2005 -image-specific settings (hostname, network settings, etc).
 10.2006 -
 10.2007 -\chapter{Installing Xen / XenLinux on Redhat or Fedora Core}
 10.2008 -
 10.2009 -When using Xen / XenLinux on a standard Linux distribution there are
 10.2010 -a couple of things to watch out for:
 10.2011 -
 10.2012 -Note that, because domains>0 don't have any privileged access at all,
 10.2013 -certain commands in the default boot sequence will fail e.g. attempts
 10.2014 -to update the hwclock, change the console font, update the keytable
 10.2015 -map, start apmd (power management), or gpm (mouse cursor).  Either
 10.2016 -ignore the errors (they should be harmless), or remove them from the
  10.2017 -startup scripts.  Deleting the following links is a good start:
 10.2018 -{\path{S24pcmcia}}, {\path{S09isdn}},
 10.2019 -{\path{S17keytable}}, {\path{S26apmd}},
 10.2020 -{\path{S85gpm}}.
 10.2021 -
 10.2022 -If you want to use a single root file system that works cleanly for
 10.2023 -both domain 0 and unprivileged domains, a useful trick is to use
 10.2024 -different 'init' run levels. For example, use
 10.2025 -run level 3 for domain 0, and run level 4 for other domains. This
  10.2026 -enables different startup scripts to be run depending on the run
 10.2027 -level number passed on the kernel command line.
 10.2028 -
  10.2029 -If using NFS root file systems mounted either from an
 10.2030 -external server or from domain0 there are a couple of other gotchas.
 10.2031 -The default {\path{/etc/sysconfig/iptables}} rules block NFS, so part
 10.2032 -way through the boot sequence things will suddenly go dead.
 10.2033 -
 10.2034 -If you're planning on having a separate NFS {\path{/usr}} partition, the
 10.2035 -RH9 boot scripts don't make life easy - they attempt to mount NFS file
 10.2036 -systems way to late in the boot process. The easiest way I found to do
 10.2037 -this was to have a {\path{/linuxrc}} script run ahead of
 10.2038 -{\path{/sbin/init}} that mounts {\path{/usr}}:
 10.2039 -
 10.2040 -\begin{quote}
 10.2041 -\begin{small}\begin{verbatim}
 10.2042 - #!/bin/bash
  10.2043 - /sbin/ifconfig lo 127.0.0.1
 10.2044 - /sbin/portmap
 10.2045 - /bin/mount /usr
 10.2046 - exec /sbin/init "$@" <>/dev/console 2>&1
 10.2047 -\end{verbatim}\end{small}
 10.2048 -\end{quote}
 10.2049 -
 10.2050 -%$ XXX SMH: font lock fix :-)  
 10.2051 -
 10.2052 -The one slight complication with the above is that
 10.2053 -{\path{/sbin/portmap}} is dynamically linked against
  10.2054 -{\path{/usr/lib/libwrap.so.0}}. Since this is in
 10.2055 -{\path{/usr}}, it won't work. This can be solved by copying the
 10.2056 -file (and link) below the /usr mount point, and just let the file be
 10.2057 -'covered' when the mount happens.
 10.2058 -
 10.2059 -In some installations, where a shared read-only {\path{/usr}} is
 10.2060 -being used, it may be desirable to move other large directories over
 10.2061 -into the read-only {\path{/usr}}. For example, you might replace
 10.2062 -{\path{/bin}}, {\path{/lib}} and {\path{/sbin}} with
 10.2063 -links into {\path{/usr/root/bin}}, {\path{/usr/root/lib}}
 10.2064 -and {\path{/usr/root/sbin}} respectively. This creates other
 10.2065 -problems for running the {\path{/linuxrc}} script, requiring
 10.2066 -bash, portmap, mount, ifconfig, and a handful of other shared
 10.2067 -libraries to be copied below the mount point --- a simple
 10.2068 -statically-linked C program would solve this problem.
 10.2069 -
 10.2070 -
 10.2071 -
 10.2072 -
 10.2073 -\chapter{Glossary of Terms}
 10.2074 -
 10.2075 -\begin{description}
 10.2076 -\item[Atropos]             One of the CPU schedulers provided by Xen.
 10.2077 -                           Atropos provides domains with absolute shares
 10.2078 -                           of the CPU, with timeliness guarantees and a
 10.2079 -                           mechanism for sharing out `slack time'.
 10.2080 -
 10.2081 -\item[BVT]                 The BVT scheduler is used to give proportional
 10.2082 -                           fair shares of the CPU to domains.
 10.2083 -
 10.2084 -\item[Exokernel]           A minimal piece of privileged code, similar to
 10.2085 -                           a {\bf microkernel} but providing a more
 10.2086 -                           `hardware-like' interface to the tasks it
 10.2087 -                           manages.  This is similar to a paravirtualising
 10.2088 -                           VMM like {\bf Xen} but was designed as a new
 10.2089 -                           operating system structure, rather than
 10.2090 -                           specifically to run multiple conventional OSs.
 10.2091 -
 10.2092 -\item[Domain]              A domain is the execution context that
 10.2093 -                           contains a running {\bf virtual machine}.
 10.2094 -                           The relationship between virtual machines
 10.2095 -                           and domains on Xen is similar to that between
 10.2096 -                           programs and processes in an operating
 10.2097 -                           system: a virtual machine is a persistent
 10.2098 -                           entity that resides on disk (somewhat like
 10.2099 -                           a program).  When it is loaded for execution,
 10.2100 -                           it runs in a domain.  Each domain has a
 10.2101 -                           {\bf domain ID}.
 10.2102 -
 10.2103 -\item[Domain 0]            The first domain to be started on a Xen
 10.2104 -                           machine.  Domain 0 is responsible for managing
 10.2105 -                           the system.
 10.2106 -
 10.2107 -\item[Domain ID]           A unique identifier for a {\bf domain},
 10.2108 -                           analogous to a process ID in an operating
 10.2109 -                           system.
 10.2110 -
 10.2111 -\item[Full virtualisation] An approach to virtualisation which
 10.2112 -                           requires no modifications to the hosted
 10.2113 -                           operating system, providing the illusion of
 10.2114 -                           a complete system of real hardware devices.
 10.2115 -
 10.2116 -\item[Hypervisor]          An alternative term for {\bf VMM}, used
 10.2117 -                           because it means `beyond supervisor',
 10.2118 -                           since it is responsible for managing multiple
 10.2119 -                           `supervisor' kernels.
 10.2120 -
 10.2121 -\item[Live migration]      A technique for moving a running virtual
 10.2122 -                           machine to another physical host, without
 10.2123 -                           stopping it or the services running on it.
 10.2124 -
 10.2125 -\item[Microkernel]         A small base of code running at the highest
 10.2126 -                           hardware privilege level.  A microkernel is
 10.2127 -                           responsible for sharing CPU and memory (and
 10.2128 -                           sometimes other devices) between less
 10.2129 -                           privileged tasks running on the system.
 10.2130 -                           This is similar to a VMM, particularly a
 10.2131 -                           {\bf paravirtualising} VMM but typically
 10.2132 -                           addressing a different problem space and
  10.2133 -                           providing a different kind of interface.
 10.2134 -
 10.2135 -\item[NetBSD/Xen]          A port of NetBSD to the Xen architecture.
 10.2136 -
 10.2137 -\item[Paravirtualisation]  An approach to virtualisation which requires
 10.2138 -                           modifications to the operating system in
 10.2139 -                           order to run in a virtual machine.  Xen
 10.2140 -                           uses paravirtualisation but preserves
 10.2141 -                           binary compatibility for user space
 10.2142 -                           applications.
 10.2143 -
 10.2144 -\item[Shadow pagetables]   A technique for hiding the layout of machine
 10.2145 -                           memory from a virtual machine's operating
 10.2146 -                           system.  Used in some {\bf VMMs} to provide
 10.2147 -                           the illusion of contiguous physical memory,
 10.2148 -                           in Xen this is used during
 10.2149 -                           {\bf live migration}.
 10.2150 -
 10.2151 -\item[Virtual Machine]     The environment in which a hosted operating
 10.2152 -                           system runs, providing the abstraction of a
 10.2153 -                           dedicated machine.  A virtual machine may
 10.2154 -                           be identical to the underlying hardware (as
  10.2155 -                           in {\bf full virtualisation}), or it may
  10.2156 -                           differ (as in {\bf paravirtualisation}).
 10.2157 -
 10.2158 -\item[VMM]                 Virtual Machine Monitor - the software that
 10.2159 -                           allows multiple virtual machines to be
 10.2160 -                           multiplexed on a single physical machine.
 10.2161 -
 10.2162 -\item[Xen]                 Xen is a paravirtualising virtual machine
 10.2163 -                           monitor, developed primarily by the
 10.2164 -                           Systems Research Group at the University
 10.2165 -                           of Cambridge Computer Laboratory.
 10.2166 -
 10.2167 -\item[XenLinux]            Official name for the port of the Linux kernel
 10.2168 -                           that runs on Xen.
 10.2169 -
 10.2170 -\end{description}
 10.2171 +%% Chapter Glossary of Terms moved to glossary.tex
 10.2172 +\include{src/user/glossary}
 10.2173  
 10.2174  
 10.2175  \end{document}
    11.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    11.2 +++ b/docs/src/user/build.tex	Tue Sep 20 09:17:33 2005 +0000
    11.3 @@ -0,0 +1,170 @@
    11.4 +\chapter{Build, Boot and Debug Options} 
    11.5 +
    11.6 +This chapter describes the build- and boot-time options which may be
    11.7 +used to tailor your Xen system.
    11.8 +
    11.9 +
   11.10 +\section{Xen Build Options}
   11.11 +
   11.12 +Xen provides a number of build-time options which should be set as
   11.13 +environment variables or passed on make's command-line.
   11.14 +
   11.15 +\begin{description}
   11.16 +\item[verbose=y] Enable debugging messages when Xen detects an
   11.17 +  unexpected condition.  Also enables console output from all domains.
   11.18 +\item[debug=y] Enable debug assertions.  Implies {\bf verbose=y}.
   11.19 +  (Primarily useful for tracing bugs in Xen).
   11.20 +\item[debugger=y] Enable the in-Xen debugger. This can be used to
   11.21 +  debug Xen, guest OSes, and applications.
   11.22 +\item[perfc=y] Enable performance counters for significant events
   11.23 +  within Xen. The counts can be reset or displayed on Xen's console
   11.24 +  via console control keys.
   11.25 +\item[trace=y] Enable per-cpu trace buffers which log a range of
   11.26 +  events within Xen for collection by control software.
   11.27 +\end{description}
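          +
          +As a rough illustration (the exact make targets vary between
          +releases, so check the top-level Makefile of your tree), a
          +debug-enabled hypervisor might be built with:
          +\begin{verbatim}
          +# make debug=y verbose=y xen
          +\end{verbatim}
          +Equivalently, the options can be exported as environment variables
          +before invoking make.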
   11.28 +
   11.29 +
   11.30 +\section{Xen Boot Options}
   11.31 +\label{s:xboot}
   11.32 +
   11.33 +These options are used to configure Xen's behaviour at runtime.  They
   11.34 +should be appended to Xen's command line, either manually or by
   11.35 +editing \path{grub.conf}.
   11.36 +
   11.37 +\begin{description}
   11.38 +\item [ noreboot ] Don't reboot the machine automatically on errors.
   11.39 +  This is useful to catch debug output if you aren't catching console
   11.40 +  messages via the serial line.
   11.41 +\item [ nosmp ] Disable SMP support.  This option is implied by
   11.42 +  `ignorebiostables'.
   11.43 +\item [ watchdog ] Enable NMI watchdog which can report certain
   11.44 +  failures.
   11.45 +\item [ noirqbalance ] Disable software IRQ balancing and affinity.
   11.46 +  This can be used on systems such as Dell 1850/2850 that have
   11.47 +  workarounds in hardware for IRQ-routing issues.
   11.48 +\item [ badpage=$<$page number$>$,$<$page number$>$, \ldots ] Specify
   11.49 +  a list of pages not to be allocated for use because they contain bad
   11.50 +  bytes. For example, if your memory tester says that byte 0x12345678
   11.51 +  is bad, you would place `badpage=0x12345' on Xen's command line.
   11.52 +\item [ com1=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$
   11.53 +  com2=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ ] \mbox{}\\
   11.54 +  Xen supports up to two 16550-compatible serial ports.  For example:
   11.55 +  `com1=9600, 8n1, 0x408, 5' maps COM1 to a 9600-baud port, 8 data
   11.56 +  bits, no parity, 1 stop bit, I/O port base 0x408, IRQ 5.  If some
   11.57 +  configuration options are standard (e.g., I/O base and IRQ), then
   11.58 +  only a prefix of the full configuration string need be specified. If
   11.59 +  the baud rate is pre-configured (e.g., by the bootloader) then you
   11.60 +  can specify `auto' in place of a numeric baud rate.
   11.61 +\item [ console=$<$specifier list$>$ ] Specify the destination for Xen
    11.62 +  console I/O.  This is a comma-separated list of options, for example:
   11.63 +  \begin{description}
   11.64 +  \item[ vga ] Use VGA console and allow keyboard input.
   11.65 +  \item[ com1 ] Use serial port com1.
   11.66 +  \item[ com2H ] Use serial port com2. Transmitted chars will have the
   11.67 +    MSB set. Received chars must have MSB set.
   11.68 +  \item[ com2L] Use serial port com2. Transmitted chars will have the
   11.69 +    MSB cleared. Received chars must have MSB cleared.
   11.70 +  \end{description}
   11.71 +  The latter two examples allow a single port to be shared by two
   11.72 +  subsystems (e.g.\ console and debugger). Sharing is controlled by
    11.73 +  the MSB of each transmitted/received character.  [NB. Default for this
   11.74 +  option is `com1,vga']
   11.75 +\item [ sync\_console ] Force synchronous console output. This is
    11.76 +  useful if your system fails unexpectedly before it has sent all
   11.77 +  available output to the console. In most cases Xen will
   11.78 +  automatically enter synchronous mode when an exceptional event
   11.79 +  occurs, but this option provides a manual fallback.
   11.80 +\item [ conswitch=$<$switch-char$><$auto-switch-char$>$ ] Specify how
   11.81 +  to switch serial-console input between Xen and DOM0. The required
   11.82 +  sequence is CTRL-$<$switch-char$>$ pressed three times. Specifying
   11.83 +  the backtick character disables switching.  The
   11.84 +  $<$auto-switch-char$>$ specifies whether Xen should auto-switch
   11.85 +  input to DOM0 when it boots --- if it is `x' then auto-switching is
   11.86 +  disabled.  Any other value, or omitting the character, enables
   11.87 +  auto-switching.  [NB. Default switch-char is `a'.]
   11.88 +\item [ nmi=xxx ]
   11.89 +  Specify what to do with an NMI parity or I/O error. \\
   11.90 +  `nmi=fatal':  Xen prints a diagnostic and then hangs. \\
   11.91 +  `nmi=dom0':   Inform DOM0 of the NMI. \\
   11.92 +  `nmi=ignore': Ignore the NMI.
   11.93 +\item [ mem=xxx ] Set the physical RAM address limit. Any RAM
   11.94 +  appearing beyond this physical address in the memory map will be
   11.95 +  ignored. This parameter may be specified with a B, K, M or G suffix,
   11.96 +  representing bytes, kilobytes, megabytes and gigabytes respectively.
   11.97 +  The default unit, if no suffix is specified, is kilobytes.
   11.98 +\item [ dom0\_mem=xxx ] Set the amount of memory to be allocated to
   11.99 +  domain0. In Xen 3.x the parameter may be specified with a B, K, M or
  11.100 +  G suffix, representing bytes, kilobytes, megabytes and gigabytes
  11.101 +  respectively; if no suffix is specified, the parameter defaults to
  11.102 +  kilobytes. In previous versions of Xen, suffixes were not supported
  11.103 +  and the value is always interpreted as kilobytes.
  11.104 +\item [ tbuf\_size=xxx ] Set the size of the per-cpu trace buffers, in
  11.105 +  pages (default 1).  Note that the trace buffers are only enabled in
  11.106 +  debug builds.  Most users can ignore this feature completely.
  11.107 +\item [ sched=xxx ] Select the CPU scheduler Xen should use.  The
  11.108 +  current possibilities are `bvt' (default), `atropos' and `rrobin'.
  11.109 +  For more information see Section~\ref{s:sched}.
  11.110 +\item [ apic\_verbosity=debug,verbose ] Print more detailed
  11.111 +  information about local APIC and IOAPIC configuration.
  11.112 +\item [ lapic ] Force use of local APIC even when left disabled by
  11.113 +  uniprocessor BIOS.
  11.114 +\item [ nolapic ] Ignore local APIC in a uniprocessor system, even if
  11.115 +  enabled by the BIOS.
  11.116 +\item [ apic=bigsmp,default,es7000,summit ] Specify NUMA platform.
  11.117 +  This can usually be probed automatically.
  11.118 +\end{description}
  11.119 +
  11.120 +In addition, the following options may be specified on the Xen command
  11.121 +line. Since domain 0 shares responsibility for booting the platform,
  11.122 +Xen will automatically propagate these options to its command line.
  11.123 +These options are taken from Linux's command-line syntax with
  11.124 +unchanged semantics.
  11.125 +
  11.126 +\begin{description}
  11.127 +\item [ acpi=off,force,strict,ht,noirq,\ldots ] Modify how Xen (and
  11.128 +  domain 0) parses the BIOS ACPI tables.
  11.129 +\item [ acpi\_skip\_timer\_override ] Instruct Xen (and domain~0) to
  11.130 +  ignore timer-interrupt override instructions specified by the BIOS
  11.131 +  ACPI tables.
  11.132 +\item [ noapic ] Instruct Xen (and domain~0) to ignore any IOAPICs
  11.133 +  that are present in the system, and instead continue to use the
  11.134 +  legacy PIC.
  11.135 +\end{description} 
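          +
          +Putting a few of these together, a GRUB entry for Xen typically looks
          +something like the following sketch (the kernel paths, memory size
          +and root device are placeholders for your own setup):
          +\begin{verbatim}
          +title Xen / XenLinux
          +    kernel /boot/xen.gz dom0_mem=262144 console=com1,vga com1=115200,8n1
          +    module /boot/vmlinuz-2.6-xen0 root=/dev/sda1 ro console=tty0
          +\end{verbatim}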
  11.136 +
  11.137 +
  11.138 +\section{XenLinux Boot Options}
  11.139 +
  11.140 +In addition to the standard Linux kernel boot options, we support:
  11.141 +\begin{description}
  11.142 +\item[ xencons=xxx ] Specify the device node to which the Xen virtual
  11.143 +  console driver is attached. The following options are supported:
  11.144 +  \begin{center}
  11.145 +    \begin{tabular}{l}
  11.146 +      `xencons=off': disable virtual console \\
  11.147 +      `xencons=tty': attach console to /dev/tty1 (tty0 at boot-time) \\
  11.148 +      `xencons=ttyS': attach console to /dev/ttyS0
  11.149 +    \end{tabular}
  11.150 +\end{center}
  11.151 +The default is ttyS for dom0 and tty for all other domains.
  11.152 +\end{description}
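          +
          +For example, to attach an unprivileged domain's console to
          +\path{/dev/tty1}, the option can be appended to that domain's kernel
          +command line via the {\tt extra} setting in its configuration file
          +(a sketch; see the chapter on domain configuration):
          +\begin{verbatim}
          +extra = "xencons=tty"
          +\end{verbatim}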
  11.153 +
  11.154 +
  11.155 +\section{Debugging}
  11.156 +\label{s:keys}
  11.157 +
   11.158 +Xen has a set of debugging features that can be useful when trying to
   11.159 +figure out what's going on. Hit `h' on the serial line (if you
  11.160 +specified a baud rate on the Xen command line) or ScrollLock-h on the
  11.161 +keyboard to get a list of supported commands.
  11.162 +
  11.163 +If you have a crash you'll likely get a crash dump containing an EIP
  11.164 +(PC) which, along with an \path{objdump -d image}, can be useful in
   11.165 +figuring out what's happened.  Debug a XenLinux image just as you
  11.166 +would any other Linux kernel.
  11.167 +
  11.168 +%% We supply a handy debug terminal program which you can find in
  11.169 +%% \path{/usr/local/src/xen-2.0.bk/tools/misc/miniterm/} This should
  11.170 +%% be built and executed on another machine that is connected via a
  11.171 +%% null modem cable. Documentation is included.  Alternatively, if the
  11.172 +%% Xen machine is connected to a serial-port server then we supply a
  11.173 +%% dumb TCP terminal client, {\tt xencons}.
    12.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    12.2 +++ b/docs/src/user/control_software.tex	Tue Sep 20 09:17:33 2005 +0000
    12.3 @@ -0,0 +1,115 @@
    12.4 +\chapter{Control Software} 
    12.5 +
    12.6 +The Xen control software includes the \xend\ node control daemon
    12.7 +(which must be running), the xm command line tools, and the prototype
    12.8 +xensv web interface.
    12.9 +
   12.10 +\section{\Xend\ (node control daemon)}
   12.11 +\label{s:xend}
   12.12 +
   12.13 +The Xen Daemon (\Xend) performs system management functions related to
   12.14 +virtual machines.  It forms a central point of control for a machine
   12.15 +and can be controlled using an HTTP-based protocol.  \Xend\ must be
   12.16 +running in order to start and manage virtual machines.
   12.17 +
   12.18 +\Xend\ must be run as root because it needs access to privileged
   12.19 +system management functions.  A small set of commands may be issued on
   12.20 +the \xend\ command line:
   12.21 +
   12.22 +\begin{tabular}{ll}
   12.23 +  \verb!# xend start! & start \xend, if not already running \\
   12.24 +  \verb!# xend stop!  & stop \xend\ if already running       \\
   12.25 +  \verb!# xend restart! & restart \xend\ if running, otherwise start it \\
   12.26 +  % \verb!# xend trace_start! & start \xend, with very detailed debug logging \\
   12.27 +  \verb!# xend status! & indicates \xend\ status by its return code
   12.28 +\end{tabular}
   12.29 +
   12.30 +A SysV init script called {\tt xend} is provided to start \xend\ at
   12.31 +boot time.  {\tt make install} installs this script in
   12.32 +\path{/etc/init.d}.  To enable it, you have to make symbolic links in
   12.33 +the appropriate runlevel directories or use the {\tt chkconfig} tool,
   12.34 +where available.
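          +
          +For example, on a distribution that provides \path{chkconfig} the
          +service can be enabled with:
          +\begin{verbatim}
          +# chkconfig --add xend
          +# chkconfig xend on
          +\end{verbatim}
          +On systems without \path{chkconfig}, the equivalent symbolic links
          +can be created by hand, e.g.\ (the start priority shown is only
          +illustrative):
          +\begin{verbatim}
          +# ln -s ../init.d/xend /etc/rc3.d/S98xend
          +\end{verbatim}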
   12.35 +
   12.36 +Once \xend\ is running, more sophisticated administration can be done
   12.37 +using the xm tool (see Section~\ref{s:xm}) and the experimental Xensv
   12.38 +web interface (see Section~\ref{s:xensv}).
   12.39 +
   12.40 +As \xend\ runs, events will be logged to \path{/var/log/xend.log} and,
   12.41 +if the migration assistant daemon (\path{xfrd}) has been started,
   12.42 +\path{/var/log/xfrd.log}. These may be of use for troubleshooting
   12.43 +problems.
   12.44 +
   12.45 +\section{Xm (command line interface)}
   12.46 +\label{s:xm}
   12.47 +
   12.48 +The xm tool is the primary tool for managing Xen from the console.
   12.49 +The general format of an xm command line is:
   12.50 +
   12.51 +\begin{verbatim}
   12.52 +# xm command [switches] [arguments] [variables]
   12.53 +\end{verbatim}
   12.54 +
   12.55 +The available \emph{switches} and \emph{arguments} are dependent on
   12.56 +the \emph{command} chosen.  The \emph{variables} may be set using
    12.57 +declarations of the form {\tt variable=value}; command-line
    12.58 +declarations override any of the values in the configuration file
   12.59 +being used, including the standard variables described above and any
   12.60 +custom variables (for instance, the \path{xmdefconfig} file uses a
   12.61 +{\tt vmid} variable).
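          +
          +For instance, one of the example configurations shipped with Xen can
          +be started with its {\tt vmid} variable overridden on the command
          +line:
          +\begin{verbatim}
          +# xm create -f /etc/xen/xmexample2 vmid=1
          +\end{verbatim}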
   12.62 +
   12.63 +The available commands are as follows:
   12.64 +
   12.65 +\begin{description}
   12.66 +\item[set-mem] Request a domain to adjust its memory footprint.
   12.67 +\item[create] Create a new domain.
   12.68 +\item[destroy] Kill a domain immediately.
   12.69 +\item[list] List running domains.
    12.70 +\item[shutdown] Ask a domain to shut down.
    12.71 +\item[dmesg] Fetch the Xen (not Linux!) boot output.
    12.72 +\item[consoles] List the available consoles.
   12.73 +\item[console] Connect to the console for a domain.
   12.74 +\item[help] Get help on xm commands.
   12.75 +\item[save] Suspend a domain to disk.
   12.76 +\item[restore] Restore a domain from disk.
   12.77 +\item[pause] Pause a domain's execution.
   12.78 +\item[unpause] Un-pause a domain.
   12.79 +\item[pincpu] Pin a domain to a CPU.
   12.80 +\item[bvt] Set BVT scheduler parameters for a domain.
   12.81 +\item[bvt\_ctxallow] Set the BVT context switching allowance for the
   12.82 +  system.
   12.83 +\item[atropos] Set the atropos parameters for a domain.
   12.84 +\item[rrobin] Set the round robin time slice for the system.
   12.85 +\item[info] Get information about the Xen host.
   12.86 +\item[call] Call a \xend\ HTTP API function directly.
   12.87 +\end{description}
   12.88 +
   12.89 +For a detailed overview of switches, arguments and variables to each
   12.90 +command try
   12.91 +\begin{quote}
   12.92 +\begin{verbatim}
   12.93 +# xm help command
   12.94 +\end{verbatim}
   12.95 +\end{quote}
   12.96 +
   12.97 +\section{Xensv (web control interface)}
   12.98 +\label{s:xensv}
   12.99 +
  12.100 +Xensv is the experimental web control interface for managing a Xen
  12.101 +machine.  It can be used to perform some (but not yet all) of the
  12.102 +management tasks that can be done using the xm tool.
  12.103 +
  12.104 +It can be started using:
  12.105 +\begin{quote}
  12.106 +  \verb_# xensv start_
  12.107 +\end{quote}
  12.108 +and stopped using:
  12.109 +\begin{quote}
  12.110 +  \verb_# xensv stop_
  12.111 +\end{quote}
  12.112 +
  12.113 +By default, Xensv will serve out the web interface on port 8080.  This
  12.114 +can be changed by editing
  12.115 +\path{/usr/lib/python2.3/site-packages/xen/sv/params.py}.
  12.116 +
  12.117 +Once Xensv is running, the web interface can be used to create and
  12.118 +manage running domains.
    13.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    13.2 +++ b/docs/src/user/debian.tex	Tue Sep 20 09:17:33 2005 +0000
    13.3 @@ -0,0 +1,154 @@
    13.4 +\chapter{Installing Xen / XenLinux on Debian}
    13.5 +
    13.6 +The Debian project provides a tool called \path{debootstrap} which
    13.7 +allows a base Debian system to be installed into a filesystem without
    13.8 +requiring the host system to have any Debian-specific software (such
    13.9 +as \path{apt}).
   13.10 +
    13.11 +Here is how to install Debian 3.1 (Sarge) for an unprivileged
   13.12 +Xen domain:
   13.13 +
   13.14 +\begin{enumerate}
   13.15 +
   13.16 +\item Set up Xen and test that it's working, as described earlier in
   13.17 +  this manual.
   13.18 +
   13.19 +\item Create disk images for rootfs and swap. Alternatively, you might
   13.20 +  create dedicated partitions, LVM logical volumes, etc.\ if that
   13.21 +  suits your setup.
   13.22 +\begin{verbatim}
   13.23 +dd if=/dev/zero of=/path/diskimage bs=1024k count=size_in_mbytes
   13.24 +dd if=/dev/zero of=/path/swapimage bs=1024k count=size_in_mbytes
   13.25 +\end{verbatim}
   13.26 +
   13.27 +  If you're going to use this filesystem / disk image only as a
    13.28 +  `template' for other VM disk images, something like 300 MB should be
    13.29 +  enough (of course, it depends on which packages you plan to install
    13.30 +  in the template).
   13.31 +
   13.32 +\item Create the filesystem and initialise the swap image
   13.33 +\begin{verbatim}
   13.34 +mkfs.ext3 /path/diskimage
   13.35 +mkswap /path/swapimage
   13.36 +\end{verbatim}
   13.37 +
   13.38 +\item Mount the disk image for installation
   13.39 +\begin{verbatim}
   13.40 +mount -o loop /path/diskimage /mnt/disk
   13.41 +\end{verbatim}
   13.42 +
   13.43 +\item Install \path{debootstrap}. Make sure you have debootstrap
   13.44 +  installed on the host.  If you are running Debian Sarge (3.1 /
   13.45 +  testing) or unstable you can install it by running \path{apt-get
   13.46 +    install debootstrap}.  Otherwise, it can be downloaded from the
   13.47 +  Debian project website.
   13.48 +
   13.49 +\item Install Debian base to the disk image:
   13.50 +\begin{verbatim}
   13.51 +debootstrap --arch i386 sarge /mnt/disk  \
   13.52 +            http://ftp.<countrycode>.debian.org/debian
   13.53 +\end{verbatim}
   13.54 +
   13.55 +  You can use any other Debian http/ftp mirror you want.
   13.56 +
   13.57 +\item When debootstrap completes successfully, modify settings:
   13.58 +\begin{verbatim}
   13.59 +chroot /mnt/disk /bin/bash
   13.60 +\end{verbatim}
   13.61 +
    13.62 +Edit the following files using vi or nano and make the needed changes:
   13.63 +\begin{verbatim}
   13.64 +/etc/hostname
   13.65 +/etc/hosts
   13.66 +/etc/resolv.conf
   13.67 +/etc/network/interfaces
   13.68 +/etc/networks
   13.69 +\end{verbatim}
   13.70 +
    13.71 +Set up access to services by editing:
   13.72 +\begin{verbatim}
   13.73 +/etc/hosts.deny
   13.74 +/etc/hosts.allow
   13.75 +/etc/inetd.conf
   13.76 +\end{verbatim}
   13.77 +
    13.78 +Add a Debian mirror to:
   13.79 +\begin{verbatim}
   13.80 +/etc/apt/sources.list
   13.81 +\end{verbatim}
   13.82 +
    13.83 +Create an fstab like this:
   13.84 +\begin{verbatim}
   13.85 +/dev/sda1       /       ext3    errors=remount-ro       0       1
   13.86 +/dev/sda2       none    swap    sw                      0       0
   13.87 +proc            /proc   proc    defaults                0       0
   13.88 +\end{verbatim}
   13.89 +
    13.90 +Log out of the chroot.
   13.91 +
   13.92 +\item Unmount the disk image
   13.93 +\begin{verbatim}
   13.94 +umount /mnt/disk
   13.95 +\end{verbatim}
   13.96 +
    13.97 +\item Create a Xen configuration file for the new domain. You can
   13.98 +  use the example-configurations coming with Xen as a template.
   13.99 +
  13.100 +  Make sure you have the following set up:
  13.101 +\begin{verbatim}
  13.102 +disk = [ 'file:/path/diskimage,sda1,w', 'file:/path/swapimage,sda2,w' ]
  13.103 +root = "/dev/sda1 ro"
  13.104 +\end{verbatim}
  13.105 +
  13.106 +\item Start the new domain
  13.107 +\begin{verbatim}
  13.108 +xm create -f domain_config_file
  13.109 +\end{verbatim}
  13.110 +
  13.111 +Check that the new domain is running:
  13.112 +\begin{verbatim}
  13.113 +xm list
  13.114 +\end{verbatim}
  13.115 +
  13.116 +\item Attach to the console of the new domain.  You should see
  13.117 +  something like this when starting the new domain:
  13.118 +
  13.119 +\begin{verbatim}
  13.120 +Started domain testdomain2, console on port 9626
  13.121 +\end{verbatim}
  13.122 +        
   13.123 +  Here the console ID is 26. You can also list the consoles with
   13.124 +  \path{xm consoles} (the ID is the last two digits of the port
   13.125 +  number).
  13.126 +
  13.127 +  Attach to the console:
  13.128 +
  13.129 +\begin{verbatim}
  13.130 +xm console 26
  13.131 +\end{verbatim}
  13.132 +
   13.133 +  or by telnetting to port 9626 on localhost (the xm console
  13.134 +  program works better).
  13.135 +
  13.136 +\item Log in and run base-config
  13.137 +
   13.138 +  By default there is no root password.
  13.139 +
  13.140 +  Check that everything looks OK, and the system started without
  13.141 +  errors.  Check that the swap is active, and the network settings are
  13.142 +  correct.
  13.143 +
  13.144 +  Run \path{/usr/sbin/base-config} to set up the Debian settings.
  13.145 +
  13.146 +  Set up the password for root using passwd.
  13.147 +
  13.148 +\item Done. You can exit the console by pressing {\path{Ctrl + ]}}
  13.149 +
  13.150 +\end{enumerate}
  13.151 +
  13.152 +
  13.153 +If you need to create new domains, you can just copy the contents of
   13.154 +the `template' image to the new disk images, either by mounting both
   13.155 +the template and the new image and using \path{cp -a} or \path{tar},
   13.156 +or by simply copying the image file.  Once this is done, modify the
  13.157 +image-specific settings (hostname, network settings, etc).
    14.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    14.2 +++ b/docs/src/user/domain_configuration.tex	Tue Sep 20 09:17:33 2005 +0000
    14.3 @@ -0,0 +1,281 @@
    14.4 +\chapter{Domain Configuration}
    14.5 +\label{cha:config}
    14.6 +
     14.7 +This chapter describes the syntax of the domain configuration files
     14.8 +and how to further specify networking, driver domain and general
     14.9 +scheduling behavior.
   14.10 +
   14.11 +
   14.12 +\section{Configuration Files}
   14.13 +\label{s:cfiles}
   14.14 +
   14.15 +Xen configuration files contain the following standard variables.
   14.16 +Unless otherwise stated, configuration items should be enclosed in
   14.17 +quotes: see \path{/etc/xen/xmexample1} and \path{/etc/xen/xmexample2}
   14.18 +for concrete examples of the syntax.
   14.19 +
   14.20 +\begin{description}
   14.21 +\item[kernel] Path to the kernel image.
   14.22 +\item[ramdisk] Path to a ramdisk image (optional).
   14.23 +  % \item[builder] The name of the domain build function (e.g.
   14.24 +  %   {\tt'linux'} or {\tt'netbsd'}.
   14.25 +\item[memory] Memory size in megabytes.
   14.26 +\item[cpu] CPU to run this domain on, or {\tt -1} for auto-allocation.
   14.27 +\item[console] Port to export the domain console on (default 9600 +
   14.28 +  domain ID).
   14.29 +\item[nics] Number of virtual network interfaces.
   14.30 +\item[vif] List of MAC addresses (random addresses are assigned if not
   14.31 +  given) and bridges to use for the domain's network interfaces, e.g.\ 
   14.32 +\begin{verbatim}
   14.33 +vif = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0',
   14.34 +        'bridge=xen-br1' ]
   14.35 +\end{verbatim}
   14.36 +  to assign a MAC address and bridge to the first interface and assign
   14.37 +  a different bridge to the second interface, leaving \xend\ to choose
   14.38 +  the MAC address.
   14.39 +\item[disk] List of block devices to export to the domain, e.g.\ \\
   14.40 +  \verb_disk = [ 'phy:hda1,sda1,r' ]_ \\
   14.41 +  exports physical device \path{/dev/hda1} to the domain as
   14.42 +  \path{/dev/sda1} with read-only access. Exporting a disk read-write
   14.43 +  which is currently mounted is dangerous -- if you are \emph{certain}
   14.44 +  you wish to do this, you can specify \path{w!} as the mode.
   14.45 +\item[dhcp] Set to {\tt `dhcp'} if you want to use DHCP to configure
   14.46 +  networking.
   14.47 +\item[netmask] Manually configured IP netmask.
   14.48 +\item[gateway] Manually configured IP gateway.
   14.49 +\item[hostname] Set the hostname for the virtual machine.
   14.50 +\item[root] Specify the root device parameter on the kernel command
   14.51 +  line.
   14.52 +\item[nfs\_server] IP address for the NFS server (if any).
   14.53 +\item[nfs\_root] Path of the root filesystem on the NFS server (if
   14.54 +  any).
   14.55 +\item[extra] Extra string to append to the kernel command line (if
   14.56 +  any)
   14.57 +\item[restart] Three possible options:
   14.58 +  \begin{description}
   14.59 +  \item[always] Always restart the domain, no matter what its exit
   14.60 +    code is.
   14.61 +  \item[never] Never restart the domain.
   14.62 +  \item[onreboot] Restart the domain iff it requests reboot.
   14.63 +  \end{description}
   14.64 +\end{description}
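          +
          +Putting a few of these variables together, a minimal configuration
          +file might look something like the following sketch (the kernel
          +path, MAC address, devices and sizes are placeholders; see the
          +\path{xmexample} files for complete, working examples):
          +\begin{verbatim}
          +kernel   = "/boot/vmlinuz-2.6-xenU"
          +memory   = 64
          +nics     = 1
          +vif      = [ 'mac=aa:00:00:00:00:11, bridge=xen-br0' ]
          +disk     = [ 'phy:hda3,sda1,w' ]
          +root     = "/dev/sda1 ro"
          +hostname = "testdomain"
          +\end{verbatim}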
   14.65 +
   14.66 +For additional flexibility, it is also possible to include Python
   14.67 +scripting commands in configuration files.  An example of this is the
   14.68 +\path{xmexample2} file, which uses Python code to handle the
   14.69 +\path{vmid} variable.
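          +
          +The idea, roughly (this is a sketch rather than the literal contents
          +of \path{xmexample2}, and the LVM volume naming is an assumption),
          +is that a value passed on the xm command line can be used to derive
          +per-domain settings:
          +\begin{verbatim}
          +vmid = int(vmid)                        # supplied on the xm command line
          +disk = [ 'phy:vg/lv%d,sda1,w' % vmid ]  # one LVM volume per domain
          +hostname = "vm%d" % vmid
          +\end{verbatim}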
   14.70 +
   14.71 +
   14.72 +%\part{Advanced Topics}
   14.73 +
   14.74 +
   14.75 +\section{Network Configuration}
   14.76 +
   14.77 +For many users, the default installation should work ``out of the
   14.78 +box''.  More complicated network setups, for instance with multiple
    14.79 +Ethernet interfaces and/or existing bridging setups, will require some
   14.80 +special configuration.
   14.81 +
   14.82 +The purpose of this section is to describe the mechanisms provided by
   14.83 +\xend\ to allow a flexible configuration for Xen's virtual networking.
   14.84 +
   14.85 +\subsection{Xen virtual network topology}
   14.86 +
   14.87 +Each domain network interface is connected to a virtual network
   14.88 +interface in dom0 by a point to point link (effectively a ``virtual
   14.89 +crossover cable'').  These devices are named {\tt
   14.90 +  vif$<$domid$>$.$<$vifid$>$} (e.g.\ {\tt vif1.0} for the first
   14.91 +interface in domain~1, {\tt vif3.1} for the second interface in
   14.92 +domain~3).
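          +
          +For example, with a single guest running, the bridge configuration in
          +domain~0 might look something like this (bridge and interface names
          +and IDs will differ on your system):
          +\begin{verbatim}
          +# brctl show
          +bridge name     bridge id               STP enabled     interfaces
          +xen-br0         8000.00a0c9112233       no              eth0
          +                                                        vif1.0
          +\end{verbatim}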
   14.93 +
   14.94 +Traffic on these virtual interfaces is handled in domain~0 using
   14.95 +standard Linux mechanisms for bridging, routing, rate limiting, etc.
   14.96 +Xend calls on two shell scripts to perform initial configuration of
   14.97 +the network and configuration of new virtual interfaces.  By default,
   14.98 +these scripts configure a single bridge for all the virtual
   14.99 +interfaces.  Arbitrary routing / bridging configurations can be
  14.100 +configured by customizing the scripts, as described in the following
  14.101 +section.
  14.102 +
  14.103 +\subsection{Xen networking scripts}
  14.104 +
  14.105 +Xen's virtual networking is configured by two shell scripts (by
  14.106 +default \path{network} and \path{vif-bridge}).  These are called
  14.107 +automatically by \xend\ when certain events occur, with arguments to
  14.108 +the scripts providing further contextual information.  These scripts
  14.109 +are found by default in \path{/etc/xen/scripts}.  The names and
  14.110 +locations of the scripts can be configured in
  14.111 +\path{/etc/xen/xend-config.sxp}.
  14.112 +
  14.113 +\begin{description}
  14.114 +\item[network:] This script is called whenever \xend\ is started or
  14.115 +  stopped to respectively initialize or tear down the Xen virtual
  14.116 +  network. In the default configuration initialization creates the
  14.117 +  bridge `xen-br0' and moves eth0 onto that bridge, modifying the
  14.118 +  routing accordingly. When \xend\ exits, it deletes the Xen bridge
  14.119 +  and removes eth0, restoring the normal IP and routing configuration.
  14.120 +
  14.121 +  %% In configurations where the bridge already exists, this script
  14.122 +  %% could be replaced with a link to \path{/bin/true} (for instance).
  14.123 +
  14.124 +\item[vif-bridge:] This script is called for every domain virtual
  14.125 +  interface and can configure firewalling rules and add the vif to the
  14.126 +  appropriate bridge. By default, this adds and removes VIFs on the
  14.127 +  default Xen bridge.
  14.128 +\end{description}
  14.129 +
   14.130 +For more complex network setups (e.g.\ where routing is required or
   14.131 +integration with existing bridges is needed) these scripts may be replaced with
  14.132 +customized variants for your site's preferred configuration.
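          +
          +The replacement scripts are then named in
          +\path{/etc/xen/xend-config.sxp}.  The exact option names below are an
          +assumption and may differ between \xend\ versions, so check the
          +comments in that file; the idea is simply to point \xend\ at your own
          +scripts:
          +\begin{verbatim}
          +(network-script my-network-setup)
          +(vif-script    my-vif-setup)
          +\end{verbatim}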
  14.133 +
  14.134 +%% There are two possible types of privileges: IO privileges and
  14.135 +%% administration privileges.
  14.136 +
  14.137 +
  14.138 +\section{Driver Domain Configuration}
  14.139 +
  14.140 +I/O privileges can be assigned to allow a domain to directly access
  14.141 +PCI devices itself.  This is used to support driver domains.
  14.142 +
  14.143 +Setting back-end privileges is currently only supported in SXP format
  14.144 +config files.  To allow a domain to function as a back-end for others,
  14.145 +somewhere within the {\tt vm} element of its configuration file must
  14.146 +be a {\tt back-end} element of the form {\tt (back-end ({\em type}))}
  14.147 +where {\tt \em type} may be either {\tt netif} or {\tt blkif},
  14.148 +according to the type of virtual device this domain will service.
  14.149 +%% After this domain has been built, \xend will connect all new and
  14.150 +%% existing {\em virtual} devices (of the appropriate type) to that
  14.151 +%% back-end.
  14.152 +
  14.153 +Note that a block back-end cannot currently import virtual block
  14.154 +devices from other domains, and a network back-end cannot import
  14.155 +virtual network devices from other domains.  Thus (particularly in the
  14.156 +case of block back-ends, which cannot import a virtual block device as
  14.157 +their root filesystem), you may need to boot a back-end domain from a
  14.158 +ramdisk or a network device.
  14.159 +
  14.160 +Access to PCI devices may be configured on a per-device basis.  Xen
  14.161 +will assign the minimal set of hardware privileges to a domain that
  14.162 +are required to control its devices.  This can be configured in either
  14.163 +format of configuration file:
  14.164 +
  14.165 +\begin{itemize}
  14.166 +\item SXP Format: Include device elements of the form: \\
  14.167 +  \centerline{  {\tt (device (pci (bus {\em x}) (dev {\em y}) (func {\em z})))}} \\
  14.168 +  inside the top-level {\tt vm} element.  Each one specifies the
  14.169 +  address of a device this domain is allowed to access --- the numbers
  14.170 +  \emph{x},\emph{y} and \emph{z} may be in either decimal or
  14.171 +  hexadecimal format.
  14.172 +\item Flat Format: Include a list of PCI device addresses of the
  14.173 +  format: \\
  14.174 +  \centerline{{\tt pci = ['x,y,z', \ldots]}} \\
  14.175 +  where each element in the list is a string specifying the components
  14.176 +  of the PCI device address, separated by commas.  The components
  14.177 +  ({\tt \em x}, {\tt \em y} and {\tt \em z}) of the list may be
  14.178 +  formatted as either decimal or hexadecimal.
  14.179 +\end{itemize}
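          +
          +For example, to grant a domain access to the device at PCI bus 0,
          +slot 8, function 1 (the address itself must of course match your
          +hardware, e.g.\ as reported by \path{lspci}), the two styles would
          +be:
          +\begin{verbatim}
          +(device (pci (bus 0x0) (dev 0x8) (func 0x1)))
          +\end{verbatim}
          +or
          +\begin{verbatim}
          +pci = [ '0x0,0x8,0x1' ]
          +\end{verbatim}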
  14.180 +
  14.181 +%% \section{Administration Domains}
  14.182 +
  14.183 +%% Administration privileges allow a domain to use the `dom0
  14.184 +%% operations' (so called because they are usually available only to
  14.185 +%% domain 0).  A privileged domain can build other domains, set
  14.186 +%% scheduling parameters, etc.
  14.187 +
  14.188 +% Support for other administrative domains is not yet available...
  14.189 +% perhaps we should plumb it in some time
  14.190 +
  14.191 +
  14.192 +\section{Scheduler Configuration}
  14.193 +\label{s:sched}
  14.194 +
  14.195 +Xen offers a boot time choice between multiple schedulers.  To select
  14.196 +a scheduler, pass the boot parameter \emph{sched=sched\_name} to Xen,
  14.197 +substituting the appropriate scheduler name.  Details of the
  14.198 +schedulers and their parameters are included below; future versions of
   14.199 +the tools will provide a higher-level interface to these settings.
  14.200 +
  14.201 +It is expected that system administrators configure their system to
  14.202 +use the scheduler most appropriate to their needs.  Currently, the BVT
  14.203 +scheduler is the recommended choice.
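          +
          +For example, to boot with the Atropos scheduler you might edit the
          +Xen line of your GRUB entry as follows (the path and memory option
          +are only illustrative):
          +\begin{verbatim}
          +kernel /boot/xen.gz sched=atropos dom0_mem=262144
          +\end{verbatim}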
  14.204 +
  14.205 +\subsection{Borrowed Virtual Time}
  14.206 +
  14.207 +{\tt sched=bvt} (the default) \\
  14.208 +
  14.209 +BVT provides proportional fair shares of the CPU time.  It has been
  14.210 +observed to penalize domains that block frequently (e.g.\ I/O
  14.211 +intensive domains), but this can be compensated for by using warping.
  14.212 +
  14.213 +\subsubsection{Global Parameters}
  14.214 +
  14.215 +\begin{description}
  14.216 +\item[ctx\_allow] The context switch allowance is similar to the
  14.217 +  ``quantum'' in traditional schedulers.  It is the minimum time that
  14.218 +  a scheduled domain will be allowed to run before being preempted.
  14.219 +\end{description}
  14.220 +
  14.221 +\subsubsection{Per-domain parameters}
  14.222 +
  14.223 +\begin{description}
  14.224 +\item[mcuadv] The MCU (Minimum Charging Unit) advance determines the
  14.225 +  proportional share of the CPU that a domain receives.  It is set
   14.226 +  in inverse proportion to a domain's sharing weight.
  14.227 +\item[warp] The amount of ``virtual time'' the domain is allowed to
  14.228 +  warp backwards.
  14.229 +\item[warpl] The warp limit is the maximum time a domain can run
  14.230 +  warped for.
  14.231 +\item[warpu] The unwarp requirement is the minimum time a domain must
  14.232 +  run unwarped for before it can warp again.
  14.233 +\end{description}
  14.234 +
  14.235 +\subsection{Atropos}
  14.236 +
  14.237 +{\tt sched=atropos} \\
  14.238 +
  14.239 +Atropos is a soft real time scheduler.  It provides guarantees about
  14.240 +absolute shares of the CPU, with a facility for sharing slack CPU time
  14.241 +on a best-effort basis. It can provide timeliness guarantees for
  14.242 +latency-sensitive domains.
  14.243 +
  14.244 +Every domain has an associated period and slice.  The domain should
  14.245 +receive `slice' nanoseconds every `period' nanoseconds.  This allows
  14.246 +the administrator to configure both the absolute share of the CPU a
  14.247 +domain receives and the frequency with which it is scheduled.
  14.248 +
  14.249 +%% When domains unblock, their period is reduced to the value of the
  14.250 +%% latency hint (the slice is scaled accordingly so that they still
  14.251 +%% get the same proportion of the CPU).  For each subsequent period,
  14.252 +%% the slice and period times are doubled until they reach their
  14.253 +%% original values.
  14.254 +
  14.255 +Note: don't over-commit the CPU when using Atropos (i.e.\ don't reserve
  14.256 +more CPU than is available --- the utilization should be kept to
  14.257 +slightly less than 100\% in order to ensure predictable behavior).
  14.258 +
  14.259 +\subsubsection{Per-domain parameters}
  14.260 +
  14.261 +\begin{description}
  14.262 +\item[period] The regular time interval during which a domain is
  14.263 +  guaranteed to receive its allocation of CPU time.
  14.264 +\item[slice] The length of time per period that a domain is guaranteed
  14.265 +  to run for (in the absence of voluntary yielding of the CPU).
  14.266 +\item[latency] The latency hint is used to control how soon after
  14.267 +  waking up a domain it should be scheduled.
  14.268 +\item[xtratime] This is a boolean flag that specifies whether a domain
  14.269 +  should be allowed a share of the system slack time.
  14.270 +\end{description}
  14.271 +
  14.272 +\subsection{Round Robin}
  14.273 +
  14.274 +{\tt sched=rrobin} \\
  14.275 +
  14.276 +The round robin scheduler is included as a simple demonstration of
  14.277 +Xen's internal scheduler API.  It is not intended for production use.
  14.278 +
  14.279 +\subsubsection{Global Parameters}
  14.280 +
  14.281 +\begin{description}
  14.282 +\item[rr\_slice] The maximum time each domain runs before the next
  14.283 +  scheduling decision is made.
  14.284 +\end{description}
    15.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    15.2 +++ b/docs/src/user/domain_filesystem.tex	Tue Sep 20 09:17:33 2005 +0000
    15.3 @@ -0,0 +1,243 @@
    15.4 +\chapter{Domain Filesystem Storage}
    15.5 +
    15.6 +It is possible to directly export any Linux block device in dom0 to
    15.7 +another domain, or to export filesystems / devices to virtual machines
    15.8 +using standard network protocols (e.g.\ NBD, iSCSI, NFS, etc.).  This
    15.9 +chapter covers some of the possibilities.
   15.10 +
   15.11 +
   15.12 +\section{Exporting Physical Devices as VBDs}
   15.13 +\label{s:exporting-physical-devices-as-vbds}
   15.14 +
   15.15 +One of the simplest configurations is to directly export individual
   15.16 +partitions from domain~0 to other domains. To achieve this use the
   15.17 +\path{phy:} specifier in your domain configuration file. For example a
   15.18 +line like
   15.19 +\begin{quote}
   15.20 +  \verb_disk = ['phy:hda3,sda1,w']_
   15.21 +\end{quote}
   15.22 +specifies that the partition \path{/dev/hda3} in domain~0 should be
   15.23 +exported read-write to the new domain as \path{/dev/sda1}; one could
   15.24 +equally well export it as \path{/dev/hda} or \path{/dev/sdb5} should
   15.25 +one wish.
   15.26 +
   15.27 +In addition to local disks and partitions, it is possible to export
   15.28 +any device that Linux considers to be ``a disk'' in the same manner.
   15.29 +For example, if you have iSCSI disks or GNBD volumes imported into
   15.30 +domain~0 you can export these to other domains using the \path{phy:}
   15.31 +disk syntax. E.g.:
   15.32 +\begin{quote}
   15.33 +  \verb_disk = ['phy:vg/lvm1,sda2,w']_
   15.34 +\end{quote}
   15.35 +
   15.36 +\begin{center}
   15.37 +  \framebox{\bf Warning: Block device sharing}
   15.38 +\end{center}
   15.39 +\begin{quote}
   15.40 +  Block devices should typically only be shared between domains in a
   15.41 +  read-only fashion otherwise the Linux kernel's file systems will get
   15.42 +  very confused as the file system structure may change underneath
   15.43 +  them (having the same ext3 partition mounted \path{rw} twice is a
    15.44 +  sure-fire way to cause irreparable damage)!  \Xend\ will attempt to
   15.45 +  prevent you from doing this by checking that the device is not
   15.46 +  mounted read-write in domain~0, and hasn't already been exported
   15.47 +  read-write to another domain.  If you want read-write sharing,
   15.48 +  export the directory to other domains via NFS from domain~0 (or use
   15.49 +  a cluster file system such as GFS or ocfs2).
   15.50 +\end{quote}
   15.51 +
   15.52 +
   15.53 +\section{Using File-backed VBDs}
   15.54 +
   15.55 +It is also possible to use a file in Domain~0 as the primary storage
   15.56 +for a virtual machine.  As well as being convenient, this also has the
   15.57 +advantage that the virtual block device will be \emph{sparse} ---
   15.58 +space will only really be allocated as parts of the file are used.  So
   15.59 +if a virtual machine uses only half of its disk space then the file
   15.60 +really takes up half of the size allocated.
   15.61 +
   15.62 +For example, to create a 2GB sparse file-backed virtual block device
   15.63 +(actually only consumes 1KB of disk):
   15.64 +\begin{quote}
   15.65 +  \verb_# dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=1_
   15.66 +\end{quote}
   15.67 +
   15.68 +Make a file system in the disk file:
   15.69 +\begin{quote}
   15.70 +  \verb_# mkfs -t ext3 vm1disk_
   15.71 +\end{quote}
   15.72 +
   15.73 +(when the tool asks for confirmation, answer `y')
   15.74 +
   15.75 +Populate the file system e.g.\ by copying from the current root:
   15.76 +\begin{quote}
   15.77 +\begin{verbatim}
   15.78 +# mount -o loop vm1disk /mnt
   15.79 +# cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt
   15.80 +# mkdir /mnt/{proc,sys,home,tmp}
   15.81 +\end{verbatim}
   15.82 +\end{quote}
   15.83 +
   15.84 +Tailor the file system by editing \path{/etc/fstab},
   15.85 +\path{/etc/hostname}, etc.\ Don't forget to edit the files in the
   15.86 +mounted file system, instead of your domain~0 filesystem, e.g.\ you
   15.87 +would edit \path{/mnt/etc/fstab} instead of \path{/etc/fstab}.  For
    15.88 +this example, use \path{/dev/sda1} as the root device in fstab.
   15.89 +
   15.90 +Now unmount (this is important!):
   15.91 +\begin{quote}
   15.92 +  \verb_# umount /mnt_
   15.93 +\end{quote}
   15.94 +
   15.95 +In the configuration file set:
   15.96 +\begin{quote}
   15.97 +  \verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_
   15.98 +\end{quote}
   15.99 +
  15.100 +As the virtual machine writes to its `disk', the sparse file will be
  15.101 +filled in and consume more space up to the original 2GB.
  15.102 +
  15.103 +{\bf Note that file-backed VBDs may not be appropriate for backing
  15.104 +  I/O-intensive domains.}  File-backed VBDs are known to experience
  15.105 +substantial slowdowns under heavy I/O workloads, due to the I/O
  15.106 +handling by the loopback block device used to support file-backed VBDs
  15.107 +in dom0.  Better I/O performance can be achieved by using either
  15.108 +LVM-backed VBDs (Section~\ref{s:using-lvm-backed-vbds}) or physical
  15.109 +devices as VBDs (Section~\ref{s:exporting-physical-devices-as-vbds}).
  15.110 +
  15.111 +Linux supports a maximum of eight file-backed VBDs across all domains
  15.112 +by default.  This limit can be statically increased by using the
  15.113 +\emph{max\_loop} module parameter if CONFIG\_BLK\_DEV\_LOOP is
  15.114 +compiled as a module in the dom0 kernel, or by using the
  15.115 +\emph{max\_loop=n} boot option if CONFIG\_BLK\_DEV\_LOOP is compiled
  15.116 +directly into the dom0 kernel.
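          +
          +For example, if the loop driver is built as a module, the limit
          +could be raised to 64 file-backed VBDs (the number here is only
          +illustrative) with:
          +\begin{quote}
          +\begin{verbatim}
          +# modprobe loop max_loop=64
          +\end{verbatim}
          +\end{quote}
          +If the driver is compiled into the dom0 kernel, append
          +\emph{max\_loop=64} to its boot command line instead.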
  15.117 +
  15.118 +
  15.119 +\section{Using LVM-backed VBDs}
  15.120 +\label{s:using-lvm-backed-vbds}
  15.121 +
  15.122 +A particularly appealing solution is to use LVM volumes as backing for
  15.123 +domain file-systems since this allows dynamic growing/shrinking of
  15.124 +volumes as well as snapshot and other features.
  15.125 +
  15.126 +To initialize a partition to support LVM volumes:
  15.127 +\begin{quote}
  15.128 +\begin{verbatim}
  15.129 +# pvcreate /dev/sda10           
  15.130 +\end{verbatim} 
  15.131 +\end{quote}
  15.132 +
  15.133 +Create a volume group named `vg' on the physical partition:
  15.134 +\begin{quote}
  15.135 +\begin{verbatim}
  15.136 +# vgcreate vg /dev/sda10
  15.137 +\end{verbatim} 
  15.138 +\end{quote}
  15.139 +
  15.140 +Create a logical volume of size 4GB named `myvmdisk1':
  15.141 +\begin{quote}
  15.142 +\begin{verbatim}
  15.143 +# lvcreate -L4096M -n myvmdisk1 vg
  15.144 +\end{verbatim}
  15.145 +\end{quote}
  15.146 +
   15.147 +You should now see that you have a \path{/dev/vg/myvmdisk1} device.  Make a
  15.148 +filesystem, mount it and populate it, e.g.:
  15.149 +\begin{quote}
  15.150 +\begin{verbatim}
  15.151 +# mkfs -t ext3 /dev/vg/myvmdisk1
  15.152 +# mount /dev/vg/myvmdisk1 /mnt
  15.153 +# cp -ax / /mnt
  15.154 +# umount /mnt
  15.155 +\end{verbatim}
  15.156 +\end{quote}
  15.157 +
  15.158 +Now configure your VM with the following disk configuration:
  15.159 +\begin{quote}
  15.160 +\begin{verbatim}
  15.161 + disk = [ 'phy:vg/myvmdisk1,sda1,w' ]
  15.162 +\end{verbatim}
  15.163 +\end{quote}
  15.164 +
  15.165 +LVM enables you to grow the size of logical volumes, but you'll need
  15.166 +to resize the corresponding file system to make use of the new space.
  15.167 +Some file systems (e.g.\ ext3) now support online resize.  See the LVM
  15.168 +manuals for more details.
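
As an illustrative sketch (assuming the ext3 volume created above;
depending on your kernel and e2fsprogs versions you may need to
unmount the volume first, or use \path{ext2online} rather than
\path{resize2fs}):
\begin{quote}
\begin{verbatim}
# lvextend -L+1G /dev/vg/myvmdisk1
# resize2fs /dev/vg/myvmdisk1    # grow the fs to fill the volume
\end{verbatim}
\end{quote}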
  15.169 +
  15.170 +You can also use LVM for creating copy-on-write (CoW) clones of LVM
  15.171 +volumes (known as writable persistent snapshots in LVM terminology).
  15.172 +This facility is new in Linux 2.6.8, so isn't as stable as one might
  15.173 +hope.  In particular, using lots of CoW LVM disks consumes a lot of
  15.174 +dom0 memory, and error conditions such as running out of disk space
  15.175 +are not handled well. Hopefully this will improve in future.
  15.176 +
   15.177 +To create two copy-on-write clones of the above file system you would
  15.178 +use the following commands:
  15.179 +
  15.180 +\begin{quote}
  15.181 +\begin{verbatim}
  15.182 +# lvcreate -s -L1024M -n myclonedisk1 /dev/vg/myvmdisk1
  15.183 +# lvcreate -s -L1024M -n myclonedisk2 /dev/vg/myvmdisk1
  15.184 +\end{verbatim}
  15.185 +\end{quote}
  15.186 +
  15.187 +Each of these can grow to have 1GB of differences from the master
  15.188 +volume. You can grow the amount of space for storing the differences
  15.189 +using the lvextend command, e.g.:
  15.190 +\begin{quote}
  15.191 +\begin{verbatim}
   15.192 +# lvextend -L+100M /dev/vg/myclonedisk1
  15.193 +\end{verbatim}
  15.194 +\end{quote}
  15.195 +
   15.196 +Don't let the `differences volume' ever fill up, otherwise LVM gets
   15.197 +rather confused. It may be possible to automate the growing process by
   15.198 +using \path{dmsetup wait} to spot the volume getting full and then
   15.199 +issuing an \path{lvextend}.
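
One possible (untested) way to automate this from domain~0 is a simple
watchdog loop; the sketch below assumes that \path{lvdisplay} reports
an `Allocated to snapshot' percentage for the snapshot volume:
\begin{quote}
\begin{verbatim}
#!/bin/sh
# Hypothetical watchdog: extend the CoW snapshot before it fills up.
LV=/dev/vg/myclonedisk1
while sleep 60; do
    used=$(lvdisplay "$LV" | awk '/Allocated to snapshot/ {print int($4)}')
    [ "${used:-0}" -ge 90 ] && lvextend -L+100M "$LV"
done
\end{verbatim}
\end{quote}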
  15.200 +
  15.201 +In principle, it is possible to continue writing to the volume that
  15.202 +has been cloned (the changes will not be visible to the clones), but
  15.203 +we wouldn't recommend this: have the cloned volume as a `pristine'
  15.204 +file system install that isn't mounted directly by any of the virtual
  15.205 +machines.
  15.206 +
  15.207 +
  15.208 +\section{Using NFS Root}
  15.209 +
  15.210 +First, populate a root filesystem in a directory on the server
  15.211 +machine. This can be on a distinct physical machine, or simply run
  15.212 +within a virtual machine on the same node.
  15.213 +
  15.214 +Now configure the NFS server to export this filesystem over the
  15.215 +network by adding a line to \path{/etc/exports}, for instance:
  15.216 +
  15.217 +\begin{quote}
  15.218 +  \begin{small}
  15.219 +\begin{verbatim}
   15.220 +/export/vm1root      1.2.3.4/24(rw,sync,no_root_squash)
  15.221 +\end{verbatim}
  15.222 +  \end{small}
  15.223 +\end{quote}
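
After editing \path{/etc/exports}, ask the NFS server to re-read it.
On most systems something like the following works (or restart the NFS
service in your distribution's usual way):
\begin{quote}
  \verb_# exportfs -ra_
\end{quote}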
  15.224 +
  15.225 +Finally, configure the domain to use NFS root.  In addition to the
  15.226 +normal variables, you should make sure to set the following values in
  15.227 +the domain's configuration file:
  15.228 +
  15.229 +\begin{quote}
  15.230 +  \begin{small}
  15.231 +\begin{verbatim}
  15.232 +root       = '/dev/nfs'
  15.233 +nfs_server = '2.3.4.5'       # substitute IP address of server
  15.234 +nfs_root   = '/path/to/root' # path to root FS on the server
  15.235 +\end{verbatim}
  15.236 +  \end{small}
  15.237 +\end{quote}
  15.238 +
  15.239 +The domain will need network access at boot time, so either statically
  15.240 +configure an IP address using the config variables \path{ip},
  15.241 +\path{netmask}, \path{gateway}, \path{hostname}; or enable DHCP
  15.242 +(\path{dhcp='dhcp'}).
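
For example, a static configuration might look like the following (the
addresses shown are placeholders for your own network settings):
\begin{quote}
\begin{verbatim}
ip       = '2.3.4.10'
netmask  = '255.255.255.0'
gateway  = '2.3.4.1'
hostname = 'vm1'
\end{verbatim}
\end{quote}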
  15.243 +
  15.244 +Note that the Linux NFS root implementation is known to have stability
  15.245 +problems under high load (this is not a Xen-specific problem), so this
  15.246 +configuration may not be appropriate for critical servers.
    16.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    16.2 +++ b/docs/src/user/domain_mgmt.tex	Tue Sep 20 09:17:33 2005 +0000
    16.3 @@ -0,0 +1,203 @@
    16.4 +\chapter{Domain Management Tools}
    16.5 +
    16.6 +The previous chapter described a simple example of how to configure
    16.7 +and start a domain.  This chapter summarises the tools available to
    16.8 +manage running domains.
    16.9 +
   16.10 +
   16.11 +\section{Command-line Management}
   16.12 +
   16.13 +Command line management tasks are also performed using the \path{xm}
   16.14 +tool.  For online help for the commands available, type:
   16.15 +\begin{quote}
   16.16 +  \verb_# xm help_
   16.17 +\end{quote}
   16.18 +
   16.19 +You can also type \path{xm help $<$command$>$} for more information on
   16.20 +a given command.
   16.21 +
   16.22 +\subsection{Basic Management Commands}
   16.23 +
   16.24 +The most important \path{xm} commands are:
   16.25 +\begin{quote}
   16.26 +  \verb_# xm list_: Lists all domains running.\\
   16.27 +  \verb_# xm consoles_: Gives information about the domain consoles.\\
   16.28 +  \verb_# xm console_: Opens a console to a domain (e.g.\
   16.29 +  \verb_# xm console myVM_)
   16.30 +\end{quote}
   16.31 +
   16.32 +\subsection{\tt xm list}
   16.33 +
   16.34 +The output of \path{xm list} is in rows of the following format:
   16.35 +\begin{center} {\tt name domid memory cpu state cputime console}
   16.36 +\end{center}
   16.37 +
   16.38 +\begin{quote}
   16.39 +  \begin{description}
   16.40 +  \item[name] The descriptive name of the virtual machine.
   16.41 +  \item[domid] The number of the domain ID this virtual machine is
   16.42 +    running in.
   16.43 +  \item[memory] Memory size in megabytes.
   16.44 +  \item[cpu] The CPU this domain is running on.
   16.45 +  \item[state] Domain state consists of 5 fields:
   16.46 +    \begin{description}
   16.47 +    \item[r] running
   16.48 +    \item[b] blocked
   16.49 +    \item[p] paused
   16.50 +    \item[s] shutdown
   16.51 +    \item[c] crashed
   16.52 +    \end{description}
   16.53 +  \item[cputime] How much CPU time (in seconds) the domain has used so
   16.54 +    far.
   16.55 +  \item[console] TCP port accepting connections to the domain's
   16.56 +    console.
   16.57 +  \end{description}
   16.58 +\end{quote}
   16.59 +
   16.60 +The \path{xm list} command also supports a long output format when the
    16.61 +\path{-l} switch is used.  This outputs the full details of the
   16.62 +running domains in \xend's SXP configuration format.
   16.63 +
   16.64 +For example, suppose the system is running the ttylinux domain as
   16.65 +described earlier.  The list command should produce output somewhat
   16.66 +like the following:
   16.67 +\begin{verbatim}
   16.68 +# xm list
   16.69 +Name              Id  Mem(MB)  CPU  State  Time(s)  Console
   16.70 +Domain-0           0      251    0  r----    172.2        
   16.71 +ttylinux           5       63    0  -b---      3.0    9605
   16.72 +\end{verbatim}
   16.73 +
   16.74 +Here we can see the details for the ttylinux domain, as well as for
   16.75 +domain~0 (which, of course, is always running).  Note that the console
   16.76 +port for the ttylinux domain is 9605.  This can be connected to by TCP
   16.77 +using a terminal program (e.g. \path{telnet} or, better,
   16.78 +\path{xencons}).  The simplest way to connect is to use the
   16.79 +\path{xm~console} command, specifying the domain name or ID.  To
   16.80 +connect to the console of the ttylinux domain, we could use any of the
   16.81 +following:
   16.82 +\begin{verbatim}
   16.83 +# xm console ttylinux
   16.84 +# xm console 5
   16.85 +# xencons localhost 9605
   16.86 +\end{verbatim}
   16.87 +
   16.88 +\section{Domain Save and Restore}
   16.89 +
   16.90 +The administrator of a Xen system may suspend a virtual machine's
   16.91 +current state into a disk file in domain~0, allowing it to be resumed
   16.92 +at a later time.
   16.93 +
   16.94 +The ttylinux domain described earlier can be suspended to disk using
   16.95 +the command:
   16.96 +\begin{verbatim}
   16.97 +# xm save ttylinux ttylinux.xen
   16.98 +\end{verbatim}
   16.99 +
  16.100 +This will stop the domain named `ttylinux' and save its current state
  16.101 +into a file called \path{ttylinux.xen}.
  16.102 +
  16.103 +To resume execution of this domain, use the \path{xm restore} command:
  16.104 +\begin{verbatim}
  16.105 +# xm restore ttylinux.xen
  16.106 +\end{verbatim}
  16.107 +
  16.108 +This will restore the state of the domain and restart it.  The domain
  16.109 +will carry on as before and the console may be reconnected using the
  16.110 +\path{xm console} command, as above.
  16.111 +
  16.112 +\section{Live Migration}
  16.113 +
  16.114 +Live migration is used to transfer a domain between physical hosts
  16.115 +whilst that domain continues to perform its usual activities --- from
  16.116 +the user's perspective, the migration should be imperceptible.
  16.117 +
  16.118 +To perform a live migration, both hosts must be running Xen / \xend\
  16.119 +and the destination host must have sufficient resources (e.g.\ memory
   16.120 +capacity) to accommodate the domain after the move. Furthermore, we
   16.121 +currently require both source and destination machines to be on the
   16.122 +same layer-2 network and IP subnet.
  16.123 +
  16.124 +Currently, there is no support for providing automatic remote access
  16.125 +to filesystems stored on local disk when a domain is migrated.
  16.126 +Administrators should choose an appropriate storage solution (i.e.\
  16.127 +SAN, NAS, etc.) to ensure that domain filesystems are also available
  16.128 +on their destination node. GNBD is a good method for exporting a
  16.129 +volume from one machine to another. iSCSI can do a similar job, but is
  16.130 +more complex to set up.
  16.131 +
   16.132 +When a domain migrates, its MAC and IP addresses move with it, so it
  16.133 +is only possible to migrate VMs within the same layer-2 network and IP
  16.134 +subnet. If the destination node is on a different subnet, the
  16.135 +administrator would need to manually configure a suitable etherip or
   16.136 +IP tunnel in domain~0 of the remote node.
  16.137 +
  16.138 +A domain may be migrated using the \path{xm migrate} command.  To live
  16.139 +migrate a domain to another machine, we would use the command:
  16.140 +
  16.141 +\begin{verbatim}
  16.142 +# xm migrate --live mydomain destination.ournetwork.com
  16.143 +\end{verbatim}
  16.144 +
  16.145 +Without the \path{--live} flag, \xend\ simply stops the domain and
  16.146 +copies the memory image over to the new node and restarts it. Since
  16.147 +domains can have large allocations this can be quite time consuming,
  16.148 +even on a Gigabit network. With the \path{--live} flag \xend\ attempts
  16.149 +to keep the domain running while the migration is in progress,
  16.150 +resulting in typical `downtimes' of just 60--300ms.
  16.151 +
  16.152 +For now it will be necessary to reconnect to the domain's console on
  16.153 +the new machine using the \path{xm console} command.  If a migrated
  16.154 +domain has any open network connections then they will be preserved,
  16.155 +so SSH connections do not have this limitation.
  16.156 +
  16.157 +
  16.158 +\section{Managing Domain Memory}
  16.159 +
  16.160 +XenLinux domains have the ability to relinquish / reclaim machine
  16.161 +memory at the request of the administrator or the user of the domain.
  16.162 +
  16.163 +\subsection{Setting memory footprints from dom0}
  16.164 +
  16.165 +The machine administrator can request that a domain alter its memory
  16.166 +footprint using the \path{xm set-mem} command.  For instance, we can
  16.167 +request that our example ttylinux domain reduce its memory footprint
  16.168 +to 32 megabytes.
  16.169 +
  16.170 +\begin{verbatim}
  16.171 +# xm set-mem ttylinux 32
  16.172 +\end{verbatim}
  16.173 +
  16.174 +We can now see the result of this in the output of \path{xm list}:
  16.175 +
  16.176 +\begin{verbatim}
  16.177 +# xm list
  16.178 +Name              Id  Mem(MB)  CPU  State  Time(s)  Console
  16.179 +Domain-0           0      251    0  r----    172.2        
  16.180 +ttylinux           5       31    0  -b---      4.3    9605
  16.181 +\end{verbatim}
  16.182 +
  16.183 +The domain has responded to the request by returning memory to Xen. We
  16.184 +can restore the domain to its original size using the command line:
  16.185 +
  16.186 +\begin{verbatim}
  16.187 +# xm set-mem ttylinux 64
  16.188 +\end{verbatim}
  16.189 +
  16.190 +\subsection{Setting memory footprints from within a domain}
  16.191 +
  16.192 +The virtual file \path{/proc/xen/balloon} allows the owner of a domain
  16.193 +to adjust their own memory footprint.  Reading the file (e.g.\
  16.194 +\path{cat /proc/xen/balloon}) prints out the current memory footprint
  16.195 +of the domain.  Writing the file (e.g.\ \path{echo new\_target >
  16.196 +  /proc/xen/balloon}) requests that the kernel adjust the domain's
  16.197 +memory footprint to a new value.
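
As a hypothetical illustration (the units expected when writing a new
target vary between kernel versions, so check your kernel's
documentation):
\begin{quote}
\begin{verbatim}
# cat /proc/xen/balloon
# echo 32768 > /proc/xen/balloon
\end{verbatim}
\end{quote}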
  16.198 +
  16.199 +\subsection{Setting memory limits}
  16.200 +
  16.201 +Xen associates a memory size limit with each domain.  By default, this
  16.202 +is the amount of memory the domain is originally started with,
  16.203 +preventing the domain from ever growing beyond this size.  To permit a
  16.204 +domain to grow beyond its original allocation or to prevent a domain
  16.205 +you've shrunk from reclaiming the memory it relinquished, use the
  16.206 +\path{xm maxmem} command.
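
For example, to allow the ttylinux domain to grow to 128 megabytes you
might use something like the following (an illustrative sketch; check
\path{xm help} for the exact syntax of your version of the tools):
\begin{quote}
  \verb_# xm maxmem ttylinux 128_
\end{quote}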
    17.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    17.2 +++ b/docs/src/user/glossary.tex	Tue Sep 20 09:17:33 2005 +0000
    17.3 @@ -0,0 +1,79 @@
    17.4 +\chapter{Glossary of Terms}
    17.5 +
    17.6 +\begin{description}
    17.7 +
    17.8 +\item[Atropos] One of the CPU schedulers provided by Xen.  Atropos
    17.9 +  provides domains with absolute shares of the CPU, with timeliness
   17.10 +  guarantees and a mechanism for sharing out `slack time'.
   17.11 +
   17.12 +\item[BVT] The BVT scheduler is used to give proportional fair shares
   17.13 +  of the CPU to domains.
   17.14 +
   17.15 +\item[Exokernel] A minimal piece of privileged code, similar to a {\bf
   17.16 +    microkernel} but providing a more `hardware-like' interface to the
   17.17 +  tasks it manages.  This is similar to a paravirtualising VMM like
   17.18 +  {\bf Xen} but was designed as a new operating system structure,
   17.19 +  rather than specifically to run multiple conventional OSs.
   17.20 +
   17.21 +\item[Domain] A domain is the execution context that contains a
   17.22 +  running {\bf virtual machine}.  The relationship between virtual
   17.23 +  machines and domains on Xen is similar to that between programs and
   17.24 +  processes in an operating system: a virtual machine is a persistent
   17.25 +  entity that resides on disk (somewhat like a program).  When it is
   17.26 +  loaded for execution, it runs in a domain.  Each domain has a {\bf
   17.27 +    domain ID}.
   17.28 +
   17.29 +\item[Domain 0] The first domain to be started on a Xen machine.
   17.30 +  Domain 0 is responsible for managing the system.
   17.31 +
   17.32 +\item[Domain ID] A unique identifier for a {\bf domain}, analogous to
   17.33 +  a process ID in an operating system.
   17.34 +
   17.35 +\item[Full virtualisation] An approach to virtualisation which
   17.36 +  requires no modifications to the hosted operating system, providing
   17.37 +  the illusion of a complete system of real hardware devices.
   17.38 +
   17.39 +\item[Hypervisor] An alternative term for {\bf VMM}, used because it
   17.40 +  means `beyond supervisor', since it is responsible for managing
   17.41 +  multiple `supervisor' kernels.
   17.42 +
   17.43 +\item[Live migration] A technique for moving a running virtual machine
   17.44 +  to another physical host, without stopping it or the services
   17.45 +  running on it.
   17.46 +
   17.47 +\item[Microkernel] A small base of code running at the highest
   17.48 +  hardware privilege level.  A microkernel is responsible for sharing
   17.49 +  CPU and memory (and sometimes other devices) between less privileged
   17.50 +  tasks running on the system.  This is similar to a VMM, particularly
   17.51 +  a {\bf paravirtualising} VMM but typically addressing a different
   17.52 +  problem space and providing different kind of interface.
   17.53 +
   17.54 +\item[NetBSD/Xen] A port of NetBSD to the Xen architecture.
   17.55 +
   17.56 +\item[Paravirtualisation] An approach to virtualisation which requires
   17.57 +  modifications to the operating system in order to run in a virtual
   17.58 +  machine.  Xen uses paravirtualisation but preserves binary
   17.59 +  compatibility for user space applications.
   17.60 +
   17.61 +\item[Shadow pagetables] A technique for hiding the layout of machine
   17.62 +  memory from a virtual machine's operating system.  Used in some {\bf
   17.63 +    VMMs} to provide the illusion of contiguous physical memory, in
   17.64 +  Xen this is used during {\bf live migration}.
   17.65 +
   17.66 +\item[Virtual Machine] The environment in which a hosted operating
   17.67 +  system runs, providing the abstraction of a dedicated machine.  A
   17.68 +  virtual machine may be identical to the underlying hardware (as in
    17.69 +  {\bf full virtualisation}), or it may differ, as in {\bf
    17.70 +    paravirtualisation}.
   17.71 +
   17.72 +\item[VMM] Virtual Machine Monitor - the software that allows multiple
   17.73 +  virtual machines to be multiplexed on a single physical machine.
   17.74 +
   17.75 +\item[Xen] Xen is a paravirtualising virtual machine monitor,
   17.76 +  developed primarily by the Systems Research Group at the University
   17.77 +  of Cambridge Computer Laboratory.
   17.78 +
   17.79 +\item[XenLinux] Official name for the port of the Linux kernel that
   17.80 +  runs on Xen.
   17.81 +
   17.82 +\end{description}
    18.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    18.2 +++ b/docs/src/user/installation.tex	Tue Sep 20 09:17:33 2005 +0000
    18.3 @@ -0,0 +1,394 @@
    18.4 +\chapter{Installation}
    18.5 +
    18.6 +The Xen distribution includes three main components: Xen itself, ports
    18.7 +of Linux 2.4 and 2.6 and NetBSD to run on Xen, and the userspace
    18.8 +tools required to manage a Xen-based system.  This chapter describes
    18.9 +how to install the Xen~2.0 distribution from source.  Alternatively,
   18.10 +there may be pre-built packages available as part of your operating
   18.11 +system distribution.
   18.12 +
   18.13 +
   18.14 +\section{Prerequisites}
   18.15 +\label{sec:prerequisites}
   18.16 +
   18.17 +The following is a full list of prerequisites.  Items marked `$\dag$'
   18.18 +are required by the \xend\ control tools, and hence required if you
   18.19 +want to run more than one virtual machine; items marked `$*$' are only
   18.20 +required if you wish to build from source.
   18.21 +\begin{itemize}
   18.22 +\item A working Linux distribution using the GRUB bootloader and
   18.23 +  running on a P6-class (or newer) CPU.
   18.24 +\item [$\dag$] The \path{iproute2} package.
   18.25 +\item [$\dag$] The Linux bridge-utils\footnote{Available from {\tt
   18.26 +      http://bridge.sourceforge.net}} (e.g., \path{/sbin/brctl})
   18.27 +\item [$\dag$] An installation of Twisted~v1.3 or
   18.28 +  above\footnote{Available from {\tt http://www.twistedmatrix.com}}.
   18.29 +  There may be a binary package available for your distribution;
   18.30 +  alternatively it can be installed by running `{\sl make
   18.31 +    install-twisted}' in the root of the Xen source tree.
   18.32 +\item [$*$] Build tools (gcc v3.2.x or v3.3.x, binutils, GNU make).
   18.33 +\item [$*$] Development installation of libcurl (e.g., libcurl-devel)
   18.34 +\item [$*$] Development installation of zlib (e.g., zlib-dev).
   18.35 +\item [$*$] Development installation of Python v2.2 or later (e.g.,
   18.36 +  python-dev).
   18.37 +\item [$*$] \LaTeX\ and transfig are required to build the
   18.38 +  documentation.
   18.39 +\end{itemize}
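
A quick, informal way to check several of these prerequisites from a
shell is shown below (illustrative only; package names and versions
vary between distributions):
\begin{quote}
\begin{verbatim}
# brctl show        # bridge-utils installed?
# ip route          # iproute2 installed?
# gcc --version     # build tools present?
# python -V         # Python 2.2 or later?
\end{verbatim}
\end{quote}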
   18.40 +
   18.41 +Once you have satisfied the relevant prerequisites, you can now
   18.42 +install either a binary or source distribution of Xen.
   18.43 +
   18.44 +
   18.45 +\section{Installing from Binary Tarball}
   18.46 +
   18.47 +Pre-built tarballs are available for download from the Xen download
   18.48 +page
   18.49 +\begin{quote} {\tt http://xen.sf.net}
   18.50 +\end{quote}
   18.51 +
   18.52 +Once you've downloaded the tarball, simply unpack and install:
   18.53 +\begin{verbatim}
   18.54 +# tar zxvf xen-2.0-install.tgz
   18.55 +# cd xen-2.0-install
   18.56 +# sh ./install.sh
   18.57 +\end{verbatim}
   18.58 +
   18.59 +Once you've installed the binaries you need to configure your system
   18.60 +as described in Section~\ref{s:configure}.
   18.61 +
   18.62 +
   18.63 +\section{Installing from Source}
   18.64 +
   18.65 +This section describes how to obtain, build, and install Xen from
   18.66 +source.
   18.67 +
   18.68 +\subsection{Obtaining the Source}
   18.69 +
   18.70 +The Xen source tree is available as either a compressed source tar
   18.71 +ball or as a clone of our master BitKeeper repository.
   18.72 +
   18.73 +\begin{description}
   18.74 +\item[Obtaining the Source Tarball]\mbox{} \\
   18.75 +  Stable versions (and daily snapshots) of the Xen source tree are
   18.76 +  available as compressed tarballs from the Xen download page
   18.77 +  \begin{quote} {\tt http://xen.sf.net}
   18.78 +  \end{quote}
   18.79 +
   18.80 +\item[Using BitKeeper]\mbox{} \\
   18.81 +  If you wish to install Xen from a clone of our latest BitKeeper
   18.82 +  repository then you will need to install the BitKeeper tools.
   18.83 +  Download instructions for BitKeeper can be obtained by filling out
   18.84 +  the form at:
   18.85 +  \begin{quote} {\tt http://www.bitmover.com/cgi-bin/download.cgi}
   18.86 +\end{quote}
   18.87 +The public master BK repository for the 2.0 release lives at:
   18.88 +\begin{quote} {\tt bk://xen.bkbits.net/xen-2.0.bk}
   18.89 +\end{quote} 
   18.90 +You can use BitKeeper to download it and keep it updated with the
   18.91 +latest features and fixes.
   18.92 +
   18.93 +Change to the directory in which you want to put the source code, then
   18.94 +run:
   18.95 +\begin{verbatim}
   18.96 +# bk clone bk://xen.bkbits.net/xen-2.0.bk
   18.97 +\end{verbatim}
   18.98 +
   18.99 +Under your current directory, a new directory named \path{xen-2.0.bk}
  18.100 +has been created, which contains all the source code for Xen, the OS
  18.101 +ports, and the control tools. You can update your repository with the
  18.102 +latest changes at any time by running:
  18.103 +\begin{verbatim}
  18.104 +# cd xen-2.0.bk # to change into the local repository
  18.105 +# bk pull       # to update the repository
  18.106 +\end{verbatim}
  18.107 +\end{description}
  18.108 +
  18.109 +% \section{The distribution}
  18.110 +%
  18.111 +% The Xen source code repository is structured as follows:
  18.112 +%
  18.113 +% \begin{description}
  18.114 +% \item[\path{tools/}] Xen node controller daemon (Xend), command line
  18.115 +%   tools, control libraries
  18.116 +% \item[\path{xen/}] The Xen VMM.
  18.117 +% \item[\path{linux-*-xen-sparse/}] Xen support for Linux.
  18.118 +% \item[\path{linux-*-patches/}] Experimental patches for Linux.
  18.119 +% \item[\path{netbsd-*-xen-sparse/}] Xen support for NetBSD.
  18.120 +% \item[\path{docs/}] Various documentation files for users and
  18.121 +%   developers.
  18.122 +% \item[\path{extras/}] Bonus extras.
  18.123 +% \end{description}
  18.124 +
  18.125 +\subsection{Building from Source}
  18.126 +
  18.127 +The top-level Xen Makefile includes a target `world' that will do the
  18.128 +following:
  18.129 +
  18.130 +\begin{itemize}
  18.131 +\item Build Xen.
  18.132 +\item Build the control tools, including \xend.
  18.133 +\item Download (if necessary) and unpack the Linux 2.6 source code,
  18.134 +  and patch it for use with Xen.
  18.135 +\item Build a Linux kernel to use in domain 0 and a smaller
  18.136 +  unprivileged kernel, which can optionally be used for unprivileged
  18.137 +  virtual machines.
  18.138 +\end{itemize}
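
For example, from the root of the Xen source tree:
\begin{quote}
\begin{verbatim}
# make world
\end{verbatim}
\end{quote}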
  18.139 +
  18.140 +After the build has completed you should have a top-level directory
  18.141 +called \path{dist/} in which all resulting targets will be placed; of
   18.142 +particular interest are the two XenLinux kernel images, one
  18.143 +with a `-xen0' extension which contains hardware device drivers and
  18.144 +drivers for Xen's virtual devices, and one with a `-xenU' extension
  18.145 +that just contains the virtual ones. These are found in
  18.146 +\path{dist/install/boot/} along with the image for Xen itself and the
  18.147 +configuration files used during the build.
  18.148 +
  18.149 +The NetBSD port can be built using:
  18.150 +\begin{quote}
  18.151 +\begin{verbatim}
  18.152 +# make netbsd20
  18.153 +\end{verbatim}
  18.154 +\end{quote}
   18.155 +The NetBSD port is built using a snapshot of the netbsd-2-0 CVS branch.
  18.156 +The snapshot is downloaded as part of the build process, if it is not
  18.157 +yet present in the \path{NETBSD\_SRC\_PATH} search path.  The build
  18.158 +process also downloads a toolchain which includes all the tools
  18.159 +necessary to build the NetBSD kernel under Linux.
  18.160 +
   18.161 +To further customize the set of kernels built, you need to edit the
  18.162 +top-level Makefile. Look for the line:
  18.163 +
  18.164 +\begin{quote}
  18.165 +\begin{verbatim}
  18.166 +KERNELS ?= mk.linux-2.6-xen0 mk.linux-2.6-xenU
  18.167 +\end{verbatim}
  18.168 +\end{quote}
  18.169 +
  18.170 +You can edit this line to include any set of operating system kernels
  18.171 +which have configurations in the top-level \path{buildconfigs/}
  18.172 +directory, for example \path{mk.linux-2.4-xenU} to build a Linux 2.4
  18.173 +kernel containing only virtual device drivers.
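
For instance, to also build a Linux 2.4 unprivileged kernel you would
change the line to read:
\begin{quote}
\begin{verbatim}
KERNELS ?= mk.linux-2.6-xen0 mk.linux-2.6-xenU mk.linux-2.4-xenU
\end{verbatim}
\end{quote}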
  18.174 +
  18.175 +%% Inspect the Makefile if you want to see what goes on during a
  18.176 +%% build.  Building Xen and the tools is straightforward, but XenLinux
  18.177 +%% is more complicated.  The makefile needs a `pristine' Linux kernel
  18.178 +%% tree to which it will then add the Xen architecture files.  You can
  18.179 +%% tell the makefile the location of the appropriate Linux compressed
  18.180 +%% tar file by
  18.181 +%% setting the LINUX\_SRC environment variable, e.g. \\
  18.182 +%% \verb!# LINUX_SRC=/tmp/linux-2.6.11.tar.bz2 make world! \\ or by
  18.183 +%% placing the tar file somewhere in the search path of {\tt
  18.184 +%%   LINUX\_SRC\_PATH} which defaults to `{\tt .:..}'.  If the
  18.185 +%% makefile can't find a suitable kernel tar file it attempts to
  18.186 +%% download it from kernel.org (this won't work if you're behind a
  18.187 +%% firewall).
  18.188 +
  18.189 +%% After untaring the pristine kernel tree, the makefile uses the {\tt
  18.190 +%%   mkbuildtree} script to add the Xen patches to the kernel.
  18.191 +
  18.192 +
  18.193 +%% The procedure is similar to build the Linux 2.4 port: \\
  18.194 +%% \verb!# LINUX_SRC=/path/to/linux2.4/source make linux24!
  18.195 +
  18.196 +
  18.197 +%% \framebox{\parbox{5in}{
  18.198 +%%     {\bf Distro specific:} \\
  18.199 +%%     {\it Gentoo} --- if not using udev (most installations,
  18.200 +%%     currently), you'll need to enable devfs and devfs mount at boot
  18.201 +%%     time in the xen0 config.  }}
  18.202 +
  18.203 +\subsection{Custom XenLinux Builds}
  18.204 +
  18.205 +% If you have an SMP machine you may wish to give the {\tt '-j4'}
  18.206 +% argument to make to get a parallel build.
  18.207 +
  18.208 +If you wish to build a customized XenLinux kernel (e.g. to support
  18.209 +additional devices or enable distribution-required features), you can
  18.210 +use the standard Linux configuration mechanisms, specifying that the
  18.211 +architecture being built for is \path{xen}, e.g:
  18.212 +\begin{quote}
  18.213 +\begin{verbatim}
  18.214 +# cd linux-2.6.11-xen0
  18.215 +# make ARCH=xen xconfig
  18.216 +# cd ..
  18.217 +# make
  18.218 +\end{verbatim}
  18.219 +\end{quote}
  18.220 +
  18.221 +You can also copy an existing Linux configuration (\path{.config})
  18.222 +into \path{linux-2.6.11-xen0} and execute:
  18.223 +\begin{quote}
  18.224 +\begin{verbatim}
  18.225 +# make ARCH=xen oldconfig
  18.226 +\end{verbatim}
  18.227 +\end{quote}
  18.228 +
  18.229 +You may be prompted with some Xen-specific options; we advise
  18.230 +accepting the defaults for these options.
  18.231 +
  18.232 +Note that the only difference between the two types of Linux kernel
  18.233 +that are built is the configuration file used for each.  The `U'
  18.234 +suffixed (unprivileged) versions don't contain any of the physical
  18.235 +hardware device drivers, leading to a 30\% reduction in size; hence
  18.236 +you may prefer these for your non-privileged domains.  The `0'
  18.237 +suffixed privileged versions can be used to boot the system, as well
  18.238 +as in driver domains and unprivileged domains.
  18.239 +
  18.240 +\subsection{Installing the Binaries}
  18.241 +
  18.242 +The files produced by the build process are stored under the
  18.243 +\path{dist/install/} directory. To install them in their default
  18.244 +locations, do:
  18.245 +\begin{quote}
  18.246 +\begin{verbatim}
  18.247 +# make install
  18.248 +\end{verbatim}
  18.249 +\end{quote}
  18.250 +
  18.251 +Alternatively, users with special installation requirements may wish
  18.252 +to install them manually by copying the files to their appropriate
  18.253 +destinations.
  18.254 +
  18.255 +%% Files in \path{install/boot/} include:
  18.256 +%% \begin{itemize}
  18.257 +%% \item \path{install/boot/xen-2.0.gz} Link to the Xen 'kernel'
  18.258 +%% \item \path{install/boot/vmlinuz-2.6-xen0} Link to domain 0
  18.259 +%%   XenLinux kernel
  18.260 +%% \item \path{install/boot/vmlinuz-2.6-xenU} Link to unprivileged
  18.261 +%%   XenLinux kernel
  18.262 +%% \end{itemize}
  18.263 +
  18.264 +The \path{dist/install/boot} directory will also contain the config
  18.265 +files used for building the XenLinux kernels, and also versions of Xen
  18.266 +and XenLinux kernels that contain debug symbols (\path{xen-syms-2.0.6}
  18.267 +and \path{vmlinux-syms-2.6.11.11-xen0}) which are essential for
  18.268 +interpreting crash dumps.  Retain these files as the developers may
  18.269 +wish to see them if you post on the mailing list.
  18.270 +
  18.271 +
  18.272 +\section{Configuration}
  18.273 +\label{s:configure}
  18.274 +
  18.275 +Once you have built and installed the Xen distribution, it is simple
  18.276 +to prepare the machine for booting and running Xen.
  18.277 +
  18.278 +\subsection{GRUB Configuration}
  18.279 +
  18.280 +An entry should be added to \path{grub.conf} (often found under
  18.281 +\path{/boot/} or \path{/boot/grub/}) to allow Xen / XenLinux to boot.
  18.282 +This file is sometimes called \path{menu.lst}, depending on your
  18.283 +distribution.  The entry should look something like the following:
  18.284 +
  18.285 +{\small
  18.286 +\begin{verbatim}
  18.287 +title Xen 2.0 / XenLinux 2.6
  18.288 +  kernel /boot/xen-2.0.gz dom0_mem=131072
  18.289 +  module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0
  18.290 +\end{verbatim}
  18.291 +}
  18.292 +
  18.293 +The kernel line tells GRUB where to find Xen itself and what boot
  18.294 +parameters should be passed to it (in this case, setting domain 0's
   18.295 +memory allocation in kilobytes).
  18.296 +For more details on the various Xen boot parameters see
  18.297 +Section~\ref{s:xboot}.
  18.298 +
  18.299 +The module line of the configuration describes the location of the
  18.300 +XenLinux kernel that Xen should start and the parameters that should
  18.301 +be passed to it (these are standard Linux parameters, identifying the
  18.302 +root device and specifying it be initially mounted read only and
  18.303 +instructing that console output be sent to the screen).  Some
  18.304 +distributions such as SuSE do not require the \path{ro} parameter.
  18.305 +
  18.306 +%% \framebox{\parbox{5in}{
  18.307 +%%     {\bf Distro specific:} \\
  18.308 +%%     {\it SuSE} --- Omit the {\tt ro} option from the XenLinux
  18.309 +%%     kernel command line, since the partition won't be remounted rw
  18.310 +%%     during boot.  }}
  18.311 +
  18.312 +
  18.313 +If you want to use an initrd, just add another \path{module} line to
  18.314 +the configuration, as usual:
  18.315 +
  18.316 +{\small
  18.317 +\begin{verbatim}
  18.318 +  module /boot/my_initrd.gz
  18.319 +\end{verbatim}
  18.320 +}
  18.321 +
  18.322 +As always when installing a new kernel, it is recommended that you do
  18.323 +not delete existing menu options from \path{menu.lst} --- you may want
  18.324 +to boot your old Linux kernel in future, particularly if you have
  18.325 +problems.
  18.326 +
  18.327 +\subsection{Serial Console (optional)}
  18.328 +
  18.329 +%% kernel /boot/xen-2.0.gz dom0_mem=131072 com1=115200,8n1
  18.330 +%% module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro
  18.331 +
  18.332 +
  18.333 +In order to configure Xen serial console output, it is necessary to
   18.334 +add a boot option to your GRUB config; e.g.\ replace the above kernel
  18.335 +line with:
  18.336 +\begin{quote}
  18.337 +{\small
  18.338 +\begin{verbatim}
  18.339 +   kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1
  18.340 +\end{verbatim}}
  18.341 +\end{quote}
  18.342 +
  18.343 +This configures Xen to output on COM1 at 115,200 baud, 8 data bits, 1
   18.344 +stop bit and no parity. Modify these parameters for your setup.
  18.345 +
  18.346 +One can also configure XenLinux to share the serial console; to
   18.347 +achieve this, append ``\path{console=ttyS0}'' to your module line.
  18.348 +
  18.349 +If you wish to be able to log in over the XenLinux serial console it
  18.350 +is necessary to add a line into \path{/etc/inittab}, just as per
  18.351 +regular Linux. Simply add the line:
  18.352 +\begin{quote} {\small {\tt c:2345:respawn:/sbin/mingetty ttyS0}}
  18.353 +\end{quote}
  18.354 +
  18.355 +and you should be able to log in. Note that to successfully log in as
  18.356 +root over the serial line will require adding \path{ttyS0} to
  18.357 +\path{/etc/securetty} in most modern distributions.
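
For example (as root, and assuming \path{ttyS0} is not already
listed):
\begin{quote}
  \verb_# echo ttyS0 >> /etc/securetty_
\end{quote}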
  18.358 +
  18.359 +\subsection{TLS Libraries}
  18.360 +
  18.361 +Users of the XenLinux 2.6 kernel should disable Thread Local Storage
  18.362 +(e.g.\ by doing a \path{mv /lib/tls /lib/tls.disabled}) before
  18.363 +attempting to run with a XenLinux kernel\footnote{If you boot without
  18.364 +  first disabling TLS, you will get a warning message during the boot
  18.365 +  process. In this case, simply perform the rename after the machine
  18.366 +  is up and then run \texttt{/sbin/ldconfig} to make it take effect.}.
  18.367 +You can always reenable it by restoring the directory to its original
  18.368 +location (i.e.\ \path{mv /lib/tls.disabled /lib/tls}).
  18.369 +
  18.370 +The reason for this is that the current TLS implementation uses
  18.371 +segmentation in a way that is not permissible under Xen.  If TLS is
  18.372 +not disabled, an emulation mode is used within Xen which reduces
  18.373 +performance substantially.
  18.374 +
  18.375 +We hope that this issue can be resolved by working with Linux
  18.376 +distribution vendors to implement a minor backward-compatible change
  18.377 +to the TLS library.
  18.378 +
  18.379 +
  18.380 +\section{Booting Xen}
  18.381 +
  18.382 +It should now be possible to restart the system and use Xen.  Reboot
  18.383 +as usual but choose the new Xen option when the Grub screen appears.
  18.384 +
  18.385 +What follows should look much like a conventional Linux boot.  The
  18.386 +first portion of the output comes from Xen itself, supplying low level
  18.387 +information about itself and the machine it is running on.  The
  18.388 +following portion of the output comes from XenLinux.
  18.389 +
  18.390 +You may see some errors during the XenLinux boot.  These are not
  18.391 +necessarily anything to worry about --- they may result from kernel
  18.392 +configuration differences between your XenLinux kernel and the one you
  18.393 +usually use.
  18.394 +
  18.395 +When the boot completes, you should be able to log into your system as
  18.396 +usual.  If you are unable to log in to your system running Xen, you
  18.397 +should still be able to reboot with your normal Linux kernel.
    19.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    19.2 +++ b/docs/src/user/introduction.tex	Tue Sep 20 09:17:33 2005 +0000
    19.3 @@ -0,0 +1,143 @@
    19.4 +\chapter{Introduction}
    19.5 +
    19.6 +
    19.7 +Xen is a \emph{paravirtualising} virtual machine monitor (VMM), or
    19.8 +`hypervisor', for the x86 processor architecture.  Xen can securely
    19.9 +execute multiple virtual machines on a single physical system with
   19.10 +close-to-native performance.  The virtual machine technology
   19.11 +facilitates enterprise-grade functionality, including:
   19.12 +
   19.13 +\begin{itemize}
   19.14 +\item Virtual machines with performance close to native hardware.
   19.15 +\item Live migration of running virtual machines between physical
   19.16 +  hosts.
   19.17 +\item Excellent hardware support (supports most Linux device drivers).
   19.18 +\item Sandboxed, re-startable device drivers.
   19.19 +\end{itemize}
   19.20 +
   19.21 +Paravirtualisation permits very high performance virtualisation, even
   19.22 +on architectures like x86 that are traditionally very hard to
   19.23 +virtualise.
   19.24 +
   19.25 +The drawback of this approach is that it requires operating systems to
   19.26 +be \emph{ported} to run on Xen.  Porting an OS to run on Xen is
   19.27 +similar to supporting a new hardware platform, however the process is
   19.28 +simplified because the paravirtual machine architecture is very
   19.29 +similar to the underlying native hardware. Even though operating
   19.30 +system kernels must explicitly support Xen, a key feature is that user
   19.31 +space applications and libraries \emph{do not} require modification.
   19.32 +
   19.33 +Xen support is available for increasingly many operating systems:
   19.34 +right now, Linux 2.4, Linux 2.6 and NetBSD are available for Xen 2.0.
   19.35 +A FreeBSD port is undergoing testing and will be incorporated into the
   19.36 +release soon. Other OS ports, including Plan 9, are in progress.  We
    19.37 +hope that the arch-xen patches will be incorporated into the
   19.38 +mainstream releases of these operating systems in due course (as has
   19.39 +already happened for NetBSD).
   19.40 +
   19.41 +Possible usage scenarios for Xen include:
   19.42 +
   19.43 +\begin{description}
   19.44 +\item [Kernel development.] Test and debug kernel modifications in a
   19.45 +  sandboxed virtual machine --- no need for a separate test machine.
   19.46 +\item [Multiple OS configurations.] Run multiple operating systems
   19.47 +  simultaneously, for instance for compatibility or QA purposes.
   19.48 +\item [Server consolidation.] Move multiple servers onto a single
   19.49 +  physical host with performance and fault isolation provided at
   19.50 +  virtual machine boundaries.
   19.51 +\item [Cluster computing.] Management at VM granularity provides more
   19.52 +  flexibility than separately managing each physical host, but better
   19.53 +  control and isolation than single-system image solutions,
   19.54 +  particularly by using live migration for load balancing.
   19.55 +\item [Hardware support for custom OSes.] Allow development of new
   19.56 +  OSes while benefiting from the wide-ranging hardware support of
   19.57 +  existing OSes such as Linux.
   19.58 +\end{description}
   19.59 +
   19.60 +
   19.61 +\section{Structure of a Xen-Based System}
   19.62 +
   19.63 +A Xen system has multiple layers, the lowest and most privileged of
   19.64 +which is Xen itself. 
   19.65 +
   19.66 +Xen in turn may host multiple \emph{guest} operating systems, each of
   19.67 +which is executed within a secure virtual machine (in Xen terminology,
   19.68 +a \emph{domain}). Domains are scheduled by Xen to make effective use
   19.69 +of the available physical CPUs.  Each guest OS manages its own
   19.70 +applications, which includes responsibility for scheduling each
   19.71 +application within the time allotted to the VM by Xen.
   19.72 +
   19.73 +The first domain, \emph{domain 0}, is created automatically when the
   19.74 +system boots and has special management privileges. Domain 0 builds
   19.75 +other domains and manages their virtual devices. It also performs
   19.76 +administrative tasks such as suspending, resuming and migrating other
   19.77 +virtual machines.
   19.78 +
   19.79 +Within domain 0, a process called \emph{xend} runs to manage the
   19.80 +system.  \Xend is responsible for managing virtual machines and
   19.81 +providing access to their consoles.  Commands are issued to \xend over
   19.82 +an HTTP interface, either from a command-line tool or from a web
   19.83 +browser.
   19.84 +
   19.85 +
   19.86 +\section{Hardware Support}
   19.87 +
   19.88 +Xen currently runs only on the x86 architecture, requiring a `P6' or
   19.89 +newer processor (e.g. Pentium Pro, Celeron, Pentium II, Pentium III,
   19.90 +Pentium IV, Xeon, AMD Athlon, AMD Duron).  Multiprocessor machines are
   19.91 +supported, and we also have basic support for HyperThreading (SMT),
   19.92 +although this remains a topic for ongoing research. A port
   19.93 +specifically for x86/64 is in progress, although Xen already runs on
   19.94 +such systems in 32-bit legacy mode. In addition a port to the IA64
   19.95 +architecture is approaching completion. We hope to add other
   19.96 +architectures such as PPC and ARM in due course.
   19.97 +
   19.98 +Xen can currently use up to 4GB of memory.  It is possible for x86
   19.99 +machines to address up to 64GB of physical memory but there are no
  19.100 +current plans to support these systems: The x86/64 port is the planned
  19.101 +route to supporting larger memory sizes.
  19.102 +
  19.103 +Xen offloads most of the hardware support issues to the guest OS
  19.104 +running in Domain~0.  Xen itself contains only the code required to
  19.105 +detect and start secondary processors, set up interrupt routing, and
  19.106 +perform PCI bus enumeration.  Device drivers run within a privileged
  19.107 +guest OS rather than within Xen itself. This approach provides
  19.108 +compatibility with the majority of device hardware supported by Linux.
  19.109 +The default XenLinux build contains support for relatively modern
  19.110 +server-class network and disk hardware, but you can add support for
  19.111 +other hardware by configuring your XenLinux kernel in the normal way.
  19.112 +
  19.113 +
  19.114 +\section{History}
  19.115 +
  19.116 +Xen was originally developed by the Systems Research Group at the
  19.117 +University of Cambridge Computer Laboratory as part of the XenoServers
  19.118 +project, funded by the UK-EPSRC.
  19.119 +
  19.120 +XenoServers aim to provide a `public infrastructure for global
  19.121 +distributed computing', and Xen plays a key part in that, allowing us
  19.122 +to efficiently partition a single machine to enable multiple
  19.123 +independent clients to run their operating systems and applications in
  19.124 +an environment providing protection, resource isolation and
  19.125 +accounting.  The project web page contains further information along
  19.126 +with pointers to papers and technical reports:
  19.127 +\path{http://www.cl.cam.ac.uk/xeno}
  19.128 +
  19.129 +Xen has since grown into a fully-fledged project in its own right,
  19.130 +enabling us to investigate interesting research issues regarding the
  19.131 +best techniques for virtualising resources such as the CPU, memory,
  19.132 +disk and network.  The project has been bolstered by support from
  19.133 +Intel Research Cambridge, and HP Labs, who are now working closely
  19.134 +with us.
  19.135 +
  19.136 +Xen was first described in a paper presented at SOSP in
  19.137 +2003\footnote{\tt
  19.138 +  http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf}, and the
  19.139 +first public release (1.0) was made that October.  Since then, Xen has
  19.140 +significantly matured and is now used in production scenarios on many
  19.141 +sites.
  19.142 +
  19.143 +Xen 2.0 features greatly enhanced hardware support, configuration
  19.144 +flexibility, usability and a larger complement of supported operating
  19.145 +systems. This latest release takes Xen a step closer to becoming the
  19.146 +definitive open source solution for virtualisation.
    20.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    20.2 +++ b/docs/src/user/redhat.tex	Tue Sep 20 09:17:33 2005 +0000
    20.3 @@ -0,0 +1,61 @@
    20.4 +\chapter{Installing Xen / XenLinux on Red~Hat or Fedora Core}
    20.5 +
    20.6 +When using Xen / XenLinux on a standard Linux distribution there are a
    20.7 +couple of things to watch out for:
    20.8 +
    20.9 +Note that, because domains greater than 0 don't have any privileged
    20.10 +access at all, certain commands in the default boot sequence will fail,
   20.11 +e.g.\ attempts to update the hwclock, change the console font, update
   20.12 +the keytable map, start apmd (power management), or gpm (mouse
   20.13 +cursor).  Either ignore the errors (they should be harmless), or
   20.14 +remove them from the startup scripts.  Deleting the following links
    20.15 +is a good start: {\path{S24pcmcia}}, {\path{S09isdn}},
   20.16 +{\path{S17keytable}}, {\path{S26apmd}}, {\path{S85gpm}}.
   20.17 +
   20.18 +If you want to use a single root file system that works cleanly for
   20.19 +both domain~0 and unprivileged domains, a useful trick is to use
   20.20 +different `init' run levels. For example, use run level 3 for
   20.21 +domain~0, and run level 4 for other domains. This enables different
    20.22 +startup scripts to be run depending on the run level number passed
   20.23 +on the kernel command line.
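
For example, assuming your domain configuration file passes additional
arguments to the kernel via an \path{extra} variable, an unprivileged
domain could be told to boot into run level 4 with a line such as:
\begin{quote}
  \verb_extra = "4"_
\end{quote}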
   20.24 +
    20.25 +If using NFS root file systems mounted either from an external server
    20.26 +or from domain~0, there are a couple of other gotchas.  The default
   20.27 +{\path{/etc/sysconfig/iptables}} rules block NFS, so part way through
   20.28 +the boot sequence things will suddenly go dead.
   20.29 +
   20.30 +If you're planning on having a separate NFS {\path{/usr}} partition,
    20.31 +the RH9 boot scripts don't make life easy: they attempt to mount NFS
    20.32 +file systems way too late in the boot process. The easiest way I found
   20.33 +to do this was to have a {\path{/linuxrc}} script run ahead of
   20.34 +{\path{/sbin/init}} that mounts {\path{/usr}}:
   20.35 +
   20.36 +\begin{quote}
   20.37 +  \begin{small}\begin{verbatim}
   20.38 + #!/bin/bash
    20.39 + /sbin/ifconfig lo 127.0.0.1
   20.40 + /sbin/portmap
   20.41 + /bin/mount /usr
   20.42 + exec /sbin/init "$@" <>/dev/console 2>&1
   20.43 +\end{verbatim}\end{small}
   20.44 +\end{quote}
   20.45 +
   20.46 +%% $ XXX SMH: font lock fix :-)
   20.47 +
   20.48 +The one slight complication with the above is that
   20.49 +{\path{/sbin/portmap}} is dynamically linked against
    20.50 +{\path{/usr/lib/libwrap.so.0}}.  Since this is in {\path{/usr}}, it
   20.51 +won't work. This can be solved by copying the file (and link) below
   20.52 +the {\path{/usr}} mount point, and just let the file be `covered' when
   20.53 +the mount happens.
   20.54 +
   20.55 +In some installations, where a shared read-only {\path{/usr}} is being
   20.56 +used, it may be desirable to move other large directories over into
   20.57 +the read-only {\path{/usr}}. For example, you might replace
   20.58 +{\path{/bin}}, {\path{/lib}} and {\path{/sbin}} with links into
   20.59 +{\path{/usr/root/bin}}, {\path{/usr/root/lib}} and
   20.60 +{\path{/usr/root/sbin}} respectively. This creates other problems for
   20.61 +running the {\path{/linuxrc}} script, requiring bash, portmap, mount,
   20.62 +ifconfig, and a handful of other shared libraries to be copied below
   20.63 +the mount point --- a simple statically-linked C program would solve
   20.64 +this problem.
    21.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    21.2 +++ b/docs/src/user/start_addl_dom.tex	Tue Sep 20 09:17:33 2005 +0000
    21.3 @@ -0,0 +1,172 @@
    21.4 +\chapter{Starting Additional Domains}
    21.5 +
    21.6 +The first step in creating a new domain is to prepare a root
    21.7 +filesystem for it to boot from.  Typically, this might be stored in a
    21.8 +normal partition, an LVM or other volume manager partition, a disk
    21.9 +file or on an NFS server.  A simple way to do this is to boot
   21.10 +from your standard OS install CD and install the distribution into
   21.11 +another partition on your hard drive.
   21.12 +
   21.13 +To start the \xend\ control daemon, type
   21.14 +\begin{quote}
   21.15 +  \verb!# xend start!
   21.16 +\end{quote}
   21.17 +
   21.18 +If you wish the daemon to start automatically, see the instructions in
   21.19 +Section~\ref{s:xend}. Once the daemon is running, you can use the
   21.20 +\path{xm} tool to monitor and maintain the domains running on your
   21.21 +system. This chapter provides only a brief tutorial. We provide full
   21.22 +details of the \path{xm} tool in the next chapter.
   21.23 +
   21.24 +% \section{From the web interface}
   21.25 +%
   21.26 +% Boot the Xen machine and start Xensv (see Chapter~\ref{cha:xensv}
   21.27 +% for more details) using the command: \\
   21.28 +% \verb_# xensv start_ \\
   21.29 +% This will also start Xend (see Chapter~\ref{cha:xend} for more
   21.30 +% information).
   21.31 +%
   21.32 +% The domain management interface will then be available at {\tt
   21.33 +%   http://your\_machine:8080/}.  This provides a user friendly wizard
   21.34 +% for starting domains and functions for managing running domains.
   21.35 +%
   21.36 +% \section{From the command line}
   21.37 +
   21.38 +
   21.39 +\section{Creating a Domain Configuration File}
   21.40 +
   21.41 +Before you can start an additional domain, you must create a
   21.42 +configuration file. We provide two example files which you can use as
   21.43 +a starting point:
   21.44 +\begin{itemize}
   21.45 +\item \path{/etc/xen/xmexample1} is a simple template configuration
   21.46 +  file for describing a single VM.
   21.47 +
    21.48 +\item \path{/etc/xen/xmexample2} is a template description that
   21.49 +  is intended to be reused for multiple virtual machines.  Setting the
   21.50 +  value of the \path{vmid} variable on the \path{xm} command line
   21.51 +  fills in parts of this template.
   21.52 +\end{itemize}
   21.53 +
   21.54 +Copy one of these files and edit it as appropriate.  Typical values
   21.55 +you may wish to edit include:
   21.56 +
   21.57 +\begin{quote}
   21.58 +\begin{description}
   21.59 +\item[kernel] Set this to the path of the kernel you compiled for use
   21.60 +  with Xen (e.g.\ \path{kernel = `/boot/vmlinuz-2.6-xenU'})
   21.61 +\item[memory] Set this to the size of the domain's memory in megabytes
   21.62 +  (e.g.\ \path{memory = 64})
   21.63 +\item[disk] Set the first entry in this list to calculate the offset
   21.64 +  of the domain's root partition, based on the domain ID.  Set the
   21.65 +  second to the location of \path{/usr} if you are sharing it between
   21.66 +  domains (e.g.\ \path{disk = [`phy:your\_hard\_drive\%d,sda1,w' \%
   21.67 +    (base\_partition\_number + vmid),
    21.68 +    `phy:your\_usr\_partition,sda6,r' ]})
   21.69 +\item[dhcp] Uncomment the dhcp variable, so that the domain will
   21.70 +  receive its IP address from a DHCP server (e.g.\ \path{dhcp=`dhcp'})
   21.71 +\end{description}
   21.72 +\end{quote}
   21.73 +
   21.74 +You may also want to edit the {\bf vif} variable in order to choose
   21.75 +the MAC address of the virtual ethernet interface yourself.  For
   21.76 +example:
   21.77 +\begin{quote}
   21.78 +\verb_vif = [`mac=00:06:AA:F6:BB:B3']_
   21.79 +\end{quote}
   21.80 +If you do not set this variable, \xend\ will automatically generate a
   21.81 +random MAC address from an unused range.
   21.82 +
   21.83 +
   21.84 +\section{Booting the Domain}
   21.85 +
   21.86 +The \path{xm} tool provides a variety of commands for managing
   21.87 +domains.  Use the \path{create} command to start new domains. Assuming
   21.88 +you've created a configuration file \path{myvmconf} based around
   21.89 +\path{/etc/xen/xmexample2}, to start a domain with virtual machine
   21.90 +ID~1 you should type:
   21.91 +
   21.92 +\begin{quote}
   21.93 +\begin{verbatim}
   21.94 +# xm create -c myvmconf vmid=1
   21.95 +\end{verbatim}
   21.96 +\end{quote}
   21.97 +
   21.98 +The \path{-c} switch causes \path{xm} to turn into the domain's
   21.99 +console after creation.  The \path{vmid=1} sets the \path{vmid}
  21.100 +variable used in the \path{myvmconf} file.
  21.101 +
  21.102 +You should see the console boot messages from the new domain appearing
  21.103 +in the terminal in which you typed the command, culminating in a login
  21.104 +prompt.
  21.105 +
  21.106 +
  21.107 +\section{Example: ttylinux}
  21.108 +
  21.109 +Ttylinux is a very small Linux distribution, designed to require very
  21.110 +few resources.  We will use it as a concrete example of how to start a
  21.111 +Xen domain.  Most users will probably want to install a full-featured
  21.112 +distribution once they have mastered the basics\footnote{ttylinux is
  21.113 +  maintained by Pascal Schmidt. You can download source packages from
  21.114 +  the distribution's home page: {\tt
  21.115 +    http://www.minimalinux.org/ttylinux/}}.
  21.116 +
  21.117 +\begin{enumerate}
  21.118 +\item Download and extract the ttylinux disk image from the Files
  21.119 +  section of the project's SourceForge site (see
  21.120 +  \path{http://sf.net/projects/xen/}).
  21.121 +\item Create a configuration file like the following:
  21.122 +\begin{verbatim}
  21.123 +kernel = "/boot/vmlinuz-2.6-xenU"
  21.124 +memory = 64
  21.125 +name = "ttylinux"
  21.126 +nics = 1
  21.127 +ip = "1.2.3.4"
  21.128 +disk = ['file:/path/to/ttylinux/rootfs,sda1,w']
  21.129 +root = "/dev/sda1 ro"
  21.130 +\end{verbatim}
  21.131 +\item Now start the domain and connect to its console:
  21.132 +\begin{verbatim}
  21.133 +xm create configfile -c
  21.134 +\end{verbatim}
  21.135 +\item Login as root, password root.
  21.136 +\end{enumerate}
  21.137 +
  21.138 +
  21.139 +\section{Starting / Stopping Domains Automatically}
  21.140 +
  21.141 +It is possible to have certain domains start automatically at boot
  21.142 +time and to have dom0 wait for all running domains to shutdown before
  21.143 +it shuts down the system.
  21.144 +
   21.145 +To specify that a domain should start at boot time, place its configuration
  21.146 +file (or a link to it) under \path{/etc/xen/auto/}.
  21.147 +
  21.148 +A Sys-V style init script for Red Hat and LSB-compliant systems is
  21.149 +provided and will be automatically copied to \path{/etc/init.d/}
  21.150 +during install.  You can then enable it in the appropriate way for
  21.151 +your distribution.
  21.152 +
  21.153 +For instance, on Red Hat:
  21.154 +
  21.155 +\begin{quote}
  21.156 +  \verb_# chkconfig --add xendomains_
  21.157 +\end{quote}
  21.158 +
  21.159 +By default, this will start the boot-time domains in runlevels 3, 4
  21.160 +and 5.
  21.161 +
  21.162 +You can also use the \path{service} command to run this script
  21.163 +manually, e.g:
  21.164 +
  21.165 +\begin{quote}
  21.166 +  \verb_# service xendomains start_
  21.167 +
  21.168 +  Starts all the domains with config files under /etc/xen/auto/.
  21.169 +\end{quote}
  21.170 +
  21.171 +\begin{quote}
  21.172 +  \verb_# service xendomains stop_
  21.173 +
  21.174 +  Shuts down ALL running Xen domains.
  21.175 +\end{quote}