bitkeeper revision 1.425 (3f606cc1o_4A4klDuXS7zt2NJS08Ow)

Merge labyrinth.cl.cam.ac.uk:/auto/groups/xeno/BK/xeno.bk
into labyrinth.cl.cam.ac.uk:/auto/anfs/scratch/labyrinth/iap10/xeno-clone/xeno.bk
author iap10@labyrinth.cl.cam.ac.uk
date Thu Sep 11 12:38:25 2003 +0000 (2003-09-11)
parents f6f8ca2fa8d4
children afe5451d9f97
1 #############################
2  __  __            _   ___
3  \ \/ /___ _ __   / | / _ \
4   \  // _ \ '_ \  | || | | |
5   /  \  __/ | | | | || |_| |
6  /_/\_\___|_| |_| |_(_)___/
8 #############################
10 University of Cambridge Computer Laboratory
11 31 Aug 2003
13 http://www.cl.cam.ac.uk/netos/xen
15 About the Xen Virtual Machine Monitor
16 =====================================
18 "Xen" is a Virtual Machine Monitor (VMM) developed by the Systems
19 Research Group of the University of Cambridge Computer Laboratory, as
20 part of the UK-EPSRC funded XenoServers project.
22 The XenoServers project aims to provide a "public infrastructure for
23 global distributed computing", and Xen plays a key part in that,
24 allowing us to efficiently partition a single machine to enable
25 multiple independent clients to run their operating systems and
26 applications in an environment providing protection, resource
27 isolation and accounting. The project web page contains further
28 information along with pointers to papers and technical reports:
29 http://www.cl.cam.ac.uk/xeno
31 Xen has since grown into a project in its own right, enabling us to
32 investigate interesting research issues regarding the best techniques
33 for virtualizing resources such as the CPU, memory, disk and network.
34 The project has been bolstered by support from Intel Research
35 Cambridge, who are now working closely with us. We've now also
36 received support from Microsoft Research Cambridge to port Windows XP
37 to run on Xen.
39 Xen enables multiple operating system images to be run simultaneously
40 on the same hardware with very low performance overhead --- much lower
41 than commercial offerings on the same x86 platform.
43 This is achieved by requiring OSs to be specifically ported to run on
44 Xen, rather than allowing unmodified OS images to be used. Crucially,
45 only the OS needs to be changed -- all of the user-level application
46 binaries, libraries etc can run unmodified. Hence, the modified OS
47 kernel can typically just be dropped into any existing OS distribution
48 or installation.
50 Xen currently runs on the x86 architecture, but could in principle be
51 ported to other CPUs. In fact, it would have been rather easier to
52 write Xen for pretty much any other architecture as x86 doesn't do us
53 any favours at all. The best description of Xen's design,
54 implementation and performance is contained in our October 2003 SOSP
55 paper: http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf
57 We have been working on porting 3 different operating systems to run
58 on Xen: Linux 2.4, Windows XP, and NetBSD.
60 The Linux 2.4 port (currently Linux 2.4.22) works very well -- we
61 regularly use it to host complex applications such as PostgreSQL,
62 Apache, BK servers etc. It runs all applications we've tried. We
63 refer to our version of Linux ported to run on Xen as "XenoLinux",
64 though really it's just standard Linux ported to a new virtual CPU
65 architecture that we call xeno-x86 (abbreviated to just "xeno").
67 Unfortunately, the NetBSD port has stalled due to lack of
68 manpower. We believe most of the hard stuff has already been done, and
69 are hoping to get the ball rolling again soon. In hindsight, a FreeBSD
70 4 port might have been more useful to the community.
72 The Windows XP port is nearly finished. It's running user space
73 applications and is generally in pretty good shape thanks to some hard
74 work by the team over the summer. Of course, there are issues with
75 releasing this code to others. We should be able to release the
76 source and binaries to anyone else that's signed the Microsoft
77 academic source license, which these days has very reasonable
78 terms. We are in discussions with Microsoft about the possibility of
79 being able to make binary releases to a larger user
80 community. Obviously, there are issues with product activation in this
81 environment and such like, which need to be thought through.
83 So, for the moment, you only get to run multiple copies of Linux on
84 Xen, but we hope this will change before too long. Even running
85 multiple copies of the same OS can be very useful, as it provides a
86 means of containing faults to one OS image, and also for providing
87 performance isolation between the various OSes, enabling you to either
88 restrict, or reserve resources for, particular VM instances.
90 It's also useful for development -- each version of Linux can have
91 different patches applied, enabling different kernels to be tried
92 out. For example, the "vservers" patch used by PlanetLab applies
93 cleanly to our ported version of Linux.
95 We've successfully booted over 128 copies of Linux on the same machine
96 (a dual CPU hyperthreaded Xeon box) but we imagine that it would be
97 more normal to use some smaller number, perhaps 10-20.
99 Known limitations and work in progress
100 ======================================
102 The "xenctl" tool is still rather clunky and not very user
103 friendly. In particular, it should have an option to create and start
104 a domain with all the necessary parameters set from a named xml file.
106 The java xenctl tool is really just a frontend for a bunch of C tools
107 named xi_* that do the actual work of talking to Xen and setting stuff
108 up. Some local users prefer to drive the xi_ tools directly, typically
109 from simple shell scripts. These tools are even less user friendly
110 than xenctl but it's arguably clearer what's going on.
112 There's also a web based interface for controlling domains that uses
113 apache/tomcat, but it has fallen out of sync with respect to the
114 underlying tools, so doesn't always work as expected and needs to be
115 fixed.
117 The current Virtual Firewall Router (VFR) implementation in the
118 snapshot tree is very rudimentary, and in particular, lacks the IP
119 port-space sharing across domains that we've proposed that promises to
120 provide a better alternative to NAT. There's a complete new
121 implementation under development which also provides much better
122 logging and auditing support. The current network scheduler is just
123 simple round-robin between domains, without any rate limiting or rate
124 guarantees. Dropping in a new scheduler should be straightforward, and
125 is planned as part of the VFRv2 work package.
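As an illustration of what "simple round-robin between domains" means
here, the following is a minimal Python sketch, not the code in the Xen
tree. The function name, the domain names and the per-domain packet
queues are all hypothetical.

```python
from collections import deque

def round_robin(queues):
    """Yield (domain, packet) pairs, taking one packet per domain per pass.

    `queues` maps a (hypothetical) domain name to its list of pending
    packets. There is no rate limit and no guarantee: a domain with a
    long queue simply keeps getting a turn on every pass.
    """
    pending = {dom: deque(pkts) for dom, pkts in queues.items()}
    order = deque(dom for dom in pending if pending[dom])
    while order:
        dom = order.popleft()
        yield dom, pending[dom].popleft()
        if pending[dom]:
            # Domain still has traffic, so it rejoins the back of the line.
            order.append(dom)

schedule = list(round_robin({"dom1": ["a1", "a2", "a3"], "dom2": ["b1"]}))
```

Once dom2 runs out of packets, dom1 gets every remaining slot. A
scheduler with rate limits or guarantees would have to replace the
unconditional re-queue step with some form of per-domain accounting.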
127 Another area that needs further work is the interface between Xen and
128 domain0 user space where the various XenoServer control daemons run.
129 The current interface is somewhat ad-hoc, making use of various
130 /proc/xeno entries that take a random assortment of arguments. We
131 intend to reimplement this to provide a consistent means of feeding
132 back accounting and logging information to the control daemon.
134 There are also a number of memory management hacks that didn't make this
135 release: We have plans for a "universal buffer cache" that enables
136 otherwise unused system memory to be used by domains in a read-only
137 fashion. We also have plans for inter-domain shared-memory to enable
138 high-performance bulk transport for cases where the usual internal
139 networking performance isn't good enough (e.g. communication with an
140 internal file server on another domain).
142 We also have plans to implement domain suspend/resume-to-file. This is
143 basically an extension to the current domain building process to
144 enable domain0 to read out all of the domain's state and store it in a
145 file. There are complications here due to Xen's para-virtualised
146 design: the physical machine memory pages available to the guest OS
147 are likely to be different when the OS is resumed, so we need to
148 re-write the page tables appropriately.
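The page table fix-up could look roughly like the Python sketch below.
It is only an illustration under stated assumptions: 32-bit entries
with the frame number in the top 20 bits and flags in the low 12 bits
(the standard 2-level x86 layout), and a hypothetical pair of lists
giving the machine frames that backed each guest page before suspend
and after resume. None of the names come from the Xen source.

```python
PAGE_SHIFT = 12
FLAGS_MASK = (1 << PAGE_SHIFT) - 1
PRESENT = 0x1  # x86 "present" bit in a page table entry

def remap_pte(pte, frame_map):
    """Return `pte` pointing at the new machine frame, flags unchanged."""
    frame = pte >> PAGE_SHIFT
    return (frame_map[frame] << PAGE_SHIFT) | (pte & FLAGS_MASK)

def rewrite_page_table(ptes, old_frames, new_frames):
    """Rewrite every present entry of one page table.

    old_frames[i] / new_frames[i] are assumed to be the machine frames
    backing the guest's i-th page before suspend and after resume.
    """
    frame_map = dict(zip(old_frames, new_frames))
    return [remap_pte(p, frame_map) if p & PRESENT else p for p in ptes]

old = [0x100, 0x101]
new = [0x2A0, 0x07F]
table = [(0x100 << PAGE_SHIFT) | 0x007, 0, (0x101 << PAGE_SHIFT) | 0x005]
resumed = rewrite_page_table(table, old, new)
```

Entries that are not present are left untouched, since they carry no
machine frame number that could have gone stale.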
150 We have the equivalent of balloon driver functionality to control a
151 domain's memory usage, enabling a domain to give back unused pages to
152 Xen. This needs properly documenting, and perhaps a way for domain0
153 to signal to a domain that it must reduce its memory footprint,
154 rather than just relying on the domain to volunteer.
156 The current disk scheduler is rather simplistic (batch round robin),
157 and could be replaced by e.g. Cello if we have QoS isolation
158 problems. For most things it seems to work OK, but there's currently
159 no service differentiation or weighting.
161 Although Xen runs on SMP and SMT (hyperthreaded) machines,
162 the scheduling is far from smart -- domains are currently statically
163 assigned to a CPU when they are created (in a round robin fashion).
164 The scheduler needs to be modified such that before going idle a
165 logical CPU looks for work on other run queues (particularly on the
166 same physical CPU).
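The proposed change can be sketched in Python as follows. This is a
toy model rather than scheduler code: run queues are plain deques keyed
by logical CPU number, `physical_of` is a hypothetical map from logical
CPU to physical package, and taking work from the tail of the victim's
queue is our assumption, not a decision the text above makes.

```python
from collections import deque

def pick_next(cpu, run_queues, physical_of):
    """Return the next domain for logical CPU `cpu`, or None to go idle.

    If the CPU's own run queue is empty it looks for work on other run
    queues, trying logical CPUs on the same physical CPU first.
    """
    if run_queues[cpu]:
        return run_queues[cpu].popleft()
    others = [c for c in run_queues if c != cpu]
    # False sorts before True, so same-package siblings come first;
    # the sort is stable, so the remaining order is preserved.
    others.sort(key=lambda c: physical_of[c] != physical_of[cpu])
    for victim in others:
        if run_queues[victim]:
            return run_queues[victim].pop()
    return None

physical_of = {0: 0, 1: 0, 2: 1, 3: 1}
run_queues = {0: deque(), 1: deque(["domA", "domB"]),
              2: deque(["domC"]), 3: deque()}
first = pick_next(0, run_queues, physical_of)   # steals from sibling CPU 1
second = pick_next(3, run_queues, physical_of)  # steals from sibling CPU 2
third = pick_next(3, run_queues, physical_of)   # falls back to the other package
```

Preferring the sibling keeps a stolen domain on the same physical CPU,
which is the case the paragraph above singles out.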
168 Xen currently only supports uniprocessor guest OSes. We have designed
169 the Xen interface with MP guests in mind, and plan to build an MP
170 Linux guest in due course. Basically, an MP guest would consist of
171 multiple scheduling domains (one per CPU) sharing a single memory
172 protection domain. The only extra complexity for the Xen VM system is
173 that when a page transitions from holding a page table or page
174 directory to being a writable page, we must ensure that no other CPU
175 still has the page in its TLB, to preserve memory system integrity. One
176 other issue for supporting MP guests is that we'll need some sort of
177 CPU gang scheduler, which will require some research.
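The TLB rule for that page-type transition can be modelled with a
short Python sketch. Everything here is hypothetical bookkeeping made
up for illustration: a dict of page types, a dict recording which CPUs
may still hold each page in their TLB, and a `flush_tlb` callback that
stands in for whatever cross-CPU flush mechanism would really be used.

```python
def make_writable(page, page_types, tlb_users, flush_tlb):
    """Turn `page` into a writable page, flushing stale TLBs first.

    The flush happens before the type changes, so no CPU can still be
    using the page as a page table or page directory once the guest is
    allowed to write to it.
    """
    if page_types[page] in ("page_table", "page_directory"):
        for cpu in sorted(tlb_users.get(page, ())):
            flush_tlb(cpu)
        tlb_users[page] = set()
    page_types[page] = "writable"

flushed = []
page_types = {7: "page_table"}
tlb_users = {7: {0, 2}}
make_writable(7, page_types, tlb_users, flushed.append)
```

In a uniprocessor guest only one CPU can ever appear in `tlb_users`,
which is why this only becomes an issue once MP guests are supported.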
180 Hardware support
181 ================
183 Xen is intended to be run on server-class machines, and the current
184 list of supported hardware very much reflects this, avoiding the need
185 for us to write drivers for "legacy" hardware.
187 Xen requires a "P6" or newer processor (e.g. Pentium Pro, Celeron,
188 Pentium II, Pentium III, Pentium 4, Xeon, AMD Athlon, AMD Duron).
189 Multiprocessor machines are supported, and we also have basic support
190 for HyperThreading (SMT), though this remains a topic for ongoing
191 research. We're also looking at an AMD x86_64 port (though it should
192 run on Opterons in 32 bit mode just fine).
194 Xen can currently use up to 4GB of memory. It's possible for x86
195 machines to address more than that (64GB), but it requires using a
196 different page table format (3-level rather than 2-level) that we
197 currently don't support. Adding 3-level PAE support wouldn't be
198 difficult, but we'd also need to add support to all the guest
199 OSs. Volunteers welcome!
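The 4GB and 64GB figures follow from the entry formats, as the short
Python calculation below shows. It assumes the standard x86 layouts:
4KB pages, a 20-bit frame number in a 2-level entry, and a 24-bit
frame number (36-bit physical addresses) with PAE. The helper names
are ours.

```python
GIB = 1 << 30
OFFSET_BITS = 12  # 4KB pages

def physical_reach(frame_bits):
    """Bytes of physical memory addressable with `frame_bits` frame bits."""
    return 1 << (frame_bits + OFFSET_BITS)

def virtual_bits(index_bits):
    """Virtual address width given the index bits used at each level."""
    return sum(index_bits) + OFFSET_BITS

two_level_gib = physical_reach(20) // GIB  # 32-bit entries, 2-level
pae_gib = physical_reach(24) // GIB        # 64-bit entries, 3-level PAE

# Both formats still translate a 32-bit virtual address:
two_level_va = virtual_bits([10, 10])  # 1024 entries per table
pae_va = virtual_bits([2, 9, 9])       # 4 top-level entries, then 512 per table
```

The virtual address space stays at 32 bits either way, so PAE raises
the total memory Xen could hand out to domains, not the size of any
single address space.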
201 We currently support a relatively modern set of network cards: Intel
202 e1000, Broadcom BCM 57xx (tg3), 3COM 3c905 (3c59x). Adding support for
203 other NICs that support hardware DMA scatter/gather from half-word
204 aligned addresses is relatively straightforward, by porting the
205 equivalent Linux driver. Drivers for a number of other older cards
206 have recently been added [pcnet32, e100, tulip], but are untested and
207 not recommended.
210 Ian Pratt
211 9 Sep 2003