#############################
 __  __            _   ___ 
 \ \/ /___ _ __   / | / _ \ 
  \  // _ \ '_ \  | || | | |
  /  \  __/ | | | | || |_| |
 /_/\_\___|_| |_| |_(_)___/ 

#############################

University of Cambridge Computer Laboratory
31 Aug 2003

http://www.cl.cam.ac.uk/netos/xen

About the Xen Virtual Machine Monitor
=====================================

"Xen" is a Virtual Machine Monitor (VMM) developed by the Systems
Research Group of the University of Cambridge Computer Laboratory, as
part of the UK-EPSRC funded XenoServers project.

The XenoServers project aims to provide a "public infrastructure for
global distributed computing", and Xen plays a key part in that,
allowing us to efficiently partition a single machine to enable
multiple independent clients to run their operating systems and
applications in an environment providing protection, resource
isolation and accounting. The project web page contains further
information along with pointers to papers and technical reports:
http://www.cl.cam.ac.uk/xeno

Xen has since grown into a project in its own right, enabling us to
investigate interesting research issues regarding the best techniques
for virtualizing resources such as the CPU, memory, disk and network.
The project has been bolstered by support from Intel Research
Cambridge, who are now working closely with us. We've now also
received support from Microsoft Research Cambridge to port Windows XP
to run on Xen.

Xen enables multiple operating system images to be run simultaneously
on the same hardware with very low performance overhead --- much lower
than commercial offerings on the same x86 platform.

This is achieved by requiring OSs to be specifically ported to run on
Xen, rather than allowing unmodified OS images to be used. Crucially,
only the OS needs to be changed -- all of the user-level application
binaries, libraries etc can run unmodified. Hence, the modified OS
kernel can typically just be dropped into any existing OS distribution
or installation.

Xen currently runs on the x86 architecture, but could in principle be
ported to other CPUs. In fact, it would have been rather easier to
write Xen for pretty much any other architecture as x86 doesn't do us
any favours at all. The best description of Xen's design,
implementation and performance is contained in our October 2003 SOSP
paper: http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf

We have been working on porting 3 different operating systems to run
on Xen: Linux 2.4, Windows XP, and NetBSD.

The Linux 2.4 port (currently Linux 2.4.22) works very well -- we
regularly use it to host complex applications such as PostgreSQL,
Apache, BK servers etc. It runs all applications we've tried. We
refer to our version of Linux ported to run on Xen as "XenoLinux",
though really it's just standard Linux ported to a new virtual CPU
architecture that we call xeno-x86 (abbreviated to just "xeno").

Unfortunately, the NetBSD port has stalled due to lack of
manpower. We believe most of the hard stuff has already been done, and
are hoping to get the ball rolling again soon. In hindsight, a FreeBSD
4 port might have been more useful to the community.

The Windows XP port is nearly finished. It's running user space
applications and is generally in pretty good shape thanks to some hard
work by the team over the summer. Of course, there are issues with
releasing this code to others. We should be able to release the
source and binaries to anyone else who has signed the Microsoft
academic source license, which these days has very reasonable
terms. We are in discussions with Microsoft about the possibility of
being able to make binary releases to a larger user
community. Obviously, there are issues with product activation in this
environment and such like, which need to be thought through.

So, for the moment, you only get to run multiple copies of Linux on
Xen, but we hope this will change before too long. Even running
multiple copies of the same OS can be very useful, as it provides a
means of containing faults to one OS image, and also for providing
performance isolation between the various OSs, enabling you to either
restrict, or reserve resources for, particular VM instances.

It's also useful for development -- each version of Linux can have
different patches applied, enabling different kernels to be tried
out. For example, the "vservers" patch used by PlanetLab applies
cleanly to our ported version of Linux.

We've successfully booted over 128 copies of Linux on the same machine
(a dual CPU hyperthreaded Xeon box) but we imagine that it would be
more normal to use some smaller number, perhaps 10-20.

Known limitations and work in progress
======================================

The "xenctl" tool is still rather clunky and not very user
friendly. In particular, it should have an option to create and start
a domain with all the necessary parameters set from a named XML file.

The Java xenctl tool is really just a frontend for a bunch of C tools
named xi_* that do the actual work of talking to Xen and setting stuff
up. Some local users prefer to drive the xi_ tools directly, typically
from simple shell scripts. These tools are even less user friendly
than xenctl, but it's arguably clearer what's going on.

There's also a web-based interface for controlling domains that uses
Apache/Tomcat, but it has fallen out of sync with respect to the
underlying tools, so it doesn't always work as expected and needs to
be fixed.

The current Virtual Firewall Router (VFR) implementation in the
snapshot tree is very rudimentary, and in particular lacks the IP
port-space sharing across domains that we've proposed, which promises
to be a better alternative to NAT. There's a complete new
implementation under development which also provides much better
logging and auditing. The current network scheduler is just simple
round-robin between domains, without any rate limiting or rate
guarantees. Dropping in a new scheduler should be straightforward, and
is planned as part of the VFRv2 work package.

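To make the shortcoming concrete, plain round-robin amounts to little
more than the sketch below (illustrative C only, not the actual Xen
scheduler; the types and helper functions are hypothetical): each
domain's transmit queue is visited in turn, one packet at a time, so
there is nowhere to attach a rate limit or a bandwidth guarantee.

    /* Illustrative sketch only -- not Xen source code.  Hypothetical
     * types and helpers stand in for the real per-domain TX queues. */
    struct packet;                                /* opaque network packet    */
    struct tx_queue { struct packet *head; };     /* per-domain transmit list */

    extern struct packet *dequeue(struct tx_queue *q);  /* hypothetical */
    extern void hw_transmit(struct packet *p);          /* hypothetical */

    /* Service the domains' queues strictly in turn: no rates, no weights. */
    void net_round_robin(struct tx_queue *q, int nr_domains)
    {
        static int next = 0;
        for (int i = 0; i < nr_domains; i++) {
            int d = (next + i) % nr_domains;
            if (q[d].head != NULL) {
                hw_transmit(dequeue(&q[d]));
                next = (d + 1) % nr_domains;      /* resume after this domain */
                return;
            }
        }
    }

A rate-limited or weighted scheduler would replace the simple rotation
above with per-domain credit accounting.
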
iap10@706 127 Another area that needs further work is the interface between Xen and
iap10@706 128 domain0 user space where the various XenoServer control daemons run.
iap10@706 129 The current interface is somewhat ad-hoc, making use of various
iap10@706 130 /proc/xeno entries that take a random assortment of arguments. We
iap10@706 131 intend to reimplement this to provide a consistent means of feeding
iap10@706 132 back accounting and logging information to the control daemon.
iap10@706 133
iap10@706 134 There's also a number of memory management hacks that didn't make this
iap10@706 135 release: We have plans for a "universal buffer cache" that enables
iap10@706 136 otherwise unused system memory to be used by domains in a read-only
iap10@706 137 fashion. We also have plans for inter-domain shared-memory to enable
iap10@706 138 high-performance bulk transport for cases where the usual internal
iap10@706 139 networking performance isn't good enough (e.g. communication with a
iap10@706 140 internal file server on another domain).
iap10@706 141
iap10@706 142 We also have plans to implement domain suspend/resume-to-file. This is
iap10@706 143 basically an extension to the current domain building process to
iap10@706 144 enable domain0 to read out all of the domain's state and store it in a
iap10@706 145 file. There are complications here due to Xen's para-virtualised
iap10@706 146 design, whereby since the physical machine memory pages available to
iap10@706 147 the guest OS are likely to be different when the OS is resumed, we
iap10@706 148 need to re-write the page tables appropriately.
iap10@706 149
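The page-table fix-up on resume is conceptually simple, roughly the
sketch below (illustrative only; it assumes plain 32-bit 2-level
paging, and the two translation helpers are hypothetical): every
present entry's machine frame number is mapped back to the domain's
pseudo-physical frame, and then forward to the frame actually
allocated at restore time.

    /* Sketch only, not restore-tool source.  Assumes 32-bit 2-level
     * paging; old_mfn_to_pfn() and pfn_to_new_mfn() are hypothetical
     * helpers built from the saved pseudo-physical <-> machine maps. */
    #include <stdint.h>

    #define PTES_PER_PAGE 1024
    #define PTE_PRESENT   0x001u
    #define PTE_FLAGS     0xFFFu      /* low 12 bits hold permission flags */

    typedef uint32_t pte_t;

    extern uint32_t old_mfn_to_pfn(uint32_t old_mfn);   /* hypothetical */
    extern uint32_t pfn_to_new_mfn(uint32_t pfn);       /* hypothetical */

    void rewrite_page_table(pte_t *pt)
    {
        for (int i = 0; i < PTES_PER_PAGE; i++) {
            pte_t pte = pt[i];
            if (!(pte & PTE_PRESENT))
                continue;                           /* nothing to fix up */
            uint32_t new_mfn = pfn_to_new_mfn(old_mfn_to_pfn(pte >> 12));
            pt[i] = (new_mfn << 12) | (pte & PTE_FLAGS);
        }
    }
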
We have the equivalent of balloon driver functionality to control a
domain's memory usage, enabling a domain to give back unused pages to
Xen. This needs properly documenting, and perhaps a way for domain0 to
signal to a domain that it must reduce its memory footprint, rather
than just the domain volunteering.

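For reference, the guest side of the balloon idea is little more than
the loop sketched below (illustrative C; the three helpers are
hypothetical names, not the real interface): the guest allocates pages
from its own free memory and hands the underlying machine frames back
to Xen, shrinking its footprint by however many pages it manages to
free.

    /* Illustrative sketch only; helper names are hypothetical. */
    #include <stddef.h>

    extern void *guest_alloc_page(void);                 /* hypothetical */
    extern unsigned long page_to_mfn(void *page);        /* hypothetical */
    extern void return_frame_to_xen(unsigned long mfn);  /* hypothetical */

    /* Try to give nr_pages back to Xen; returns how many were released. */
    size_t balloon_inflate(size_t nr_pages)
    {
        size_t released = 0;
        while (released < nr_pages) {
            void *page = guest_alloc_page();  /* take a page from the guest */
            if (page == NULL)
                break;                        /* guest has none to spare    */
            return_frame_to_xen(page_to_mfn(page));
            released++;
        }
        return released;
    }
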
The current disk scheduler is rather simplistic (batch round robin),
and could be replaced by e.g. Cello if we have QoS isolation
problems. For most things it seems to work OK, but there's currently
no service differentiation or weighting.

Currently, although Xen runs on SMP and SMT (hyperthreaded) machines,
the scheduling is far from smart -- domains are currently statically
assigned to a CPU when they are created (in a round robin fashion).
The scheduler needs to be modified such that before going idle a
logical CPU looks for work on other run queues (particularly on the
same physical CPU).

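The intended behaviour is roughly the sketch below (illustrative C
only; the topology and run-queue helpers are hypothetical): before
halting, an idle logical CPU first checks its hyperthread sibling's
run queue, since the two share the same physical CPU's caches, and
only then scans the remaining CPUs.

    /* Sketch of the intended idle-time behaviour, not existing Xen code.
     * sibling_of() and steal_from_runqueue() are hypothetical helpers. */
    struct domain;                                  /* opaque here */

    extern int nr_cpus;
    extern int sibling_of(int cpu);                 /* other thread, same package */
    extern struct domain *steal_from_runqueue(int cpu);

    struct domain *find_work_before_idling(int this_cpu)
    {
        /* Prefer the sibling logical CPU: caches are shared, migration
         * is cheap. */
        struct domain *d = steal_from_runqueue(sibling_of(this_cpu));
        if (d != NULL)
            return d;

        /* Otherwise scan the remaining run queues. */
        for (int i = 1; i < nr_cpus; i++) {
            d = steal_from_runqueue((this_cpu + i) % nr_cpus);
            if (d != NULL)
                return d;
        }
        return NULL;                                /* nothing runnable: idle */
    }
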
Xen currently only supports uniprocessor guest OSes. We have designed
the Xen interface with MP guests in mind, and plan to build an MP
Linux guest in due course. Basically, an MP guest would consist of
multiple scheduling domains (one per CPU) sharing a single memory
protection domain. The only extra complexity for the Xen VM system is
ensuring that when a page transitions from holding a page table or
page directory to being an ordinary writable page, no other CPU still
has the page in its TLB; otherwise memory system integrity could be
violated. One other issue for supporting MP guests is that we'll need
some sort of CPU gang scheduler, which will require some research.

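That extra check is conceptually the following (illustrative C only;
all names are hypothetical and the wait for the flushes to complete is
elided): before a frame that used to hold a page table becomes
writable, every other CPU that might still cache a translation for it
is asked to flush its TLB.

    /* Sketch only; these are hypothetical names, not Xen's interface. */
    extern int nr_cpus;
    extern int cpu_may_have_cached(int cpu, unsigned long mfn);
    extern void send_tlb_flush_ipi(int cpu);       /* completion wait elided */
    extern void mark_frame_writable(unsigned long mfn);

    void pagetable_to_writable(unsigned long mfn, int this_cpu)
    {
        for (int cpu = 0; cpu < nr_cpus; cpu++) {
            if (cpu != this_cpu && cpu_may_have_cached(cpu, mfn))
                send_tlb_flush_ipi(cpu);
        }
        mark_frame_writable(mfn);  /* safe once no stale TLB entries remain */
    }
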

Hardware support
================

Xen is intended to be run on server-class machines, and the current
list of supported hardware very much reflects this, avoiding the need
for us to write drivers for "legacy" hardware.

Xen requires a "P6" or newer processor (e.g. Pentium Pro, Celeron,
Pentium II, Pentium III, Pentium IV, Xeon, AMD Athlon, AMD Duron).
Multiprocessor machines are supported, and we also have basic support
for HyperThreading (SMT), though this remains a topic for ongoing
research. We're also looking at an AMD x86_64 port (though it should
run on Opterons in 32 bit mode just fine).

Xen can currently use up to 4GB of memory. It's possible for x86
machines to address more than that (64GB), but it requires using a
different page table format (3-level rather than 2-level) that we
currently don't support. Adding 3-level PAE support wouldn't be
difficult, but we'd also need to add support to all the guest
OSs. Volunteers welcome!

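For the curious, the 4GB ceiling falls straight out of the 2-level
format: a 32-bit virtual address splits into 10 + 10 + 12 bits, and a
32-bit PTE has room for only a 20-bit frame number, i.e. 2^20 frames
of 4KB = 4GB. PAE widens the entries to 64 bits and adds a third level
(a 2 + 9 + 9 + 12 split), which is what allows up to 64GB of physical
memory. The toy program below just illustrates the 2-level split (the
example address is arbitrary):

    /* Toy illustration of the 2-level x86 address split; not Xen code. */
    #include <stdio.h>

    int main(void)
    {
        unsigned int va  = 0xC0123456u;        /* arbitrary example address   */
        unsigned int pde = (va >> 22) & 0x3FF; /* top 10 bits: page directory */
        unsigned int pte = (va >> 12) & 0x3FF; /* next 10 bits: page table    */
        unsigned int off = va & 0xFFF;         /* low 12 bits: byte offset    */

        printf("PDE index %u, PTE index %u, offset 0x%03x\n", pde, pte, off);
        return 0;                              /* prints: 768, 291, 0x456     */
    }
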
We currently support a relatively modern set of network cards: Intel
e1000, Broadcom BCM 57xx (tg3), 3Com 3c905 (3c59x). Adding support for
other NICs that support hardware DMA scatter/gather from half-word
aligned addresses is relatively straightforward, by porting the
equivalent Linux driver. Drivers for a number of other older cards
have recently been added [pcnet32, e100, tulip], but are untested and
not recommended.


Ian Pratt
9 Sep 2003