ia64/xen-unstable

changeset 327:cdaace96648d

bitkeeper revision 1.144 (3e74d2c4Sp8uQ7JRHj5sf5cI7xOMQA)

Merge scramble.cl.cam.ac.uk:/usr/groups/xeno/BK/xeno.bk
into scramble.cl.cam.ac.uk:/local/scratch/kaf24/xeno
author kaf24@scramble.cl.cam.ac.uk
date Sun Mar 16 19:38:44 2003 +0000 (2003-03-16)
parents 6cb457415689 a0c481468997
children 47f6cdee5a9b
files .rootkeys xen/README xen/TODO
line diff
     1.1 --- a/.rootkeys	Sun Mar 16 17:45:47 2003 +0000
     1.2 +++ b/.rootkeys	Sun Mar 16 19:38:44 2003 +0000
     1.3 @@ -193,6 +193,7 @@ 3e4d0046IBzDIeaMbQB-e2QB2ahbig tools/dom
     1.4  3ddb79bcbOVHh38VJzc97-JEGD4dJQ xen/Makefile
     1.5  3ddb79bcCa2VbsMp7mWKlhgwLQUQGA xen/README
     1.6  3ddb79bcWnTwYsQRWl_PaneJfa6p0w xen/Rules.mk
     1.7 +3e74d2be6ELqhaY1sW0yyHRKhpOvDQ xen/TODO
     1.8  3ddb79bcZbRBzT3elFWSX7u6NtMagQ xen/arch/i386/Makefile
     1.9  3ddb79bcBQF85CfLS4i1WGZ4oLLaCA xen/arch/i386/Rules.mk
    1.10  3e5636e5FAYZ5_vQnmgwFJfSdmO5Mw xen/arch/i386/acpitable.c
     2.1 --- a/xen/README	Sun Mar 16 17:45:47 2003 +0000
     2.2 +++ b/xen/README	Sun Mar 16 19:38:44 2003 +0000
     2.3 @@ -1,110 +1,17 @@
     2.4 -
     2.5 -*****************************************************
     2.6 -   Xeno Hypervisor (18/7/02)
     2.7 -
     2.8 -1) Tree layout
     2.9 -Looks rather like a simplified Linux :-)
    2.10 -Headers are in include/xeno and include/asm-<arch>.
    2.11 -At build time we create symlinks:
    2.12 - include/linux -> include/xeno
    2.13 - include/asm   -> include/asm-<arch>
    2.14 -In this way, Linux device drivers should need less tweaking of
    2.15 -their #include lines.
    2.16 -
    2.17 -For source files, mapping between hypervisor and Linux is:
    2.18 - Linux                 Hypervisor
    2.19 - -----                 ----------
    2.20 - kernel/init/mm/lib -> common
    2.21 - net/*              -> net/*
    2.22 - drivers/*          -> drivers/*
    2.23 - arch/*             -> arch/*
    2.24 -
    2.25 -Note that the use of #include <asm/...> and #include <linux/...> can
    2.26 -lead to confusion, as such files will often exist on the system include
    2.27 -path, even if a version doesn't exist within the hypervisor tree.
    2.28 -Unfortunately '-nostdinc' cannot be specified to the compiler, as that
    2.29 -prevents us using stdarg.h in the compiler's own header directory.
    2.30 -
    2.31 -We try not to modify things in drivers/* as much as possible, so we can
    2.32 -easily take updates from Linux. arch/* is basically straight from
    2.33 -Linux, with fingers in Linux-specific pies hacked off. common/* has
    2.34 -a lot of Linux code in it, but certain subsystems (task maintenance,
    2.35 -low-level memory handling) have been replaced. net/* contains enough
    2.36 -Linux-like gloop to get network drivers to work with little/no
    2.37 -modification.
    2.38 -
    2.39 -2) Building
    2.40 -'make': Builds ELF executable called 'image' in base directory
    2.41 -'make install': gzip-compresses 'image' and copies it to TFTP server
    2.42 -'make clean': removes *all* build and target files
    2.43 -
    2.44  
    2.45  *****************************************************
    2.46 -Random thoughts and stuff from here down...
    2.47 -
    2.48 -Todo list
    2.49 ----------
    2.50 -* Hypervisor need only directly map its own memory pool
    2.51 -  (maybe 128MB, tops). That would need 0x08000000....
    2.52 -  This would allow 512MB Linux with plenty room for vmalloc'ed areas.
    2.53 -* Network device -- port drivers to hypervisor, implement virtual
    2.54 -  driver for xeno-linux. Looks like Ethernet.
    2.55 -  -- Hypervisor needs to do (at a minimum):
    2.56 -       - packet filtering on tx (unicast IP only)
    2.57 -       - packet demux on rx     (unicast IP only)
    2.58 -       - provide DHCP [maybe do something simpler?]
    2.59 -         and ARP [at least for hypervisor IP address]
    2.60 -
    2.61 -
    2.62 -Segment descriptor tables
    2.63 --------------------------
    2.64 -We want to allow guest OSes to specify GDT and LDT tables using their
    2.65 -own pages of memory (just like with page tables). So allow the following:
    2.66 - * new_table_entry(ptr, val)
    2.67 -   [Allows insertion of a code, data, or LDT descriptor into given
    2.68 -    location. Can simply be checked then poked, with no need to look at
    2.69 -    page type.]
    2.70 - * new_GDT() -- relevant virtual pages are resolved to frames. Either
    2.71 -    (i) page not present; or (ii) page is only mapped read-only and checks
    2.72 -    out okay (then marked as special page). Old table is resolved first,
    2.73 -    and the pages are unmarked (no longer special type).
    2.74 - * new_LDT() -- same as for new_GDT(), with same special page type.
    2.75 +   Xeno Hypervisor (16/3/03)
    2.76  
    2.77 -Page table updates must be hooked, so we look for updates to virtual page
    2.78 -addresses in the GDT/LDT range. If map to not present, then old physpage
    2.79 -has type_count decremented. If map to present, ensure read-only, check the
    2.80 -page, and set special type.
    2.81 -
    2.82 -Merge set_{LDT,GDT} into update_baseptrs, by passing four args:
    2.83 - update_baseptrs(mask, ptab, gdttab, ldttab);
    2.84 -Update of ptab requires update of gtab (or set to internal default).
    2.85 -Update of gtab requires update of ltab (or set to internal default).
    2.86 -
    2.87 +'make': Builds ELF executable called 'image' in base directory
    2.88 +'make clean': removes *all* build and target files
    2.89  
    2.90 -The hypervisor page cache
    2.91 --------------------------
    2.92 -This will allow guest OSes to make use of spare pages in the system, but
    2.93 -allow them to be immediately used for any new domains or memory requests.
    2.94 -The idea is that, when a page is laundered and falls off Linux's clean_LRU
    2.95 -list, rather than freeing it, it becomes a candidate for passing down into
    2.96 -the hypervisor. In return, xeno-linux may ask for one of its previously-
    2.97 -cached pages back:
    2.98 - (page, new_id) = cache_query(page, old_id);
    2.99 -If the requested page couldn't be kept, a blank page is returned.
   2.100 -When would Linux make the query? Whenever it wants a page back without
   2.101 -the delay of going to disc. Also, whenever a page would otherwise be
   2.102 -flushed to disc.
   2.103 -
   2.104 -To try and add to the cache: (blank_page, new_id) = cache_query(page, NULL);
   2.105 - [NULL means "give me a blank page"].
   2.106 -To try and retrieve from the cache: (page, new_id) = cache_query(x_page, id)
   2.107 - [we may request that x_page just be discarded, and therefore not impinge
   2.108 -  on this domain's cache quota].
   2.109  
   2.110  
   2.111  Booting secondary processors
   2.112  ----------------------------
   2.113  
   2.114 +It's twisty and turny, so this is (roughly) the code path:
   2.115 +
   2.116  start_of_day (i386/setup.c)
   2.117  smp_boot_cpus (i386/smpboot.c)
   2.118   * initialises boot CPU data
   2.119 @@ -128,18 +35,3 @@ On other processor:
   2.120         * barrier, then write bitmasks to signal back to boot cpu
   2.121         * then barrel into...
   2.122           cpu_idle (i386/process.c)
   2.123 -         [THIS IS PROBABLY REASONABLE -- BOOT CPU SHOULD KICK
   2.124 -          SECONDARIES TO GET WORK DONE]
   2.125 -
   2.126 -
   2.127 -SMP capabilities
   2.128 -----------------
   2.129 -
   2.130 -Current intention is to allow hypervisor to schedule on all processors in
   2.131 -SMP boxen, but to tie each domain to a single processor. This simplifies
   2.132 -many SMP intricacies both in terms of correctness and efficiency (eg.
   2.133 -TLB flushing, network packet delivery, ...).
   2.134 -
   2.135 -Clients can still make use of SMP by installing multiple domains on a single
   2.136 -machine, and treating it as a fast cluster (at the very least, the
   2.137 -hypervisor will have fast routing of locally-destined packets).
     3.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     3.2 +++ b/xen/TODO	Sun Mar 16 19:38:44 2003 +0000
     3.3 @@ -0,0 +1,129 @@
     3.4 +
     3.5 +This is stuff we probably want to implement in the near future. I
     3.6 +think I have them in a sensible priority order -- the first few would
     3.7 +be nice to fix before a code release. The later ones can be
     3.8 +longer-term goals.
     3.9 +
    3.10 + -- Keir (16/3/03)
    3.11 +
    3.12 +
    3.13 +1. ASSIGNING DOMAINS TO PROCESSORS
    3.14 +----------------------------------
    3.15 +More intelligent assignment of domains to processors. In
    3.16 +particular, we don't play well with hyperthreading: we will assign
    3.17 +domains to virtual processors on the same package, rather than
    3.18 +spreading them across processor packages.
    3.19 +
    3.20 +What we need to do is port code from Linux which stores information on
    3.21 +relationships between processors in the system (eg. which ones are
    3.22 +siblings in the same package). We then use this to balance domains
    3.23 +across packages, and across virtual processors within a package.
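
A rough sketch of that balancing step (the topology constants and all
names here are hypothetical; the real code would consult the sibling
information ported from Linux):

    #include <limits.h>

    /* Hypothetical: choose a CPU for a new domain by spreading across
     * processor packages first, then across the virtual processors
     * (hyperthreads) within the chosen package. */
    #define NR_PACKAGES         4
    #define THREADS_PER_PACKAGE 2

    static int domains_on[NR_PACKAGES][THREADS_PER_PACKAGE];

    static int pick_cpu(void)
    {
        int p, t, best_pkg = 0, best_thr = 0, best_load = INT_MAX;

        /* Least-loaded package first, so domains spread across packages. */
        for (p = 0; p < NR_PACKAGES; p++) {
            int load = 0;
            for (t = 0; t < THREADS_PER_PACKAGE; t++)
                load += domains_on[p][t];
            if (load < best_load) { best_load = load; best_pkg = p; }
        }

        /* Then the least-loaded virtual processor within that package. */
        for (t = 1; t < THREADS_PER_PACKAGE; t++)
            if (domains_on[best_pkg][t] < domains_on[best_pkg][best_thr])
                best_thr = t;

        domains_on[best_pkg][best_thr]++;
        return best_pkg * THREADS_PER_PACKAGE + best_thr;
    }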
    3.24 +
    3.25 +2. PROPER DESTRUCTION OF DOMAINS
    3.26 +--------------------------------
    3.27 +Currently we do not free resources when destroying a domain. This is
    3.28 +because they may be tied up in subsystems, and there is no way of
    3.29 +pulling them back in a safe manner.
    3.30 +
    3.31 +The fix is probably to reference count resources and automatically
    3.32 +free them when the count reaches zero. We may get away with one count
    3.33 +per domain (for all its resources). When this reaches zero we know it
    3.34 +is safe to free everything: block-device rings, network rings, and all
    3.35 +the rest.
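
A minimal sketch of the single per-domain count suggested above
(domain_get/domain_put and free_domain_resources are invented names,
not an actual Xen interface):

    #include <stdatomic.h>

    struct domain {
        atomic_int refcnt;  /* one count for all the domain's resources */
        /* ... block-device rings, network rings, etc. ... */
    };

    static void free_domain_resources(struct domain *d)
    {
        /* Hypothetical teardown: block-device rings, network rings,
         * and all the rest would be released here. */
        (void)d;
    }

    static void domain_get(struct domain *d)
    {
        atomic_fetch_add(&d->refcnt, 1);
    }

    static void domain_put(struct domain *d)
    {
        /* Once the count reaches zero nothing can still be using the
         * domain, so it is safe to free everything in one go. */
        if (atomic_fetch_sub(&d->refcnt, 1) == 1)
            free_domain_resources(d);
    }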
    3.36 +
    3.37 +3. FIX HANDLING OF NETWORK RINGS
    3.38 +--------------------------------
    3.39 +Handling of the transmit rings is currently very broken (for example,
    3.40 +sending an inter-domain packet will wedge the hypervisor). This is
    3.41 +because we may handle packets out of order (eg. inter-domain packets
    3.42 +are handled eagerly, while packets for real interfaces are queued),
    3.43 +but our current ring design really assumes in-order handling.
    3.44 +
    3.45 +A neat fix will be to allow responses to be queued in a different
    3.46 +order to requests, just as we already do with block-device
    3.47 +rings. We'll need to add an opaque identifier to ring entries,
    3.48 +allowing matching of requests and responses, but that's about it.
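
For illustration, a ring entry carrying such an opaque identifier
might look like this (field names and sizes are invented, not the
actual Xen structures):

    #include <stdint.h>

    /* The guest picks `id` when queueing a request; the hypervisor
     * echoes it unchanged in the response, so the guest can match the
     * two even when responses come back out of order. */
    typedef struct net_ring_entry {
        uint64_t addr;    /* guest physical address of packet buffer */
        uint32_t id;      /* opaque identifier, echoed in response   */
        uint16_t len;     /* buffer length in bytes                  */
        uint16_t status;  /* filled in by the hypervisor on response */
    } net_ring_entry_t;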
    3.49 +
    3.50 +4. GDT AND LDT VIRTUALISATION 
    3.51 +----------------------------- 
    3.52 +We do not allow modification of the GDT, or any use of the LDT. Both
    3.53 +are necessary for support of unmodified applications (eg. Linux uses
    3.54 +LDT in threaded applications, while Windows needs to update GDT
    3.55 +entries).
    3.56 +
    3.57 +I have some text on how to do this:
    3.58 +/usr/groups/xeno/discussion-docs/memory_management/segment_tables.txt
    3.59 +It's already half implemented, but the rest is still to do.
    3.60 +
    3.61 +5. DOMAIN 0 MANAGEMENT DAEMON
    3.62 +-----------------------------
    3.63 +A better control daemon is required for domain 0, which keeps proper
    3.64 +track of machine resources and can make sensible policy choices. This
    3.65 +may require support in Xen; for example, notifications (eg. DOMn is
    3.66 +killed), and requests (eg. can DOMn allocate x frames of memory?).
    3.67 +
    3.68 +6. ACCURATE TIMERS AND WALL-CLOCK TIME
    3.69 +--------------------------------------
    3.70 +Currently our long-term timebase free runs on CPU0, with no external
    3.71 +calibration. We should run ntpd on domain 0 and allow this to warp
    3.72 +Xen's timebase. Once this is done, we can have a timebase per CPU and
    3.73 +not worry about relative drift (since they'll all get sync'ed
    3.74 +periodically by ntp).
    3.75 +
    3.76 +7. NEW DESIGN FEATURES
    3.77 +----------------------
    3.78 +This includes the last-chance page cache, and the unified buffer cache.
    3.79 +
    3.80 +
    3.81 +
    3.82 +Graveyard
    3.83 +*********
    3.84 +
    3.85 +Following is some description of how some of the above might be
    3.86 +implemented. Some of it is superseded and/or out of date, so follow
    3.87 +with caution.
    3.88 +
    3.89 +Segment descriptor tables
    3.90 +-------------------------
    3.91 +We want to allow guest OSes to specify GDT and LDT tables using their
    3.92 +own pages of memory (just like with page tables). So allow the following:
    3.93 + * new_table_entry(ptr, val)
    3.94 +   [Allows insertion of a code, data, or LDT descriptor into given
    3.95 +    location. Can simply be checked then poked, with no need to look at
    3.96 +    page type.]
    3.97 + * new_GDT() -- relevant virtual pages are resolved to frames. Either
    3.98 +    (i) page not present; or (ii) page is only mapped read-only and checks
    3.99 +    out okay (then marked as special page). Old table is resolved first,
   3.100 +    and the pages are unmarked (no longer special type).
   3.101 + * new_LDT() -- same as for new_GDT(), with same special page type.
   3.102 +
   3.103 +Page table updates must be hooked, so we look for updates to virtual page
   3.104 +addresses in the GDT/LDT range. If map to not present, then old physpage
   3.105 +has type_count decremented. If map to present, ensure read-only, check the
   3.106 +page, and set special type.
   3.107 +
   3.108 +Merge set_{LDT,GDT} into update_baseptrs, by passing four args:
   3.109 + update_baseptrs(mask, ptab, gdttab, ldttab);
   3.110 +Update of ptab requires update of gtab (or set to internal default).
   3.111 +Update of gtab requires update of ltab (or set to internal default).
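
Roughly, that interface might have the following shape (these
prototypes are guesses for illustration, not a settled hypercall API):

    #include <stdint.h>

    typedef uint64_t desc_t;  /* one 8-byte segment descriptor */

    /* Check a code, data, or LDT descriptor, then poke it into place,
     * with no need to look at the page type. */
    int new_table_entry(desc_t *ptr, desc_t val);

    /* Resolve the virtual pages of the new table to frames, verify
     * each is either not present or mapped read-only and checks out
     * okay, mark them as special pages, and unmark the old table's
     * pages. */
    int new_GDT(unsigned long va, unsigned int nr_entries);
    int new_LDT(unsigned long va, unsigned int nr_entries);

    /* The proposed merge of set_{LDT,GDT} into the base-pointer
     * update; `mask` says which of the three tables are changing. */
    int update_baseptrs(unsigned int mask, unsigned long ptab,
                        unsigned long gdttab, unsigned long ldttab);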
   3.112 +
   3.113 +
   3.114 +The hypervisor page cache
   3.115 +-------------------------
   3.116 +This will allow guest OSes to make use of spare pages in the system, but
   3.117 +allow them to be immediately used for any new domains or memory requests.
   3.118 +The idea is that, when a page is laundered and falls off Linux's clean_LRU
   3.119 +list, rather than freeing it, it becomes a candidate for passing down into
   3.120 +the hypervisor. In return, xeno-linux may ask for one of its previously-
   3.121 +cached pages back:
   3.122 + (page, new_id) = cache_query(page, old_id);
   3.123 +If the requested page couldn't be kept, a blank page is returned.
   3.124 +When would Linux make the query? Whenever it wants a page back without
   3.125 +the delay of going to disc. Also, whenever a page would otherwise be
   3.126 +flushed to disc.
   3.127 +
   3.128 +To try and add to the cache: (blank_page, new_id) = cache_query(page, NULL);
   3.129 + [NULL means "give me a blank page"].
   3.130 +To try and retrieve from the cache: (page, new_id) = cache_query(x_page, id)
   3.131 + [we may request that x_page just be discarded, and therefore not impinge
   3.132 +  on this domain's cache quota].
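
For illustration, the cache_query() exchange above might be declared
like this (the types are invented; only the call pattern follows the
text):

    typedef unsigned long page_t;        /* handle on a machine page */
    typedef struct cache_id *cache_id_t; /* opaque; NULL means "give
                                            me a blank page"         */

    struct cache_reply {
        page_t     page;    /* requested page back, or a blank page */
        cache_id_t new_id;  /* id to quote when asking for it again */
    };

    struct cache_reply cache_query(page_t page, cache_id_t old_id);

    /* Adding a laundered page to the cache:
     *     reply = cache_query(page, NULL);
     * Retrieving a previously cached page, offering x_page in return:
     *     reply = cache_query(x_page, id);
     * If the requested page could not be kept, reply.page is blank. */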