ia64/xen-unstable

changeset 17597:611787b6ca35

merge with xen-unstable.hg
author Isaku Yamahata <yamahata@valinux.co.jp>
date Thu May 08 18:40:07 2008 +0900 (2008-05-08)
parents f2457c7aff8d 9a6ad687ec20
children af327038a43f
files xen/drivers/passthrough/pci_regs.h xen/drivers/passthrough/vtd/msi.h xen/include/asm-ia64/config.h
line diff
     1.1 --- a/README	Fri Apr 25 20:13:52 2008 +0900
     1.2 +++ b/README	Thu May 08 18:40:07 2008 +0900
     1.3 @@ -1,10 +1,10 @@
     1.4  #################################
     1.5 - __  __            _____  ____  
     1.6 - \ \/ /___ _ __   |___ / |___ \ 
     1.7 -  \  // _ \ '_ \    |_ \   __) |
     1.8 -  /  \  __/ | | |  ___) | / __/ 
     1.9 - /_/\_\___|_| |_| |____(_)_____|
    1.10 -                                
    1.11 + __  __            _____  _____  
    1.12 + \ \/ /___ _ __   |___ / |___ /  
    1.13 +  \  // _ \ '_ \    |_ \   |_ \  
    1.14 +  /  \  __/ | | |  ___) | ___) | 
    1.15 + /_/\_\___|_| |_| |____(_)____/  
    1.16 +                                 
    1.17  #################################
    1.18  
    1.19  http://www.xen.org/
    1.20 @@ -21,11 +21,10 @@ development community, spearheaded by Xe
    1.21  by the original Xen development team to build enterprise products
    1.22  around Xen.
    1.23  
    1.24 -The 3.2 release offers excellent performance, hardware support and
    1.25 +The 3.3 release offers excellent performance, hardware support and
    1.26  enterprise-grade features such as x86_32-PAE, x86_64, SMP guests and
    1.27 -live relocation of VMs. This install tree contains source for a Linux
    1.28 -2.6 guest; ports to Linux 2.4, NetBSD, FreeBSD and Solaris are
    1.29 -available from the community.
    1.30 +live relocation of VMs. Ports to Linux 2.6, Linux 2.4, NetBSD, FreeBSD
    1.31 +and Solaris are available from the community.
    1.32  
    1.33  This file contains some quick-start instructions to install Xen on
    1.34  your system. For full documentation, see the Xen User Manual. If this
    1.35 @@ -55,8 +54,8 @@ 2. Configure your bootloader to boot Xen
    1.36     /boot/grub/menu.lst: edit this file to include an entry like the
    1.37     following:
    1.38  
    1.39 -    title Xen 3.2 / XenLinux 2.6
    1.40 -       kernel /boot/xen-3.2.gz console=vga
    1.41 +    title Xen 3.3 / XenLinux 2.6
    1.42 +       kernel /boot/xen-3.3.gz console=vga
    1.43         module /boot/vmlinuz-2.6-xen root=<root-dev> ro console=tty0
    1.44         module /boot/initrd-2.6-xen.img
    1.45  
    1.46 @@ -75,7 +74,7 @@ 2. Configure your bootloader to boot Xen
    1.47     32MB memory for internal use, which is not available for allocation
    1.48     to virtual machines.
    1.49  
    1.50 -3. Reboot your system and select the "Xen 3.2 / XenLinux 2.6" menu
    1.51 +3. Reboot your system and select the "Xen 3.3 / XenLinux 2.6" menu
    1.52     option. After booting Xen, Linux will start and your initialisation
    1.53     scripts should execute in the usual way.
    1.54  
     2.1 --- a/docs/ChangeLog	Fri Apr 25 20:13:52 2008 +0900
     2.2 +++ b/docs/ChangeLog	Thu May 08 18:40:07 2008 +0900
     2.3 @@ -16,6 +16,15 @@ http://lists.xensource.com/archives/html
     2.4  Xen 3.3 release
     2.5  ---------------
     2.6  
     2.7 +17538: Add XENPF_set_processor_pminfo
     2.8 +http://xenbits.xensource.com/xen-unstable.hg?rev/5bb9093eb0e9
     2.9 +
    2.10 +17537: Add MSI support
    2.11 +http://xenbits.xensource.com/xen-unstable.hg?rev/ad55c06c9bbc
    2.12 +
    2.13 +17524: Add DOMCTL_set_cpuid to configure guest CPUID on x86 systems.
    2.14 +http://xenbits.xensource.com/xen-unstable.hg?rev/18727843db60
    2.15 +
    2.16  17336: Add platform capabilities field to XEN_SYSCTL_physinfo
    2.17  http://xenbits.xensource.com/xen-unstable.hg?rev/250606290439
    2.18  
     3.1 --- a/docs/misc/vtd.txt	Fri Apr 25 20:13:52 2008 +0900
     3.2 +++ b/docs/misc/vtd.txt	Thu May 08 18:40:07 2008 +0900
     3.3 @@ -2,7 +2,7 @@ Title   : How to do PCI Passthrough with
     3.4  Authors : Allen Kay    <allen.m.kay@intel.com>
     3.5            Weidong Han  <weidong.han@intel.com>
     3.6  Created : October-24-2007
     3.7 -Updated : December-13-2007
     3.8 +Updated : May-07-2008
     3.9  
    3.10  How to turn on VT-d in Xen
    3.11  --------------------------
    3.12 @@ -22,7 +22,7 @@ 11) "hide" pci device from dom0 as follo
    3.13  title Xen-Fedora Core (2.6.18-xen)
    3.14          root (hd0,0)
    3.15          kernel /boot/xen.gz com1=115200,8n1 console=com1
    3.16 -        module /boot/vmlinuz-2.6.18.8-xen root=LABEL=/ ro console=tty0 console=ttyS0,115200,8n1 pciback.hide=(01:00.0)(03:00.0) pciback.verbose_request=1 apic=debug
    3.17 +        module /boot/vmlinuz-2.6.18.8-xen root=LABEL=/ ro xencons=ttyS console=tty0 console=ttyS0, pciback.hide=(01:00.0)(03:00.0)
    3.18          module /boot/initrd-2.6.18-xen.img
    3.19  
    3.20  12) reboot system
    3.21 @@ -47,8 +47,6 @@ VT-d Works on OS:
    3.22  1) Host OS: PAE, 64-bit
    3.23  2) Guest OS: 32-bit, PAE, 64-bit
    3.24  
    3.25 -Because current Xen doesn't support MSI, for guest OS which uses MSI by default, need to add "pci=nomsi" option on its grub, e.g. RHEL5, FC6.
    3.26 -
    3.27  
    3.28  Combinations Tested:
    3.29  --------------------
    3.30 @@ -57,6 +55,33 @@ 1) 64-bit host: 32/PAE/64 Linux/XP/Win20
    3.31  2) PAE host: 32/PAE Linux/XP/Win2003/Vista guests
    3.32  
    3.33  
    3.34 +VTd device hotplug:
    3.35 +-------------------
    3.36 + 
    3.37 +Two virtual PCI slots (6 and 7) are reserved in the HVM guest to support VTd hotplug. If you have more VTd devices, only two of them can be hotplugged. Usage is simple:
    3.38 +
    3.39 + 1. List the VTd devices assigned to a domain. Here a VTd device 0:2:0.0 is inserted in the HVM domain's PCI slot 6; 'lspci' inside the guest should show the same.
    3.40 +
    3.41 +	[root@vt-vtd ~]# xm pci-list HVMDomainVtd
    3.42 +	VSlt domain   bus   slot   func
    3.43 +	0x6    0x0  0x02   0x00    0x0
    3.44 +
    3.45 + 2. Detach the device from the guest by its physical BDF. The HVM guest then receives a virtual PCI hot-removal event and detaches the physical device:
    3.46 +
    3.47 +	[root@vt-vtd ~]# xm pci-detach HVMDomainVtd 0:2:0.0
    3.48 +
    3.49 + 3. Attach a PCI device to the guest by its physical BDF and, optionally, the desired virtual slot. The following command inserts the physical device into the guest's virtual slot 7:
    3.50 +
    3.51 +	[root@vt-vtd ~]# xm pci-attach HVMDomainVtd 0:2:0.0 7
    3.52 +
    3.53 +VTd hotplug usage model:
    3.54 +------------------------
    3.55 +
    3.56 + * Live migration: a VTd device normally breaks live migration because a physical device cannot be saved/restored like a virtual one. With hotplug, live migration works again: hot-remove all VTd devices before migration and hot-add new VTd devices on the target machine afterwards.
    3.57 +
    3.58 + * Device switching: VTd hotplug can be used to dynamically switch a physical device between different HVM guests without shutting them down.
    3.59 +
    3.60 +
    3.61  VT-d Enabled Systems
    3.62  --------------------
    3.63  
    3.64 @@ -74,3 +99,5 @@ http://www.dell.com/content/products/cat
    3.65  - HP Compaq:  DC7800
    3.66  http://h10010.www1.hp.com/wwpc/us/en/en/WF04a/12454-12454-64287-321860-3328898.html
    3.67  
    3.68 +For more information, please refer to http://wiki.xensource.com/xenwiki/VTdHowTo.
    3.69 +
     4.1 --- a/docs/src/user.tex	Fri Apr 25 20:13:52 2008 +0900
     4.2 +++ b/docs/src/user.tex	Thu May 08 18:40:07 2008 +0900
     4.3 @@ -2459,9 +2459,7 @@ Unmanaged domains are started in Xen by 
     4.4  file. Please refer to Section~\ref{subsection:acmlabelmanageddomains}
     4.5  if you are using managed domains.
     4.6  
     4.7 -The following configuration file defines \verb|domain1|
     4.8 -(Note: www.jailtime.org or www.xen-get.org might be good
     4.9 -places to look for example domU images):
    4.10 +The following configuration file defines \verb|domain1|:
    4.11  
    4.12  \begin{scriptsize}
    4.13  \begin{verbatim}
     5.1 --- a/extras/mini-os/blkfront.c	Fri Apr 25 20:13:52 2008 +0900
     5.2 +++ b/extras/mini-os/blkfront.c	Thu May 08 18:40:07 2008 +0900
     5.3 @@ -50,6 +50,8 @@ struct blkfront_dev {
     5.4      char *backend;
     5.5      struct blkfront_info info;
     5.6  
     5.7 +    xenbus_event_queue events;
     5.8 +
     5.9  #ifdef HAVE_LIBC
    5.10      int fd;
    5.11  #endif
    5.12 @@ -101,6 +103,8 @@ struct blkfront_dev *init_blkfront(char 
    5.13  
    5.14      dev->ring_ref = gnttab_grant_access(dev->dom,virt_to_mfn(s),0);
    5.15  
    5.16 +    dev->events = NULL;
    5.17 +
    5.18      // FIXME: proper frees on failures
    5.19  again:
    5.20      err = xenbus_transaction_start(&xbt);
    5.21 @@ -166,11 +170,9 @@ done:
    5.22  
    5.23          snprintf(path, sizeof(path), "%s/state", dev->backend);
    5.24  
    5.25 -        xenbus_watch_path(XBT_NIL, path);
    5.26 +        xenbus_watch_path_token(XBT_NIL, path, path, &dev->events);
    5.27  
    5.28 -        xenbus_wait_for_value(path,"4");
    5.29 -
    5.30 -        xenbus_unwatch_path(XBT_NIL, path);
    5.31 +        xenbus_wait_for_value(path, "4", &dev->events);
    5.32  
    5.33          snprintf(path, sizeof(path), "%s/info", dev->backend);
    5.34          dev->info.info = xenbus_read_integer(path);
    5.35 @@ -211,10 +213,12 @@ void shutdown_blkfront(struct blkfront_d
    5.36  
    5.37      snprintf(path, sizeof(path), "%s/state", dev->backend);
    5.38      err = xenbus_printf(XBT_NIL, nodename, "state", "%u", 5); /* closing */
    5.39 -    xenbus_wait_for_value(path,"5");
    5.40 +    xenbus_wait_for_value(path, "5", &dev->events);
    5.41  
    5.42      err = xenbus_printf(XBT_NIL, nodename, "state", "%u", 6);
    5.43 -    xenbus_wait_for_value(path,"6");
    5.44 +    xenbus_wait_for_value(path, "6", &dev->events);
    5.45 +
    5.46 +    xenbus_unwatch_path(XBT_NIL, path);
    5.47  
    5.48      unbind_evtchn(dev->evtchn);
    5.49  
     6.1 --- a/extras/mini-os/fbfront.c	Fri Apr 25 20:13:52 2008 +0900
     6.2 +++ b/extras/mini-os/fbfront.c	Thu May 08 18:40:07 2008 +0900
     6.3 @@ -31,6 +31,8 @@ struct kbdfront_dev {
     6.4      char *nodename;
     6.5      char *backend;
     6.6  
     6.7 +    xenbus_event_queue events;
     6.8 +
     6.9  #ifdef HAVE_LIBC
    6.10      int fd;
    6.11  #endif
    6.12 @@ -75,6 +77,8 @@ struct kbdfront_dev *init_kbdfront(char 
    6.13      dev->page = s = (struct xenkbd_page*) alloc_page();
    6.14      memset(s,0,PAGE_SIZE);
    6.15  
    6.16 +    dev->events = NULL;
    6.17 +
    6.18      s->in_cons = s->in_prod = 0;
    6.19      s->out_cons = s->out_prod = 0;
    6.20  
    6.21 @@ -136,11 +140,9 @@ done:
    6.22  
    6.23          snprintf(path, sizeof(path), "%s/state", dev->backend);
    6.24  
    6.25 -        xenbus_watch_path(XBT_NIL, path);
    6.26 +        xenbus_watch_path_token(XBT_NIL, path, path, &dev->events);
    6.27  
    6.28 -        xenbus_wait_for_value(path,"4");
    6.29 -
    6.30 -        xenbus_unwatch_path(XBT_NIL, path);
    6.31 +        xenbus_wait_for_value(path, "4", &dev->events);
    6.32  
    6.33          printk("%s connected\n", dev->backend);
    6.34  
    6.35 @@ -199,10 +201,12 @@ void shutdown_kbdfront(struct kbdfront_d
    6.36  
    6.37      snprintf(path, sizeof(path), "%s/state", dev->backend);
    6.38      err = xenbus_printf(XBT_NIL, nodename, "state", "%u", 5); /* closing */
    6.39 -    xenbus_wait_for_value(path,"5");
    6.40 +    xenbus_wait_for_value(path, "5", &dev->events);
    6.41  
    6.42      err = xenbus_printf(XBT_NIL, nodename, "state", "%u", 6);
    6.43 -    xenbus_wait_for_value(path,"6");
    6.44 +    xenbus_wait_for_value(path, "6", &dev->events);
    6.45 +
    6.46 +    xenbus_unwatch_path(XBT_NIL, path);
    6.47  
    6.48      unbind_evtchn(dev->evtchn);
    6.49  
    6.50 @@ -249,6 +253,8 @@ struct fbfront_dev {
    6.51      int stride;
    6.52      int mem_length;
    6.53      int offset;
    6.54 +
    6.55 +    xenbus_event_queue events;
    6.56  };
    6.57  
    6.58  void fbfront_handler(evtchn_port_t port, struct pt_regs *regs, void *data)
    6.59 @@ -292,6 +298,7 @@ struct fbfront_dev *init_fbfront(char *n
    6.60      dev->stride = s->line_length = stride;
    6.61      dev->mem_length = s->mem_length = n * PAGE_SIZE;
    6.62      dev->offset = 0;
    6.63 +    dev->events = NULL;
    6.64  
    6.65      const int max_pd = sizeof(s->pd) / sizeof(s->pd[0]);
    6.66      unsigned long mapped = 0;
    6.67 @@ -368,14 +375,12 @@ done:
    6.68  
    6.69          snprintf(path, sizeof(path), "%s/state", dev->backend);
    6.70  
    6.71 -        xenbus_watch_path(XBT_NIL, path);
    6.72 +        xenbus_watch_path_token(XBT_NIL, path, path, &dev->events);
    6.73  
    6.74 -        xenbus_wait_for_value(path,"4");
    6.75 +        xenbus_wait_for_value(path, "4", &dev->events);
    6.76  
    6.77          printk("%s connected\n", dev->backend);
    6.78  
    6.79 -        xenbus_unwatch_path(XBT_NIL, path);
    6.80 -
    6.81          snprintf(path, sizeof(path), "%s/request-update", dev->backend);
    6.82          dev->request_update = xenbus_read_integer(path);
    6.83  
    6.84 @@ -463,10 +468,12 @@ void shutdown_fbfront(struct fbfront_dev
    6.85  
    6.86      snprintf(path, sizeof(path), "%s/state", dev->backend);
    6.87      err = xenbus_printf(XBT_NIL, nodename, "state", "%u", 5); /* closing */
    6.88 -    xenbus_wait_for_value(path,"5");
    6.89 +    xenbus_wait_for_value(path, "5", &dev->events);
    6.90  
    6.91      err = xenbus_printf(XBT_NIL, nodename, "state", "%u", 6);
    6.92 -    xenbus_wait_for_value(path,"6");
    6.93 +    xenbus_wait_for_value(path, "6", &dev->events);
    6.94 +
    6.95 +    xenbus_unwatch_path(XBT_NIL, path);
    6.96  
    6.97      unbind_evtchn(dev->evtchn);
    6.98  
     7.1 --- a/extras/mini-os/fs-front.c	Fri Apr 25 20:13:52 2008 +0900
     7.2 +++ b/extras/mini-os/fs-front.c	Thu May 08 18:40:07 2008 +0900
     7.3 @@ -917,6 +917,7 @@ static int init_fs_import(struct fs_impo
     7.4      struct fsif_sring *sring;
     7.5      int retry = 0;
     7.6      domid_t self_id;
     7.7 +    xenbus_event_queue events = NULL;
     7.8  
     7.9      printk("Initialising FS fortend to backend dom %d\n", import->dom_id);
    7.10      /* Allocate page for the shared ring */
    7.11 @@ -1026,8 +1027,9 @@ done:
    7.12      sprintf(r_nodename, "%s/state", import->backend);
    7.13      sprintf(token, "fs-front-%d", import->import_id);
    7.14      /* The token will not be unique if multiple imports are inited */
    7.15 -    xenbus_watch_path(XBT_NIL, r_nodename/*, token*/);
    7.16 -    xenbus_wait_for_value(/*token,*/ r_nodename, STATE_READY);
    7.17 +    xenbus_watch_path_token(XBT_NIL, r_nodename, r_nodename, &events);
    7.18 +    xenbus_wait_for_value(r_nodename, STATE_READY, &events);
    7.19 +    xenbus_unwatch_path(XBT_NIL, r_nodename);
    7.20      printk("Backend ready.\n");
    7.21     
    7.22      //create_thread("fs-tester", test_fs_import, import); 
     8.1 --- a/extras/mini-os/include/lib.h	Fri Apr 25 20:13:52 2008 +0900
     8.2 +++ b/extras/mini-os/include/lib.h	Thu May 08 18:40:07 2008 +0900
     8.3 @@ -162,7 +162,7 @@ extern struct file {
     8.4               * wakes select for this FD. */
     8.5              struct {
     8.6                  evtchn_port_t port;
     8.7 -                volatile unsigned long pending;
     8.8 +                unsigned long pending;
     8.9                  int bound;
    8.10              } ports[MAX_EVTCHN_PORTS];
    8.11  	} evtchn;
    8.12 @@ -178,10 +178,10 @@ extern struct file {
    8.13          struct {
    8.14              /* To each xenbus FD is associated a queue of watch events for this
    8.15               * FD.  */
    8.16 -            struct xenbus_event *volatile events;
    8.17 +            xenbus_event_queue events;
    8.18          } xenbus;
    8.19      };
    8.20 -    volatile int read;	/* maybe available for read */
    8.21 +    int read;	/* maybe available for read */
    8.22  } files[];
    8.23  
    8.24  int alloc_fd(enum fd_type type);
     9.1 --- a/extras/mini-os/include/xenbus.h	Fri Apr 25 20:13:52 2008 +0900
     9.2 +++ b/extras/mini-os/include/xenbus.h	Thu May 08 18:40:07 2008 +0900
     9.3 @@ -19,17 +19,18 @@ struct xenbus_event {
     9.4      char *token;
     9.5      struct xenbus_event *next;
     9.6  };
     9.7 +typedef struct xenbus_event *xenbus_event_queue;
     9.8  
     9.9 -char *xenbus_watch_path_token(xenbus_transaction_t xbt, const char *path, const char *token, struct xenbus_event *volatile *events);
    9.10 +char *xenbus_watch_path_token(xenbus_transaction_t xbt, const char *path, const char *token, xenbus_event_queue *events);
    9.11  char *xenbus_unwatch_path_token(xenbus_transaction_t xbt, const char *path, const char *token);
    9.12  extern struct wait_queue_head xenbus_watch_queue;
    9.13 -void xenbus_wait_for_watch(void);
    9.14 -char **xenbus_wait_for_watch_return(void);
    9.15 -char* xenbus_wait_for_value(const char *path, const char *value);
    9.16 +void xenbus_wait_for_watch(xenbus_event_queue *queue);
    9.17 +char **xenbus_wait_for_watch_return(xenbus_event_queue *queue);
    9.18 +char* xenbus_wait_for_value(const char *path, const char *value, xenbus_event_queue *queue);
    9.19  
    9.20  /* When no token is provided, use a global queue. */
    9.21  #define XENBUS_WATCH_PATH_TOKEN "xenbus_watch_path"
    9.22 -extern struct xenbus_event * volatile xenbus_events;
    9.23 +extern xenbus_event_queue xenbus_events;
    9.24  #define xenbus_watch_path(xbt, path) xenbus_watch_path_token(xbt, path, XENBUS_WATCH_PATH_TOKEN, NULL)
    9.25  #define xenbus_unwatch_path(xbt, path) xenbus_unwatch_path_token(xbt, path, XENBUS_WATCH_PATH_TOKEN)
    9.26  
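
Note: the mini-os changes in this changeset replace the single global xenbus watch queue with a per-caller xenbus_event_queue, so each frontend now passes its own queue to the watch and wait calls. The sketch below is illustrative only and is not part of the changeset: struct example_dev, example_wait_connected and example_shutdown are hypothetical names, and the include paths are assumed; the xenbus_* calls follow the signatures declared in extras/mini-os/include/xenbus.h above and the pattern used by blkfront/netfront/fbfront.

    /* Illustrative sketch of the new per-device event queue usage. */
    #include <lib.h>      /* mini-os string helpers (snprintf, strlen) - assumed */
    #include <xenbus.h>   /* mini-os xenbus API - assumed include path */

    struct example_dev {
        char *backend;              /* xenstore path of the backend */
        xenbus_event_queue events;  /* per-device event queue */
    };

    static void example_wait_connected(struct example_dev *dev)
    {
        char path[strlen(dev->backend) + strlen("/state") + 1];

        dev->events = NULL;   /* start with an empty private queue */
        snprintf(path, sizeof(path), "%s/state", dev->backend);

        /* Watch events for this path are queued on dev->events rather than
         * on the global xenbus_events queue. */
        xenbus_watch_path_token(XBT_NIL, path, path, &dev->events);

        /* Wait for XenbusStateConnected ("4") on the private queue. */
        xenbus_wait_for_value(path, "4", &dev->events);
    }

    static void example_shutdown(struct example_dev *dev)
    {
        char path[strlen(dev->backend) + strlen("/state") + 1];

        snprintf(path, sizeof(path), "%s/state", dev->backend);
        xenbus_wait_for_value(path, "5", &dev->events);   /* closing */
        xenbus_wait_for_value(path, "6", &dev->events);   /* closed  */

        /* Remove the watch only after the final wait, as the frontends
         * above now do. */
        xenbus_unwatch_path(XBT_NIL, path);
    }
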
    10.1 --- a/extras/mini-os/netfront.c	Fri Apr 25 20:13:52 2008 +0900
    10.2 +++ b/extras/mini-os/netfront.c	Thu May 08 18:40:07 2008 +0900
    10.3 @@ -53,6 +53,8 @@ struct netfront_dev {
    10.4      char *nodename;
    10.5      char *backend;
    10.6  
    10.7 +    xenbus_event_queue events;
    10.8 +
    10.9  #ifdef HAVE_LIBC
   10.10      int fd;
   10.11      unsigned char *data;
   10.12 @@ -328,6 +330,8 @@ struct netfront_dev *init_netfront(char 
   10.13  
   10.14      dev->netif_rx = thenetif_rx;
   10.15  
   10.16 +    dev->events = NULL;
   10.17 +
   10.18      // FIXME: proper frees on failures
   10.19  again:
   10.20      err = xenbus_transaction_start(&xbt);
   10.21 @@ -399,11 +403,9 @@ done:
   10.22          char path[strlen(dev->backend) + 1 + 5 + 1];
   10.23          snprintf(path, sizeof(path), "%s/state", dev->backend);
   10.24  
   10.25 -        xenbus_watch_path(XBT_NIL, path);
   10.26 +        xenbus_watch_path_token(XBT_NIL, path, path, &dev->events);
   10.27  
   10.28 -        xenbus_wait_for_value(path,"4");
   10.29 -
   10.30 -        xenbus_unwatch_path(XBT_NIL, path);
   10.31 +        xenbus_wait_for_value(path, "4", &dev->events);
   10.32  
   10.33          if (ip) {
   10.34              snprintf(path, sizeof(path), "%s/ip", dev->backend);
   10.35 @@ -458,10 +460,12 @@ void shutdown_netfront(struct netfront_d
   10.36  
   10.37      snprintf(path, sizeof(path), "%s/state", dev->backend);
   10.38      err = xenbus_printf(XBT_NIL, nodename, "state", "%u", 5); /* closing */
   10.39 -    xenbus_wait_for_value(path,"5");
   10.40 +    xenbus_wait_for_value(path, "5", &dev->events);
   10.41  
   10.42      err = xenbus_printf(XBT_NIL, nodename, "state", "%u", 6);
   10.43 -    xenbus_wait_for_value(path,"6");
   10.44 +    xenbus_wait_for_value(path, "6", &dev->events);
   10.45 +
   10.46 +    xenbus_unwatch_path(XBT_NIL, path);
   10.47  
   10.48      unbind_evtchn(dev->evtchn);
   10.49  
    11.1 --- a/extras/mini-os/xenbus/xenbus.c	Fri Apr 25 20:13:52 2008 +0900
    11.2 +++ b/extras/mini-os/xenbus/xenbus.c	Thu May 08 18:40:07 2008 +0900
    11.3 @@ -45,10 +45,10 @@ static struct xenstore_domain_interface 
    11.4  static DECLARE_WAIT_QUEUE_HEAD(xb_waitq);
    11.5  DECLARE_WAIT_QUEUE_HEAD(xenbus_watch_queue);
    11.6  
    11.7 -struct xenbus_event *volatile xenbus_events;
    11.8 +xenbus_event_queue xenbus_events;
    11.9  static struct watch {
   11.10      char *token;
   11.11 -    struct xenbus_event *volatile *events;
   11.12 +    xenbus_event_queue *events;
   11.13      struct watch *next;
   11.14  } *watches;
   11.15  struct xenbus_req_info 
   11.16 @@ -75,28 +75,34 @@ static void memcpy_from_ring(const void 
   11.17      memcpy(dest + c1, ring, c2);
   11.18  }
   11.19  
   11.20 -char **xenbus_wait_for_watch_return()
   11.21 +char **xenbus_wait_for_watch_return(xenbus_event_queue *queue)
   11.22  {
   11.23      struct xenbus_event *event;
   11.24 +    if (!queue)
   11.25 +        queue = &xenbus_events;
   11.26      DEFINE_WAIT(w);
   11.27 -    while (!(event = xenbus_events)) {
   11.28 +    while (!(event = *queue)) {
   11.29          add_waiter(w, xenbus_watch_queue);
   11.30          schedule();
   11.31      }
   11.32      remove_waiter(w);
   11.33 -    xenbus_events = event->next;
   11.34 +    *queue = event->next;
   11.35      return &event->path;
   11.36  }
   11.37  
   11.38 -void xenbus_wait_for_watch(void)
   11.39 +void xenbus_wait_for_watch(xenbus_event_queue *queue)
   11.40  {
   11.41      char **ret;
   11.42 -    ret = xenbus_wait_for_watch_return();
   11.43 +    if (!queue)
   11.44 +        queue = &xenbus_events;
   11.45 +    ret = xenbus_wait_for_watch_return(queue);
   11.46      free(ret);
   11.47  }
   11.48  
   11.49 -char* xenbus_wait_for_value(const char* path, const char* value)
   11.50 +char* xenbus_wait_for_value(const char* path, const char* value, xenbus_event_queue *queue)
   11.51  {
   11.52 +    if (!queue)
   11.53 +        queue = &xenbus_events;
   11.54      for(;;)
   11.55      {
   11.56          char *res, *msg;
   11.57 @@ -109,7 +115,7 @@ char* xenbus_wait_for_value(const char* 
   11.58          free(res);
   11.59  
   11.60          if(r==0) break;
   11.61 -        else xenbus_wait_for_watch();
   11.62 +        else xenbus_wait_for_watch(queue);
   11.63      }
   11.64      return NULL;
   11.65  }
   11.66 @@ -147,8 +153,8 @@ static void xenbus_thread_func(void *ign
   11.67  
   11.68              if(msg.type == XS_WATCH_EVENT)
   11.69              {
   11.70 -		struct xenbus_event *event = malloc(sizeof(*event) + msg.len),
   11.71 -                                    *volatile *events = NULL;
   11.72 +		struct xenbus_event *event = malloc(sizeof(*event) + msg.len);
   11.73 +                xenbus_event_queue *events = NULL;
   11.74  		char *data = (char*)event + sizeof(*event);
   11.75                  struct watch *watch;
   11.76  
   11.77 @@ -167,8 +173,6 @@ static void xenbus_thread_func(void *ign
   11.78                          events = watch->events;
   11.79                          break;
   11.80                      }
   11.81 -                if (!events)
   11.82 -                    events = &xenbus_events;
   11.83  
   11.84  		event->next = *events;
   11.85  		*events = event;
   11.86 @@ -463,7 +467,7 @@ char *xenbus_write(xenbus_transaction_t 
   11.87      return NULL;
   11.88  }
   11.89  
   11.90 -char* xenbus_watch_path_token( xenbus_transaction_t xbt, const char *path, const char *token, struct xenbus_event *volatile *events)
   11.91 +char* xenbus_watch_path_token( xenbus_transaction_t xbt, const char *path, const char *token, xenbus_event_queue *events)
   11.92  {
   11.93      struct xsd_sockmsg *rep;
   11.94  
   11.95 @@ -474,6 +478,9 @@ char* xenbus_watch_path_token( xenbus_tr
   11.96  
   11.97      struct watch *watch = malloc(sizeof(*watch));
   11.98  
   11.99 +    if (!events)
  11.100 +        events = &xenbus_events;
  11.101 +
  11.102      watch->token = strdup(token);
  11.103      watch->events = events;
  11.104      watch->next = watches;
    12.1 --- a/tools/examples/xend-config-xenapi.sxp	Fri Apr 25 20:13:52 2008 +0900
    12.2 +++ b/tools/examples/xend-config-xenapi.sxp	Thu May 08 18:40:07 2008 +0900
    12.3 @@ -66,9 +66,9 @@
    12.4  
    12.5  
    12.6  # Address and port xend should use for the legacy TCP XMLRPC interface, 
    12.7 -# if xen-tcp-xmlrpc-server is set.
    12.8 -#(xen-tcp-xmlrpc-server-address 'localhost')
    12.9 -#(xen-tcp-xmlrpc-server-port 8006)
   12.10 +# if xend-tcp-xmlrpc-server is set.
   12.11 +#(xend-tcp-xmlrpc-server-address 'localhost')
   12.12 +#(xend-tcp-xmlrpc-server-port 8006)
   12.13  
   12.14  # SSL key and certificate to use for the legacy TCP XMLRPC interface.
   12.15  # Setting these will mean that this port serves only SSL connections as
    13.1 --- a/tools/examples/xend-config.sxp	Fri Apr 25 20:13:52 2008 +0900
    13.2 +++ b/tools/examples/xend-config.sxp	Thu May 08 18:40:07 2008 +0900
    13.3 @@ -64,9 +64,9 @@
    13.4  
    13.5  
    13.6  # Address and port xend should use for the legacy TCP XMLRPC interface, 
    13.7 -# if xen-tcp-xmlrpc-server is set.
    13.8 -#(xen-tcp-xmlrpc-server-address 'localhost')
    13.9 -#(xen-tcp-xmlrpc-server-port 8006)
   13.10 +# if xend-tcp-xmlrpc-server is set.
   13.11 +#(xend-tcp-xmlrpc-server-address 'localhost')
   13.12 +#(xend-tcp-xmlrpc-server-port 8006)
   13.13  
   13.14  # SSL key and certificate to use for the legacy TCP XMLRPC interface.
   13.15  # Setting these will mean that this port serves only SSL connections as
   13.16 @@ -82,6 +82,15 @@
   13.17  # is set.
   13.18  #(xend-relocation-port 8002)
   13.19  
   13.20 +# Whether to use TLS when relocating.
   13.21 +#(xend-relocation-tls no)
   13.22 +
   13.23 +# SSL key and certificate to use for the relocation interface.
   13.24 +# Setting these will mean that this port serves only SSL connections as
   13.25 +# opposed to plaintext ones.
   13.26 +#(xend-relocation-server-ssl-key-file  /etc/xen/xmlrpc.key)
   13.27 +#(xend-relocation-server-ssl-cert-file  /etc/xen/xmlrpc.crt)
   13.28 +
   13.29  # Address xend should listen on for HTTP connections, if xend-http-server is
   13.30  # set.
   13.31  # Specifying 'localhost' prevents remote connections.
    14.1 --- a/tools/examples/xmexample.hvm	Fri Apr 25 20:13:52 2008 +0900
    14.2 +++ b/tools/examples/xmexample.hvm	Thu May 08 18:40:07 2008 +0900
    14.3 @@ -219,3 +219,28 @@ serial='pty'
    14.4  #-----------------------------------------------------------------------------
    14.5  #   Set keyboard layout, default is en-us keyboard. 
    14.6  #keymap='ja'
    14.7 +
    14.8 +#-----------------------------------------------------------------------------
    14.9 +#   Configure guest CPUID responses:
   14.10 +#cpuid=[ '1:ecx=xxxxxxxxxxxxxxxxxxxxxxxxxx1xxxxx,
   14.11 +#           eax=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ]
   14.12 +# - Set the VMX feature flag in the guest (CPUID_1:ECX:5)
   14.13 +# - Default behaviour for all other bits in the ECX and EAX registers.
   14.14 +# 
   14.15 +# Each successive character represents a less-significant bit:
   14.16 +#  '1' -> force the corresponding bit to 1
   14.17 +#  '0' -> force to 0
   14.18 +#  'x' -> we don't care (default behaviour)
   14.19 +#  'k' -> pass through the host bit value
   14.20 +#  's' -> as 'k' but preserve across save/restore and migration
   14.21 +#
   14.22 +#   Configure host CPUID consistency checks, which must be satisfied for this
   14.23 +#   VM to be allowed to run on this host's processor type:
   14.24 +#cpuid_check=[ '1:ecx=xxxxxxxxxxxxxxxxxxxxxxxxxx1xxxxx' ]
   14.25 +# - Host must have VMX feature flag set
   14.26 +#
   14.27 +# The format is similar to the above for 'cpuid':
   14.28 +#  '1' -> the bit must be '1'
   14.29 +#  '0' -> the bit must be '0'
   14.30 +#  'x' -> we don't care (do not check)
   14.31 +#  's' -> the bit must be the same as on the host that started this VM
    15.1 --- a/tools/ioemu/Makefile.target	Fri Apr 25 20:13:52 2008 +0900
    15.2 +++ b/tools/ioemu/Makefile.target	Thu May 08 18:40:07 2008 +0900
    15.3 @@ -370,7 +370,7 @@ endif
    15.4  
    15.5  ifdef CONFIG_PASSTHROUGH
    15.6  LIBS+=-lpci
    15.7 -VL_OBJS+= pass-through.o
    15.8 +VL_OBJS+= pass-through.o pt-msi.o
    15.9  CFLAGS += -DCONFIG_PASSTHROUGH
   15.10  $(info *** PCI passthrough capability has been enabled ***)
   15.11  endif
    16.1 --- a/tools/ioemu/hw/cirrus_vga.c	Fri Apr 25 20:13:52 2008 +0900
    16.2 +++ b/tools/ioemu/hw/cirrus_vga.c	Thu May 08 18:40:07 2008 +0900
    16.3 @@ -234,8 +234,6 @@ typedef struct CirrusVGAState {
    16.4      int cirrus_linear_io_addr;
    16.5      int cirrus_linear_bitblt_io_addr;
    16.6      int cirrus_mmio_io_addr;
    16.7 -    unsigned long cirrus_lfb_addr;
    16.8 -    unsigned long cirrus_lfb_end;
    16.9      uint32_t cirrus_addr_mask;
   16.10      uint32_t linear_mmio_mask;
   16.11      uint8_t cirrus_shadow_gr0;
   16.12 @@ -2657,11 +2655,11 @@ static void cirrus_update_memory_access(
   16.13          
   16.14  	mode = s->gr[0x05] & 0x7;
   16.15  	if (mode < 4 || mode > 5 || ((s->gr[0x0B] & 0x4) == 0)) {
   16.16 -            if (s->cirrus_lfb_addr && s->cirrus_lfb_end && !s->map_addr) {
   16.17 +            if (s->lfb_addr && s->lfb_end && !s->map_addr) {
   16.18                  void *vram_pointer, *old_vram;
   16.19  
   16.20 -                vram_pointer = set_vram_mapping(s->cirrus_lfb_addr,
   16.21 -                                                s->cirrus_lfb_end);
   16.22 +                vram_pointer = set_vram_mapping(s->lfb_addr,
   16.23 +                                                s->lfb_end);
   16.24                  if (!vram_pointer)
   16.25                      fprintf(stderr, "NULL vram_pointer\n");
   16.26                  else {
   16.27 @@ -2669,21 +2667,21 @@ static void cirrus_update_memory_access(
   16.28                                                 VGA_RAM_SIZE);
   16.29                      qemu_free(old_vram);
   16.30                  }
   16.31 -                s->map_addr = s->cirrus_lfb_addr;
   16.32 -                s->map_end = s->cirrus_lfb_end;
   16.33 +                s->map_addr = s->lfb_addr;
   16.34 +                s->map_end = s->lfb_end;
   16.35              }
   16.36              s->cirrus_linear_write[0] = cirrus_linear_mem_writeb;
   16.37              s->cirrus_linear_write[1] = cirrus_linear_mem_writew;
   16.38              s->cirrus_linear_write[2] = cirrus_linear_mem_writel;
   16.39          } else {
   16.40          generic_io:
   16.41 -            if (s->cirrus_lfb_addr && s->cirrus_lfb_end && s->map_addr) {
   16.42 +            if (s->lfb_addr && s->lfb_end && s->map_addr) {
   16.43                  void *old_vram;
   16.44  
   16.45                  old_vram = vga_update_vram((VGAState *)s, NULL, VGA_RAM_SIZE);
   16.46  
   16.47 -                unset_vram_mapping(s->cirrus_lfb_addr,
   16.48 -                                   s->cirrus_lfb_end, 
   16.49 +                unset_vram_mapping(s->lfb_addr,
   16.50 +                                   s->lfb_end, 
   16.51                                     old_vram);
   16.52  
   16.53                  s->map_addr = s->map_end = 0;
   16.54 @@ -3049,27 +3047,27 @@ void cirrus_stop_acc(CirrusVGAState *s)
   16.55      if (s->map_addr){
   16.56          int error;
   16.57          s->map_addr = 0;
   16.58 -        error = unset_vram_mapping(s->cirrus_lfb_addr,
   16.59 -                s->cirrus_lfb_end, s->vram_ptr);
   16.60 +        error = unset_vram_mapping(s->lfb_addr,
   16.61 +                s->lfb_end, s->vram_ptr);
   16.62          fprintf(stderr, "cirrus_stop_acc:unset_vram_mapping.\n");
   16.63      }
   16.64  }
   16.65  
   16.66  void cirrus_restart_acc(CirrusVGAState *s)
   16.67  {
   16.68 -    if (s->cirrus_lfb_addr && s->cirrus_lfb_end) {
   16.69 +    if (s->lfb_addr && s->lfb_end) {
   16.70          void *vram_pointer, *old_vram;
   16.71          fprintf(stderr, "cirrus_vga_load:re-enable vga acc.lfb_addr=0x%lx, lfb_end=0x%lx.\n",
   16.72 -                s->cirrus_lfb_addr, s->cirrus_lfb_end);
   16.73 -        vram_pointer = set_vram_mapping(s->cirrus_lfb_addr ,s->cirrus_lfb_end);
   16.74 +                s->lfb_addr, s->lfb_end);
   16.75 +        vram_pointer = set_vram_mapping(s->lfb_addr ,s->lfb_end);
   16.76          if (!vram_pointer){
   16.77              fprintf(stderr, "cirrus_vga_load:NULL vram_pointer\n");
   16.78          } else {
   16.79              old_vram = vga_update_vram((VGAState *)s, vram_pointer,
   16.80                      VGA_RAM_SIZE);
   16.81              qemu_free(old_vram);
   16.82 -            s->map_addr = s->cirrus_lfb_addr;
   16.83 -            s->map_end = s->cirrus_lfb_end;
   16.84 +            s->map_addr = s->lfb_addr;
   16.85 +            s->map_end = s->lfb_end;
   16.86          }
   16.87      }
   16.88  }
   16.89 @@ -3120,8 +3118,8 @@ static void cirrus_vga_save(QEMUFile *f,
   16.90  
   16.91      vga_acc = (!!s->map_addr);
   16.92      qemu_put_8s(f, &vga_acc);
   16.93 -    qemu_put_be64s(f, (uint64_t*)&s->cirrus_lfb_addr);
   16.94 -    qemu_put_be64s(f, (uint64_t*)&s->cirrus_lfb_end);
   16.95 +    qemu_put_be64s(f, (uint64_t*)&s->lfb_addr);
   16.96 +    qemu_put_be64s(f, (uint64_t*)&s->lfb_end);
   16.97      qemu_put_buffer(f, s->vram_ptr, VGA_RAM_SIZE); 
   16.98  }
   16.99  
  16.100 @@ -3175,8 +3173,8 @@ static int cirrus_vga_load(QEMUFile *f, 
  16.101      qemu_get_be32s(f, &s->hw_cursor_y);
  16.102  
  16.103      qemu_get_8s(f, &vga_acc);
  16.104 -    qemu_get_be64s(f, (uint64_t*)&s->cirrus_lfb_addr);
  16.105 -    qemu_get_be64s(f, (uint64_t*)&s->cirrus_lfb_end);
  16.106 +    qemu_get_be64s(f, (uint64_t*)&s->lfb_addr);
  16.107 +    qemu_get_be64s(f, (uint64_t*)&s->lfb_end);
  16.108      qemu_get_buffer(f, s->vram_ptr, VGA_RAM_SIZE); 
  16.109      if (vga_acc){
  16.110          cirrus_restart_acc(s);
  16.111 @@ -3337,11 +3335,11 @@ static void cirrus_pci_lfb_map(PCIDevice
  16.112      /* XXX: add byte swapping apertures */
  16.113      cpu_register_physical_memory(addr, s->vram_size,
  16.114  				 s->cirrus_linear_io_addr);
  16.115 -    s->cirrus_lfb_addr = addr;
  16.116 -    s->cirrus_lfb_end = addr + VGA_RAM_SIZE;
  16.117 -
  16.118 -    if (s->map_addr && (s->cirrus_lfb_addr != s->map_addr) &&
  16.119 -        (s->cirrus_lfb_end != s->map_end))
  16.120 +    s->lfb_addr = addr;
  16.121 +    s->lfb_end = addr + VGA_RAM_SIZE;
  16.122 +
  16.123 +    if (s->map_addr && (s->lfb_addr != s->map_addr) &&
  16.124 +        (s->lfb_end != s->map_end))
  16.125          fprintf(logfile, "cirrus vga map change while on lfb mode\n");
  16.126  
  16.127      cpu_register_physical_memory(addr + 0x1000000, 0x400000,
    17.1 --- a/tools/ioemu/hw/pass-through.c	Fri Apr 25 20:13:52 2008 +0900
    17.2 +++ b/tools/ioemu/hw/pass-through.c	Thu May 08 18:40:07 2008 +0900
    17.3 @@ -26,6 +26,7 @@
    17.4  #include "pass-through.h"
    17.5  #include "pci/header.h"
    17.6  #include "pci/pci.h"
    17.7 +#include "pt-msi.h"
    17.8  
    17.9  extern FILE *logfile;
   17.10  
   17.11 @@ -287,6 +288,9 @@ static void pt_pci_write_config(PCIDevic
   17.12          return;
   17.13      }
   17.14  
   17.15 +    if ( pt_msi_write(assigned_device, address, val, len) )
   17.16 +        return;
   17.17 +
   17.18      /* PCI config pass-through */
   17.19      if (address == 0x4) {
   17.20          switch (len){
   17.21 @@ -333,6 +337,7 @@ static uint32_t pt_pci_read_config(PCIDe
   17.22          break;
   17.23      }
   17.24  
   17.25 +    pt_msi_read(assigned_device, address, len, &val);
   17.26  exit:
   17.27  
   17.28  #ifdef PT_DEBUG_PCI_CONFIG_ACCESS
   17.29 @@ -445,11 +450,41 @@ static int pt_unregister_regions(struct 
   17.30  
   17.31  }
   17.32  
   17.33 +uint8_t find_cap_offset(struct pci_dev *pci_dev, uint8_t cap)
   17.34 +{
   17.35 +    int id;
   17.36 +    int max_cap = 48;
   17.37 +    int pos = PCI_CAPABILITY_LIST;
   17.38 +    int status;
   17.39 +
   17.40 +    status = pci_read_byte(pci_dev, PCI_STATUS);
   17.41 +    if ( (status & PCI_STATUS_CAP_LIST) == 0 )
   17.42 +        return 0;
   17.43 +
   17.44 +    while ( max_cap-- )
   17.45 +    {
   17.46 +        pos = pci_read_byte(pci_dev, pos);
   17.47 +        if ( pos < 0x40 )
   17.48 +            break;
   17.49 +
   17.50 +        pos &= ~3;
   17.51 +        id = pci_read_byte(pci_dev, pos + PCI_CAP_LIST_ID);
   17.52 +
   17.53 +        if ( id == 0xff )
   17.54 +            break;
   17.55 +        if ( id == cap )
   17.56 +            return pos;
   17.57 +
   17.58 +        pos += PCI_CAP_LIST_NEXT;
   17.59 +    }
   17.60 +    return 0;
   17.61 +}
   17.62 +
   17.63  struct pt_dev * register_real_device(PCIBus *e_bus,
   17.64          const char *e_dev_name, int e_devfn, uint8_t r_bus, uint8_t r_dev,
   17.65          uint8_t r_func, uint32_t machine_irq, struct pci_access *pci_access)
   17.66  {
   17.67 -    int rc = -1, i;
   17.68 +    int rc = -1, i, pos;
   17.69      struct pt_dev *assigned_device = NULL;
   17.70      struct pci_dev *pci_dev;
   17.71      uint8_t e_device, e_intx;
   17.72 @@ -511,6 +546,9 @@ struct pt_dev * register_real_device(PCI
   17.73      for ( i = 0; i < PCI_CONFIG_SIZE; i++ )
   17.74          assigned_device->dev.config[i] = pci_read_byte(pci_dev, i);
   17.75  
   17.76 +    if ( (pos = find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) )
   17.77 +        pt_msi_init(assigned_device, pos);
   17.78 +
   17.79      /* Handle real device's MMIO/PIO BARs */
   17.80      pt_register_regions(assigned_device);
   17.81  
   17.82 @@ -519,7 +557,21 @@ struct pt_dev * register_real_device(PCI
   17.83      e_intx = assigned_device->dev.config[0x3d]-1;
   17.84  
   17.85      if ( PT_MACHINE_IRQ_AUTO == machine_irq )
   17.86 +    {
   17.87 +        int pirq = pci_dev->irq;
   17.88 +
   17.89          machine_irq = pci_dev->irq;
   17.90 +        rc = xc_physdev_map_pirq(xc_handle, domid, MAP_PIRQ_TYPE_GSI,
   17.91 +                                machine_irq, &pirq);
   17.92 +
   17.93 +        if ( rc )
   17.94 +        {
   17.95 +            /* TBD: unregister device in case of an error */
   17.96 +            PT_LOG("Error: Mapping irq failed, rc = %d\n", rc);
   17.97 +        }
   17.98 +        else
   17.99 +            machine_irq = pirq;
  17.100 +    }
  17.101  
  17.102      /* bind machine_irq to device */
  17.103      if ( 0 != machine_irq )
    18.1 --- a/tools/ioemu/hw/pass-through.h	Fri Apr 25 20:13:52 2008 +0900
    18.2 +++ b/tools/ioemu/hw/pass-through.h	Thu May 08 18:40:07 2008 +0900
    18.3 @@ -57,6 +57,14 @@ struct pt_region {
    18.4      } access;
    18.5  };
    18.6  
    18.7 +struct pt_msi_info {
    18.8 +    uint32_t flags;
    18.9 +    int offset;
   18.10 +    int size;
   18.11 +    int pvec;   /* physical vector used */
   18.12 +    int pirq;  /* guest pirq corresponding */
   18.13 +};
   18.14 +
   18.15  /*
   18.16      This structure holds the context of the mapping functions
   18.17      and data that is relevant for qemu device management.
   18.18 @@ -65,6 +73,7 @@ struct pt_dev {
   18.19      PCIDevice dev;
   18.20      struct pci_dev *pci_dev;                     /* libpci struct */
   18.21      struct pt_region bases[PCI_NUM_REGIONS];    /* Access regions */
   18.22 +    struct pt_msi_info *msi;                    /* MSI virtualization */
   18.23  };
   18.24  
   18.25  /* Used for formatting PCI BDF into cf8 format */
    19.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    19.2 +++ b/tools/ioemu/hw/pt-msi.c	Thu May 08 18:40:07 2008 +0900
    19.3 @@ -0,0 +1,488 @@
    19.4 +/*
    19.5 + * Copyright (c) 2007, Intel Corporation.
    19.6 + *
    19.7 + * This program is free software; you can redistribute it and/or modify it
    19.8 + * under the terms and conditions of the GNU General Public License,
    19.9 + * version 2, as published by the Free Software Foundation.
   19.10 + *
   19.11 + * This program is distributed in the hope it will be useful, but WITHOUT
   19.12 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
   19.13 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
   19.14 + * more details.
   19.15 + *
   19.16 + * You should have received a copy of the GNU General Public License along with
   19.17 + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
   19.18 + * Place - Suite 330, Boston, MA 02111-1307 USA.
   19.19 + *
   19.20 + * Jiang Yunhong <yunhong.jiang@intel.com>
   19.21 + *
   19.22 + * This file implements direct PCI assignment to a HVM guest
   19.23 + */
   19.24 +
   19.25 +#include "pt-msi.h"
   19.26 +
   19.27 +#define PT_MSI_CTRL_WR_MASK_HI      (0x1)
   19.28 +#define PT_MSI_CTRL_WR_MASK_LO      (0x8E)
   19.29 +#define PT_MSI_DATA_WR_MASK         (0x38)
   19.30 +int pt_msi_init(struct pt_dev *dev, int pos)
   19.31 +{
   19.32 +    uint8_t id;
   19.33 +    uint16_t flags;
   19.34 +    struct pci_dev *pd = dev->pci_dev;
   19.35 +    PCIDevice *d = (struct PCIDevice *)dev;
   19.36 +
   19.37 +    id = pci_read_byte(pd, pos + PCI_CAP_LIST_ID);
   19.38 +
   19.39 +    if ( id != PCI_CAP_ID_MSI )
   19.40 +    {
   19.41 +        PT_LOG("pt_msi_init: error id %x pos %x\n", id, pos);
   19.42 +        return -1;
   19.43 +    }
   19.44 +
   19.45 +    dev->msi = malloc(sizeof(struct pt_msi_info));
   19.46 +    if ( !dev->msi )
   19.47 +    {
   19.48 +        PT_LOG("pt_msi_init: error allocation pt_msi_info\n");
   19.49 +        return -1;
   19.50 +    }
   19.51 +    memset(dev->msi, 0, sizeof(struct pt_msi_info));
   19.52 +
   19.53 +    dev->msi->offset = pos;
   19.54 +    dev->msi->size = 0xa;
   19.55 +
   19.56 +    flags = pci_read_byte(pd, pos + PCI_MSI_FLAGS);
   19.57 +    if ( flags & PCI_MSI_FLAGS_ENABLE )
   19.58 +    {
   19.59 +        PT_LOG("pt_msi_init: MSI enabled already, disable first\n");
   19.60 +        pci_write_byte(pd, pos + PCI_MSI_FLAGS, flags & ~PCI_MSI_FLAGS_ENABLE);
   19.61 +    }
   19.62 +    dev->msi->flags |= (flags | MSI_FLAG_UNINIT);
   19.63 +
   19.64 +    if ( flags & PCI_MSI_FLAGS_64BIT )
   19.65 +        dev->msi->size += 4;
   19.66 +    if ( flags & PCI_MSI_FLAGS_PVMASK )
   19.67 +        dev->msi->size += 10;
   19.68 +
   19.69 +    /* All registers are 0 after reset, except the first 4 bytes */
   19.70 +    *(uint32_t *)(&d->config[pos]) = pci_read_long(pd, pos);
   19.71 +    d->config[pos + 2] &=  PT_MSI_CTRL_WR_MASK_LO;
   19.72 +    d->config[pos + 3] &=  PT_MSI_CTRL_WR_MASK_HI;
   19.73 +
   19.74 +    return 0;
   19.75 +}
   19.76 +
   19.77 +/*
   19.78 + * Set up the physical MSI, but do not enable it yet
   19.79 + */
   19.80 +static int pt_msi_setup(struct pt_dev *dev)
   19.81 +{
   19.82 +    int vector = -1, pirq = -1;
   19.83 +
   19.84 +    if ( !(dev->msi->flags & MSI_FLAG_UNINIT) )
   19.85 +    {
   19.86 +        PT_LOG("setup physical after initialized?? \n");
   19.87 +        return -1;
   19.88 +    }
   19.89 +
   19.90 +    if ( xc_physdev_map_pirq_msi(xc_handle, domid, MAP_PIRQ_TYPE_MSI,
   19.91 +                            vector, &pirq,
   19.92 +							dev->pci_dev->dev << 3 | dev->pci_dev->func,
   19.93 +							dev->pci_dev->bus, 1) )
   19.94 +    {
   19.95 +        PT_LOG("error map vector %x\n", vector);
   19.96 +        return -1;
   19.97 +    }
   19.98 +    dev->msi->pirq = pirq;
   19.99 +    PT_LOG("vector %x pirq %x\n", vector, pirq);
  19.100 +
  19.101 +    return 0;
  19.102 +}
  19.103 +
  19.104 +/*
  19.105 + * caller should make sure mask is supported
  19.106 + */
  19.107 +static uint32_t get_msi_gmask(struct pt_dev *d)
  19.108 +{
  19.109 +    struct PCIDevice *pd = (struct PCIDevice *)d;
  19.110 +
  19.111 +    if ( d->msi->flags & PCI_MSI_FLAGS_64BIT )
  19.112 +        return *(uint32_t *)(pd->config + d->msi->offset + 0xc);
  19.113 +    else
  19.114 +        return *(uint32_t *)(pd->config + d->msi->offset + 0x10);
  19.115 +
  19.116 +}
  19.117 +
  19.118 +static uint16_t get_msi_gdata(struct pt_dev *d)
  19.119 +{
  19.120 +    struct PCIDevice *pd = (struct PCIDevice *)d;
  19.121 +
  19.122 +    if ( d->msi->flags & PCI_MSI_FLAGS_64BIT )
  19.123 +        return *(uint16_t *)(pd->config + d->msi->offset + PCI_MSI_DATA_64);
  19.124 +    else
  19.125 +        return *(uint16_t *)(pd->config + d->msi->offset + PCI_MSI_DATA_32);
  19.126 +}
  19.127 +
  19.128 +static uint64_t get_msi_gaddr(struct pt_dev *d)
  19.129 +{
  19.130 +    struct PCIDevice *pd = (struct PCIDevice *)d;
  19.131 +    uint32_t addr_hi;
  19.132 +    uint64_t addr = 0;
  19.133 +
  19.134 +    addr =(uint64_t)(*(uint32_t *)(pd->config +
  19.135 +                     d->msi->offset + PCI_MSI_ADDRESS_LO));
  19.136 +
  19.137 +    if ( d->msi->flags & PCI_MSI_FLAGS_64BIT )
  19.138 +    {
  19.139 +        addr_hi = *(uint32_t *)(pd->config + d->msi->offset
  19.140 +                                + PCI_MSI_ADDRESS_HI);
  19.141 +        addr |= (uint64_t)addr_hi << 32;
  19.142 +    }
  19.143 +    return addr;
  19.144 +}
  19.145 +
  19.146 +static uint8_t get_msi_gctrl(struct pt_dev *d)
  19.147 +{
  19.148 +    struct PCIDevice *pd = (struct PCIDevice *)d;
  19.149 +
  19.150 +    return  *(uint8_t *)(pd->config + d->msi->offset + PCI_MSI_FLAGS);
  19.151 +}
  19.152 +
  19.153 +static uint32_t get_msi_gflags(struct pt_dev *d)
  19.154 +{
  19.155 +    uint32_t result = 0;
  19.156 +    int rh, dm, dest_id, deliv_mode, trig_mode;
  19.157 +    uint16_t data;
  19.158 +    uint64_t addr;
  19.159 +
  19.160 +    data = get_msi_gdata(d);
  19.161 +    addr = get_msi_gaddr(d);
  19.162 +
  19.163 +    rh = (addr >> MSI_ADDR_REDIRECTION_SHIFT) & 0x1;
  19.164 +    dm = (addr >> MSI_ADDR_DESTMODE_SHIFT) & 0x1;
  19.165 +    dest_id = (addr >> MSI_TARGET_CPU_SHIFT) & 0xff;
  19.166 +    deliv_mode = (data >> MSI_DATA_DELIVERY_SHIFT) & 0x7;
  19.167 +    trig_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
  19.168 +
  19.169 +    result |= dest_id | (rh << GFLAGS_SHIFT_RH) | (dm << GFLAGS_SHIFT_DM) | \
  19.170 +                (deliv_mode << GLFAGS_SHIFT_DELIV_MODE) |
  19.171 +                (trig_mode << GLFAGS_SHIFT_TRG_MODE);
  19.172 +
  19.173 +    return result;
  19.174 +}
  19.175 +
  19.176 +/*
  19.177 + * This may be arch different
  19.178 + */
  19.179 +static inline uint8_t get_msi_gvec(struct pt_dev *d)
  19.180 +{
  19.181 +    return get_msi_gdata(d) & 0xff;
  19.182 +}
  19.183 +
  19.184 +static inline uint8_t get_msi_hvec(struct pt_dev *d)
  19.185 +{
  19.186 +    struct pci_dev *pd = d->pci_dev;
  19.187 +    uint16_t data;
  19.188 +
  19.189 +    if ( d->msi->flags & PCI_MSI_FLAGS_64BIT )
  19.190 +        data = pci_read_word(pd, PCI_MSI_DATA_64);
  19.191 +    else
  19.192 +        data = pci_read_word(pd, PCI_MSI_DATA_32);
  19.193 +
  19.194 +    return data & 0xff;
  19.195 +}
  19.196 +
  19.197 +/*
  19.198 + * Update the MSI mapping; usually called when MSI is enabled,
  19.199 + * except for the first time.
  19.200 + */
  19.201 +static int pt_msi_update(struct pt_dev *d)
  19.202 +{
  19.203 +    PT_LOG("now update msi with pirq %x gvec %x\n",
  19.204 +            get_msi_gvec(d), d->msi->pirq);
  19.205 +    return xc_domain_update_msi_irq(xc_handle, domid, get_msi_gvec(d),
  19.206 +                                     d->msi->pirq, get_msi_gflags(d));
  19.207 +}
  19.208 +
  19.209 +static int pt_msi_enable(struct pt_dev *d, int enable)
  19.210 +{
  19.211 +    uint16_t ctrl;
  19.212 +    struct pci_dev *pd = d->pci_dev;
  19.213 +
  19.214 +    if ( !pd )
  19.215 +        return -1;
  19.216 +
  19.217 +    ctrl = pci_read_word(pd, d->msi->offset + PCI_MSI_FLAGS);
  19.218 +
  19.219 +    if ( enable )
  19.220 +        ctrl |= PCI_MSI_FLAGS_ENABLE;
  19.221 +    else
  19.222 +        ctrl &= ~PCI_MSI_FLAGS_ENABLE;
  19.223 +
  19.224 +    pci_write_word(pd, d->msi->offset + PCI_MSI_FLAGS, ctrl);
  19.225 +    return 0;
  19.226 +}
  19.227 +
  19.228 +static int pt_msi_control_update(struct pt_dev *d, uint16_t old_ctrl)
  19.229 +{
  19.230 +    uint16_t new_ctrl;
  19.231 +    PCIDevice *pd = (PCIDevice *)d;
  19.232 +
  19.233 +    new_ctrl = get_msi_gctrl(d);
  19.234 +
  19.235 +    PT_LOG("old_ctrl %x new_Ctrl %x\n", old_ctrl, new_ctrl);
  19.236 +
  19.237 +    if ( new_ctrl & PCI_MSI_FLAGS_ENABLE )
  19.238 +    {
  19.239 +        if ( d->msi->flags & MSI_FLAG_UNINIT )
  19.240 +        {
  19.241 +            /* Init physical one */
  19.242 +            PT_LOG("setup msi for dev %x\n", pd->devfn);
  19.243 +            if ( pt_msi_setup(d) )
  19.244 +            {
  19.245 +                PT_LOG("pt_msi_setup error!!!\n");
  19.246 +                return -1;
  19.247 +            }
  19.248 +            pt_msi_update(d);
  19.249 +
  19.250 +            d->msi->flags &= ~MSI_FLAG_UNINIT;
  19.251 +            d->msi->flags |= PT_MSI_MAPPED;
  19.252 +
  19.253 +            /* Enable physical MSI only after bind */
  19.254 +            pt_msi_enable(d, 1);
  19.255 +        }
  19.256 +        else if ( !(old_ctrl & PCI_MSI_FLAGS_ENABLE) )
  19.257 +            pt_msi_enable(d, 1);
  19.258 +    }
  19.259 +    else if ( old_ctrl & PCI_MSI_FLAGS_ENABLE )
  19.260 +        pt_msi_enable(d, 0);
  19.261 +
  19.262 +    /* Currently no support for multi-vector */
  19.263 +    if ( (new_ctrl & PCI_MSI_FLAGS_QSIZE) != 0x0 )
  19.264 +        PT_LOG("try to set more than 1 vector ctrl %x\n", new_ctrl);
  19.265 +
  19.266 +    return 0;
  19.267 +}
  19.268 +
  19.269 +static int
  19.270 +pt_msi_map_update(struct pt_dev *d, uint32_t old_data, uint64_t old_addr)
  19.271 +{
  19.272 +    uint16_t pctrl;
  19.273 +    uint32_t data;
  19.274 +    uint64_t addr;
  19.275 +
  19.276 +    data = get_msi_gdata(d);
  19.277 +    addr = get_msi_gaddr(d);
  19.278 +
  19.279 +    PT_LOG("old_data %x old_addr %lx data %x addr %lx\n",
  19.280 +            old_data, old_addr, data, addr);
  19.281 +
  19.282 +    if ( data != old_data || addr != old_addr )
  19.283 +        if ( get_msi_gctrl(d) & PCI_MSI_FLAGS_ENABLE )
  19.284 +            pt_msi_update(d);
  19.285 +
  19.286 +    return 0;
  19.287 +}
  19.288 +
  19.289 +static int pt_msi_mask_update(struct pt_dev *d, uint32_t old_mask)
  19.290 +{
  19.291 +    struct pci_dev *pd = d->pci_dev;
  19.292 +    uint32_t mask;
  19.293 +    int offset;
  19.294 +
  19.295 +    if ( !(d->msi->flags & PCI_MSI_FLAGS_PVMASK) )
  19.296 +        return -1;
  19.297 +
  19.298 +    mask = get_msi_gmask(d);
  19.299 +
  19.300 +    if ( d->msi->flags & PCI_MSI_FLAGS_64BIT )
  19.301 +        offset = d->msi->offset + 0xc;
  19.302 +    else
  19.303 +        offset = d->msi->offset + 0x10;
  19.304 +
  19.305 +    if ( old_mask != mask )
  19.306 +        pci_write_long(pd, offset, mask);
  19.307 +}
  19.308 +
  19.309 +#define ACCESSED_DATA 0x2
  19.310 +#define ACCESSED_MASK 0x4
  19.311 +#define ACCESSED_ADDR 0x8
  19.312 +#define ACCESSED_CTRL 0x10
  19.313 +
  19.314 +int pt_msi_write(struct pt_dev *d, uint32_t addr, uint32_t val, uint32_t len)
  19.315 +{
  19.316 +    struct pci_dev *pd;
  19.317 +    int i, cur = addr;
  19.318 +    uint8_t value, flags = 0;
  19.319 +    uint16_t old_ctrl = 0, old_data = 0;
  19.320 +    uint32_t old_mask = 0;
  19.321 +    uint64_t old_addr = 0;
  19.322 +    PCIDevice *dev = (PCIDevice *)d;
  19.323 +    int can_write = 1;
  19.324 +
  19.325 +    if ( !d || !d->msi )
  19.326 +        return 0;
  19.327 +
  19.328 +    if ( (addr >= (d->msi->offset + d->msi->size) ) ||
  19.329 +         (addr + len) < d->msi->offset)
  19.330 +        return 0;
  19.331 +
  19.332 +    PT_LOG("addr %x val %x len %x offset %x size %x\n",
  19.333 +            addr, val, len, d->msi->offset, d->msi->size);
  19.334 +
  19.335 +    pd = d->pci_dev;
  19.336 +    old_ctrl = get_msi_gctrl(d);
  19.337 +    old_addr = get_msi_gaddr(d);
  19.338 +    old_data = get_msi_gdata(d);
  19.339 +
  19.340 +    if ( d->msi->flags & PCI_MSI_FLAGS_PVMASK )
  19.341 +        old_mask = get_msi_gmask(d);
  19.342 +
  19.343 +    for ( i = 0; i < len; i++, cur++ )
  19.344 +    {
  19.345 +        int off;
  19.346 +        uint8_t orig_value;
  19.347 +
  19.348 +        if ( cur < d->msi->offset )
  19.349 +            continue;
  19.350 +        else if ( cur >= (d->msi->offset + d->msi->size) )
  19.351 +            break;
  19.352 +
  19.353 +        off = cur - d->msi->offset;
  19.354 +        value = (val >> (i * 8)) & 0xff;
  19.355 +
  19.356 +        switch ( off )
  19.357 +        {
  19.358 +            case 0x0 ... 0x1:
  19.359 +                can_write = 0;
  19.360 +                break;
  19.361 +            case 0x2:
  19.362 +            case 0x3:
  19.363 +                flags |= ACCESSED_CTRL;
  19.364 +
  19.365 +                orig_value = pci_read_byte(pd, d->msi->offset + off);
  19.366 +
  19.367 +                orig_value &= (off == 2) ? PT_MSI_CTRL_WR_MASK_LO:
  19.368 +                                      PT_MSI_CTRL_WR_MASK_HI;
  19.369 +
  19.370 +                orig_value |= value & ( (off == 2) ? ~PT_MSI_CTRL_WR_MASK_LO:
  19.371 +                                              ~PT_MSI_CTRL_WR_MASK_HI);
  19.372 +                value = orig_value;
  19.373 +                break;
  19.374 +            case 0x4 ... 0x7:
  19.375 +                flags |= ACCESSED_ADDR;
  19.376 +                /* bits 4 ~ 11 are reserved for MSI on x86 */
  19.377 +                if ( off == 0x4 )
  19.378 +                    value &= 0x0f;
  19.379 +                if ( off == 0x5 )
  19.380 +                    value &= 0xf0;
  19.381 +                break;
  19.382 +            case 0x8 ... 0xb:
  19.383 +                if ( d->msi->flags & PCI_MSI_FLAGS_64BIT )
  19.384 +                {
  19.385 +                    /* The upper 32 bits are reserved on x86 */
  19.386 +                    flags |= ACCESSED_ADDR;
  19.387 +                    if ( value )
  19.388 +                        PT_LOG("Write up32 addr with %x \n", value);
  19.389 +                }
  19.390 +                else
  19.391 +                {
  19.392 +                    if ( off == 0xa || off == 0xb )
  19.393 +                        can_write = 0;
  19.394 +                    else
  19.395 +                        flags |= ACCESSED_DATA;
  19.396 +                    if ( off == 0x9 )
  19.397 +                        value &= ~PT_MSI_DATA_WR_MASK;
  19.398 +                }
  19.399 +                break;
  19.400 +            case 0xc ... 0xf:
  19.401 +                if ( d->msi->flags & PCI_MSI_FLAGS_64BIT )
  19.402 +                {
  19.403 +                    if ( off == 0xe || off == 0xf )
  19.404 +                        can_write = 0;
  19.405 +                    else
  19.406 +                    {
  19.407 +                        flags |= ACCESSED_DATA;
  19.408 +                        if (off == 0xd)
  19.409 +                            value &= ~PT_MSI_DATA_WR_MASK;
  19.410 +                    }
  19.411 +                }
  19.412 +                else
  19.413 +                {
  19.414 +                    if ( d->msi->flags & PCI_MSI_FLAGS_PVMASK )
  19.415 +                        flags |= ACCESSED_MASK;
  19.416 +                    else
  19.417 +                        PT_LOG("why comes to MASK without mask support??\n");
  19.418 +                }
  19.419 +                break;
  19.420 +            case 0x10 ... 0x13:
  19.421 +                if ( d->msi->flags & PCI_MSI_FLAGS_64BIT )
  19.422 +                {
  19.423 +                    if ( d->msi->flags & PCI_MSI_FLAGS_PVMASK )
  19.424 +                        flags |= ACCESSED_MASK;
  19.425 +                    else
  19.426 +                        PT_LOG("why comes to MASK without mask support??\n");
  19.427 +                }
  19.428 +                else
  19.429 +                    can_write = 0;
  19.430 +                break;
  19.431 +            case 0x14 ... 0x18:
  19.432 +                can_write = 0;
  19.433 +                break;
  19.434 +            default:
  19.435 +                PT_LOG("Non MSI register!!!\n");
  19.436 +                break;
  19.437 +        }
  19.438 +
  19.439 +        if ( can_write )
  19.440 +            dev->config[cur] = value;
  19.441 +    }
  19.442 +
  19.443 +    if ( flags & ACCESSED_DATA || flags & ACCESSED_ADDR )
  19.444 +        pt_msi_map_update(d, old_data, old_addr);
  19.445 +
  19.446 +    if ( flags & ACCESSED_MASK )
  19.447 +        pt_msi_mask_update(d, old_mask);
  19.448 +
   19.449 +    /* This may enable the physical MSI, so do it as the last step */
  19.450 +    if ( flags & ACCESSED_CTRL )
  19.451 +        pt_msi_control_update(d, old_ctrl);
  19.452 +
  19.453 +    return 1;
  19.454 +}
  19.455 +
  19.456 +int pt_msi_read(struct pt_dev *d, int addr, int len, uint32_t *val)
  19.457 +{
  19.458 +    int e_addr = addr, e_len = len, offset = 0, i;
  19.459 +    uint8_t e_val = 0;
  19.460 +    PCIDevice *pd = (PCIDevice *)d;
  19.461 +
  19.462 +    if ( !d || !d->msi )
  19.463 +        return 0;
  19.464 +
  19.465 +    if ( (addr > (d->msi->offset + d->msi->size) ) ||
  19.466 +         (addr + len) <= d->msi->offset )
  19.467 +        return 0;
  19.468 +
  19.469 +    PT_LOG("pt_msi_read addr %x len %x val %x offset %x size %x\n",
  19.470 +            addr, len, *val, d->msi->offset, d->msi->size);
  19.471 +
  19.472 +    if ( (addr + len ) > (d->msi->offset + d->msi->size) )
  19.473 +        e_len -= addr + len - d->msi->offset - d->msi->size;
  19.474 +
  19.475 +    if ( addr < d->msi->offset )
  19.476 +    {
  19.477 +        e_addr = d->msi->offset;
  19.478 +        offset = d->msi->offset - addr;
  19.479 +        e_len -= offset;
  19.480 +    }
  19.481 +
  19.482 +    for ( i = 0; i < e_len; i++ )
  19.483 +    {
  19.484 +        e_val = *(uint8_t *)(&pd->config[e_addr] + i);
  19.485 +        *val &= ~(0xff << ( (offset + i) * 8));
  19.486 +        *val |= (e_val << ( (offset + i) * 8));
  19.487 +    }
  19.488 +
  19.489 +    return e_len;
  19.490 +}
  19.491 +
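
For reference, the clamping arithmetic at the top of pt_msi_read() (and the analogous code in pt_msi_write()) only serves the bytes of a config-space access that actually overlap the emulated MSI capability. Below is a minimal standalone sketch of that arithmetic, using a made-up capability offset and size instead of the real d->msi fields:

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical capability window: MSI capability at 0x50, 14 bytes long. */
    #define CAP_OFFSET 0x50
    #define CAP_SIZE   0x0e

    int main(void)
    {
        /* A 4-byte config-space read starting 2 bytes before the capability. */
        uint32_t addr = CAP_OFFSET - 2, len = 4;
        uint32_t e_addr = addr, e_len = len, offset = 0;

        /* Clip the tail if the access runs past the end of the capability. */
        if (addr + len > CAP_OFFSET + CAP_SIZE)
            e_len -= addr + len - CAP_OFFSET - CAP_SIZE;

        /* Clip the head if the access starts before the capability. */
        if (addr < CAP_OFFSET) {
            e_addr = CAP_OFFSET;
            offset = CAP_OFFSET - addr;
            e_len -= offset;
        }

        /* Result: e_addr = 0x50, offset = 2, e_len = 2, i.e. only the two
         * overlapping bytes are served from the emulated capability. */
        printf("e_addr=%#x offset=%u e_len=%u\n", e_addr, offset, e_len);
        return 0;
    }
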
    20.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    20.2 +++ b/tools/ioemu/hw/pt-msi.h	Thu May 08 18:40:07 2008 +0900
    20.3 @@ -0,0 +1,65 @@
    20.4 +#ifndef _PT_MSI_H
    20.5 +#define _PT_MSI_H
    20.6 +
    20.7 +#include "vl.h"
    20.8 +#include "pci/header.h"
    20.9 +#include "pci/pci.h"
   20.10 +#include "pass-through.h"
   20.11 +
   20.12 +#define MSI_FLAG_UNINIT 0x1000
   20.13 +#define PT_MSI_MAPPED   0x2000
   20.14 +
   20.15 +#define MSI_DATA_VECTOR_SHIFT          0
   20.16 +#define     MSI_DATA_VECTOR(v)         (((u8)v) << MSI_DATA_VECTOR_SHIFT)
   20.17 +
   20.18 +#define MSI_DATA_DELIVERY_SHIFT        8
   20.19 +#define     MSI_DATA_DELIVERY_FIXED    (0 << MSI_DATA_DELIVERY_SHIFT)
   20.20 +#define     MSI_DATA_DELIVERY_LOWPRI   (1 << MSI_DATA_DELIVERY_SHIFT)
   20.21 +
   20.22 +#define MSI_DATA_LEVEL_SHIFT           14
   20.23 +#define     MSI_DATA_LEVEL_DEASSERT    (0 << MSI_DATA_LEVEL_SHIFT)
   20.24 +#define     MSI_DATA_LEVEL_ASSERT      (1 << MSI_DATA_LEVEL_SHIFT)
   20.25 +
   20.26 +#define MSI_DATA_TRIGGER_SHIFT         15
   20.27 +#define     MSI_DATA_TRIGGER_EDGE      (0 << MSI_DATA_TRIGGER_SHIFT)
   20.28 +#define     MSI_DATA_TRIGGER_LEVEL     (1 << MSI_DATA_TRIGGER_SHIFT)
   20.29 +
   20.30 +/*
    20.31 + * Shift/mask fields for APIC-based bus address
    20.32 + */
   20.33 +
   20.34 +#define MSI_ADDR_HEADER                0xfee00000
   20.35 +#define MSI_TARGET_CPU_SHIFT   	       12
   20.36 +
   20.37 +#define MSI_ADDR_DESTID_MASK           0xfff0000f
   20.38 +#define     MSI_ADDR_DESTID_CPU(cpu)   ((cpu) << MSI_TARGET_CPU_SHIFT)
   20.39 +
   20.40 +#define MSI_ADDR_DESTMODE_SHIFT        2
   20.41 +#define     MSI_ADDR_DESTMODE_PHYS     (0 << MSI_ADDR_DESTMODE_SHIFT)
   20.42 +#define        MSI_ADDR_DESTMODE_LOGIC (1 << MSI_ADDR_DESTMODE_SHIFT)
   20.43 +
   20.44 +#define MSI_ADDR_REDIRECTION_SHIFT     3
   20.45 +#define     MSI_ADDR_REDIRECTION_CPU   (0 << MSI_ADDR_REDIRECTION_SHIFT)
   20.46 +#define     MSI_ADDR_REDIRECTION_LOWPRI (1 << MSI_ADDR_REDIRECTION_SHIFT)
   20.47 +
   20.48 +#define PCI_MSI_FLAGS_PVMASK           0x100
   20.49 +
   20.50 +#define AUTO_ASSIGN -1
   20.51 +
   20.52 +/* shift count for gflags */
   20.53 +#define GFLAGS_SHIFT_DEST_ID        0
   20.54 +#define GFLAGS_SHIFT_RH             8
   20.55 +#define GFLAGS_SHIFT_DM             9
   20.56 +#define GLFAGS_SHIFT_DELIV_MODE     12
   20.57 +#define GLFAGS_SHIFT_TRG_MODE       15
   20.58 +
   20.59 +int
   20.60 +pt_msi_init(struct pt_dev *dev, int pos);
   20.61 +
   20.62 +int
   20.63 +pt_msi_write(struct pt_dev *d, uint32_t addr, uint32_t val, uint32_t len);
   20.64 +
   20.65 +int
   20.66 +pt_msi_read(struct pt_dev *d, int addr, int len, uint32_t *val);
   20.67 +
   20.68 +#endif
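
The MSI_ADDR_*/MSI_DATA_* defines above follow the standard x86 MSI encoding (a 0xFEExxxxx message address with the destination APIC ID in bits 12-19, and the vector/delivery mode in the data register). A small sketch of how such a pair can be composed from these encodings; the APIC ID and vector values are arbitrary examples, and the macros are repeated locally only so the snippet compiles on its own:

    #include <stdint.h>
    #include <stdio.h>

    /* Mirrors the pt-msi.h encodings above. */
    #define MSI_ADDR_HEADER            0xfee00000
    #define MSI_TARGET_CPU_SHIFT       12
    #define MSI_ADDR_DESTID_CPU(cpu)   ((cpu) << MSI_TARGET_CPU_SHIFT)
    #define MSI_ADDR_DESTMODE_PHYS     (0 << 2)
    #define MSI_ADDR_REDIRECTION_CPU   (0 << 3)
    #define MSI_DATA_VECTOR(v)         (((uint8_t)(v)) << 0)
    #define MSI_DATA_DELIVERY_FIXED    (0 << 8)
    #define MSI_DATA_TRIGGER_EDGE      (0 << 15)

    int main(void)
    {
        unsigned int cpu = 1, vector = 0x31;   /* example APIC ID and vector */

        /* Address: local APIC window, physical destination, no redirection. */
        uint32_t addr = MSI_ADDR_HEADER | MSI_ADDR_DESTID_CPU(cpu) |
                        MSI_ADDR_DESTMODE_PHYS | MSI_ADDR_REDIRECTION_CPU;

        /* Data: vector in bits 0-7, fixed delivery, edge triggered. */
        uint32_t data = MSI_DATA_VECTOR(vector) | MSI_DATA_DELIVERY_FIXED |
                        MSI_DATA_TRIGGER_EDGE;

        printf("MSI address=%#x data=%#x\n", addr, data);  /* 0xfee01000 / 0x31 */
        return 0;
    }
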
    21.1 --- a/tools/ioemu/hw/vga.c	Fri Apr 25 20:13:52 2008 +0900
    21.2 +++ b/tools/ioemu/hw/vga.c	Thu May 08 18:40:07 2008 +0900
    21.3 @@ -1075,7 +1075,7 @@ static rgb_to_pixel_dup_func *rgb_to_pix
    21.4   */
    21.5  static void vga_draw_text(VGAState *s, int full_update)
    21.6  {
    21.7 -    int cx, cy, cheight, cw, ch, cattr, height, width, ch_attr, depth;
    21.8 +    int cx, cy, cheight, cw, ch, cattr, height, width, ch_attr;
    21.9      int cx_min, cx_max, linesize, x_incr;
   21.10      uint32_t offset, fgcol, bgcol, v, cursor_offset;
   21.11      uint8_t *d1, *d, *src, *s1, *dest, *cursor_ptr;
   21.12 @@ -1086,9 +1086,11 @@ static void vga_draw_text(VGAState *s, i
   21.13      vga_draw_glyph8_func *vga_draw_glyph8;
   21.14      vga_draw_glyph9_func *vga_draw_glyph9;
   21.15  
   21.16 -    depth = s->get_bpp(s);
   21.17 -    if (s->ds->dpy_colourdepth != NULL && s->ds->depth != depth)
   21.18 -        s->ds->dpy_colourdepth(s->ds, depth);
   21.19 +    /* Disable dirty bit tracking */
   21.20 +    xc_hvm_track_dirty_vram(xc_handle, domid, 0, 0, NULL);
   21.21 +
   21.22 +    if (s->ds->dpy_colourdepth != NULL && s->ds->depth != 0)
   21.23 +        s->ds->dpy_colourdepth(s->ds, 0);
   21.24      s->rgb_to_pixel = 
   21.25          rgb_to_pixel_dup_table[get_depth_index(s->ds)];
   21.26  
   21.27 @@ -1486,7 +1488,7 @@ void check_sse2(void)
   21.28  static void vga_draw_graphic(VGAState *s, int full_update)
   21.29  {
   21.30      int y1, y, update, linesize, y_start, double_scan, mask, depth;
   21.31 -    int width, height, shift_control, line_offset, bwidth, ds_depth;
   21.32 +    int width, height, shift_control, line_offset, bwidth, ds_depth, bits;
   21.33      ram_addr_t page0, page1;
   21.34      int disp_width, multi_scan, multi_run;
   21.35      uint8_t *d;
   21.36 @@ -1534,6 +1536,7 @@ static void vga_draw_graphic(VGAState *s
   21.37          } else {
   21.38              v = VGA_DRAW_LINE4;
   21.39          }
   21.40 +        bits = 4;
   21.41      } else if (shift_control == 1) {
   21.42          full_update |= update_palette16(s);
   21.43          if (s->sr[0x01] & 8) {
   21.44 @@ -1542,28 +1545,35 @@ static void vga_draw_graphic(VGAState *s
   21.45          } else {
   21.46              v = VGA_DRAW_LINE2;
   21.47          }
   21.48 +        bits = 4;
   21.49      } else {
   21.50          switch(s->get_bpp(s)) {
   21.51          default:
   21.52          case 0:
   21.53              full_update |= update_palette256(s);
   21.54              v = VGA_DRAW_LINE8D2;
   21.55 +            bits = 4;
   21.56              break;
   21.57          case 8:
   21.58              full_update |= update_palette256(s);
   21.59              v = VGA_DRAW_LINE8;
   21.60 +            bits = 8;
   21.61              break;
   21.62          case 15:
   21.63              v = VGA_DRAW_LINE15;
   21.64 +            bits = 16;
   21.65              break;
   21.66          case 16:
   21.67              v = VGA_DRAW_LINE16;
   21.68 +            bits = 16;
   21.69              break;
   21.70          case 24:
   21.71              v = VGA_DRAW_LINE24;
   21.72 +            bits = 24;
   21.73              break;
   21.74          case 32:
   21.75              v = VGA_DRAW_LINE32;
   21.76 +            bits = 32;
   21.77              break;
   21.78          }
   21.79      }
   21.80 @@ -1591,12 +1601,72 @@ static void vga_draw_graphic(VGAState *s
   21.81             width, height, v, line_offset, s->cr[9], s->cr[0x17], s->line_compare, s->sr[0x01]);
   21.82  #endif
   21.83  
   21.84 -    for (y = 0; y < s->vram_size; y += TARGET_PAGE_SIZE)
   21.85 -        if (vram_dirty(s, y, TARGET_PAGE_SIZE))
   21.86 +    y = 0;
   21.87 +
   21.88 +    if (height - 1 > s->line_compare || multi_run || (s->cr[0x17] & 3) != 3
   21.89 +            || !s->lfb_addr) {
    21.90 +        /* Tricky cases (split screen, non-linear modes, no LFB): disable dirty bit tracking */
   21.91 +        xc_hvm_track_dirty_vram(xc_handle, domid, 0, 0, NULL);
   21.92 +
   21.93 +        for ( ; y < s->vram_size; y += TARGET_PAGE_SIZE)
   21.94 +            if (vram_dirty(s, y, TARGET_PAGE_SIZE))
   21.95 +                cpu_physical_memory_set_dirty(s->vram_offset + y);
   21.96 +    } else {
    21.97 +        /* Tricky things won't have any effect, i.e. we are in the simple
    21.98 +         * (and most common) case of a linear frame buffer. */
   21.99 +        unsigned long end;
  21.100 +
  21.101 +        for ( ; y < ((s->start_addr * 4) & TARGET_PAGE_MASK); y += TARGET_PAGE_SIZE)
  21.102 +            /* We will not read that anyway. */
  21.103              cpu_physical_memory_set_dirty(s->vram_offset + y);
  21.104  
  21.105 +        if (y < (s->start_addr * 4)) {
   21.106 +            /* Start address not aligned on a page; track dirtiness by hand. */
  21.107 +            if (vram_dirty(s, y, TARGET_PAGE_SIZE))
  21.108 +                cpu_physical_memory_set_dirty(s->vram_offset + y);
  21.109 +            y += TARGET_PAGE_SIZE;
  21.110 +        }
  21.111 +
   21.112 +        /* Use page-table dirty bit tracking for the interior of the LFB */
  21.113 +        end = s->start_addr * 4 + height * line_offset;
  21.114 +        {
  21.115 +            unsigned long npages = ((end & TARGET_PAGE_MASK) - y) / TARGET_PAGE_SIZE;
  21.116 +            const int width = sizeof(unsigned long) * 8;
  21.117 +            unsigned long bitmap[(npages + width - 1) / width];
  21.118 +            int err;
  21.119 +
  21.120 +            if (!(err = xc_hvm_track_dirty_vram(xc_handle, domid,
  21.121 +                        (s->lfb_addr + y) / TARGET_PAGE_SIZE, npages, bitmap))) {
  21.122 +                int i, j;
  21.123 +                for (i = 0; i < sizeof(bitmap) / sizeof(*bitmap); i++) {
  21.124 +                    unsigned long map = bitmap[i];
  21.125 +                    for (j = i * width; map && j < npages; map >>= 1, j++)
  21.126 +                        if (map & 1)
  21.127 +                            cpu_physical_memory_set_dirty(s->vram_offset + y
  21.128 +                                + j * TARGET_PAGE_SIZE);
  21.129 +                }
  21.130 +                y += npages * TARGET_PAGE_SIZE;
  21.131 +            } else {
  21.132 +                /* ENODATA just means we have changed mode and will succeed
  21.133 +                 * next time */
  21.134 +                if (err != -ENODATA)
  21.135 +                    fprintf(stderr, "track_dirty_vram(%lx, %lx) failed (%d)\n", s->lfb_addr + y, npages, err);
  21.136 +            }
  21.137 +        }
  21.138 +
  21.139 +        for ( ; y < s->vram_size && y < end; y += TARGET_PAGE_SIZE)
   21.140 +            /* Failed, or end address not aligned on a page; track dirtiness
   21.141 +             * by hand. */
  21.142 +            if (vram_dirty(s, y, TARGET_PAGE_SIZE))
  21.143 +                cpu_physical_memory_set_dirty(s->vram_offset + y);
  21.144 +
  21.145 +        for ( ; y < s->vram_size; y += TARGET_PAGE_SIZE)
  21.146 +            /* We will not read that anyway. */
  21.147 +            cpu_physical_memory_set_dirty(s->vram_offset + y);
  21.148 +    }
  21.149 +
  21.150      addr1 = (s->start_addr * 4);
  21.151 -    bwidth = width * 4;
  21.152 +    bwidth = (width * bits + 7) / 8;
  21.153      y_start = -1;
  21.154      page_min = 0;
  21.155      page_max = 0;
  21.156 @@ -1682,6 +1752,10 @@ static void vga_draw_blank(VGAState *s, 
  21.157          return;
  21.158      if (s->last_scr_width <= 0 || s->last_scr_height <= 0)
  21.159          return;
  21.160 +
  21.161 +    /* Disable dirty bit tracking */
  21.162 +    xc_hvm_track_dirty_vram(xc_handle, domid, 0, 0, NULL);
  21.163 +
  21.164      s->rgb_to_pixel = 
  21.165          rgb_to_pixel_dup_table[get_depth_index(s->ds)];
  21.166      if (s->ds->depth == 8) 
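
The per-word bitmap walk used above (shift the word right and stop as soon as the remaining bits of the word are zero) is easy to exercise in isolation. Here is a self-contained sketch mirroring the loop structure in vga_draw_graphic(), with a hypothetical 70-page region and two hand-marked dirty pages standing in for the bitmap returned by xc_hvm_track_dirty_vram():

    #include <stdio.h>

    #define TARGET_PAGE_SIZE 4096

    /* Stand-in for cpu_physical_memory_set_dirty(); just reports the page. */
    static void mark_page_dirty(unsigned long offset)
    {
        printf("page at offset %#lx is dirty\n", offset);
    }

    int main(void)
    {
        unsigned long npages = 70;                   /* hypothetical LFB size */
        const int width = sizeof(unsigned long) * 8; /* bits per bitmap word */
        unsigned long bitmap[(npages + width - 1) / width];
        unsigned long base = 0;                      /* VRAM offset of page 0 */
        int i, j;

        /* Pretend the hypervisor reported pages 3 and 65 as dirty. */
        for (i = 0; i < sizeof(bitmap) / sizeof(*bitmap); i++)
            bitmap[i] = 0;
        bitmap[3 / width]  |= 1UL << (3 % width);
        bitmap[65 / width] |= 1UL << (65 % width);

        /* Same walk as vga_draw_graphic(): shift each word right and stop
         * early once the remaining bits of the word are all zero. */
        for (i = 0; i < sizeof(bitmap) / sizeof(*bitmap); i++) {
            unsigned long map = bitmap[i];
            for (j = i * width; map && j < npages; map >>= 1, j++)
                if (map & 1)
                    mark_page_dirty(base + (unsigned long)j * TARGET_PAGE_SIZE);
        }
        return 0;
    }
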
    22.1 --- a/tools/ioemu/hw/vga_int.h	Fri Apr 25 20:13:52 2008 +0900
    22.2 +++ b/tools/ioemu/hw/vga_int.h	Thu May 08 18:40:07 2008 +0900
    22.3 @@ -87,6 +87,8 @@
    22.4      unsigned int vram_size;                                             \
    22.5      unsigned long bios_offset;                                          \
    22.6      unsigned int bios_size;                                             \
    22.7 +    unsigned long lfb_addr;                                             \
    22.8 +    unsigned long lfb_end;                                              \
    22.9      PCIDevice *pci_dev;                                                 \
   22.10      uint32_t latch;                                                     \
   22.11      uint8_t sr_index;                                                   \
    23.1 --- a/tools/ioemu/sdl.c	Fri Apr 25 20:13:52 2008 +0900
    23.2 +++ b/tools/ioemu/sdl.c	Thu May 08 18:40:07 2008 +0900
    23.3 @@ -235,6 +235,9 @@ static void sdl_resize(DisplayState *ds,
    23.4   again:
    23.5      screen = SDL_SetVideoMode(w, h, 0, flags);
    23.6  
    23.7 +    /* Process any WM-generated resize event */
    23.8 +    SDL_PumpEvents();
    23.9 +
   23.10      if (!screen) {
   23.11          fprintf(stderr, "Could not open SDL display: %s\n", SDL_GetError());
   23.12          if (opengl_enabled) {
    24.1 --- a/tools/ioemu/vl.h	Fri Apr 25 20:13:52 2008 +0900
    24.2 +++ b/tools/ioemu/vl.h	Thu May 08 18:40:07 2008 +0900
    24.3 @@ -940,7 +940,6 @@ struct DisplayState {
    24.4      uint32_t *palette;
    24.5      uint64_t gui_timer_interval;
    24.6  
    24.7 -    int switchbpp;
    24.8      int shared_buf;
    24.9      
   24.10      void (*dpy_update)(struct DisplayState *s, int x, int y, int w, int h);
    25.1 --- a/tools/ioemu/vnc.c	Fri Apr 25 20:13:52 2008 +0900
    25.2 +++ b/tools/ioemu/vnc.c	Thu May 08 18:40:07 2008 +0900
    25.3 @@ -198,6 +198,7 @@ struct VncState
    25.4      char *x509key;
    25.5  #endif
    25.6      char challenge[VNC_AUTH_CHALLENGE_SIZE];
    25.7 +    int switchbpp;
    25.8  
    25.9  #if CONFIG_VNC_TLS
   25.10      int wiremode;
   25.11 @@ -1686,7 +1687,7 @@ static void vnc_dpy_colourdepth(DisplayS
   25.12          default:
   25.13              return;
   25.14      }
   25.15 -    if (ds->switchbpp) {
   25.16 +    if (vs->switchbpp) {
   25.17          vnc_client_error(vs);
   25.18      } else if (vs->csock != -1 && vs->has_WMVi) {
   25.19          /* Sending a WMVi message to notify the client*/
   25.20 @@ -2647,7 +2648,7 @@ int vnc_display_open(DisplayState *ds, c
   25.21  	if (strncmp(options, "password", 8) == 0) {
   25.22  	    password = 1; /* Require password auth */
   25.23          } else if (strncmp(options, "switchbpp", 9) == 0) {
   25.24 -            ds->switchbpp = 1;
   25.25 +            vs->switchbpp = 1;
   25.26  #if CONFIG_VNC_TLS
   25.27  	} else if (strncmp(options, "tls", 3) == 0) {
   25.28  	    tls = 1; /* Require TLS */
    26.1 --- a/tools/libfsimage/Makefile	Fri Apr 25 20:13:52 2008 +0900
    26.2 +++ b/tools/libfsimage/Makefile	Thu May 08 18:40:07 2008 +0900
    26.3 @@ -1,7 +1,7 @@
    26.4  XEN_ROOT = ../..
    26.5  include $(XEN_ROOT)/tools/Rules.mk
    26.6  
    26.7 -SUBDIRS-y = common ufs reiserfs iso9660 fat
    26.8 +SUBDIRS-y = common ufs reiserfs iso9660 fat zfs
    26.9  SUBDIRS-y += $(shell env CC="$(CC)" ./check-libext2fs)
   26.10  
   26.11  .PHONY: all clean install
    27.1 --- a/tools/libfsimage/common/fsimage.c	Fri Apr 25 20:13:52 2008 +0900
    27.2 +++ b/tools/libfsimage/common/fsimage.c	Thu May 08 18:40:07 2008 +0900
    27.3 @@ -17,7 +17,7 @@
    27.4   * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    27.5   * DEALINGS IN THE SOFTWARE.
    27.6   *
    27.7 - * Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
    27.8 + * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
    27.9   * Use is subject to license terms.
   27.10   */
   27.11  
   27.12 @@ -51,6 +51,7 @@ fsi_t *fsi_open_fsimage(const char *path
   27.13  	fsi->f_fd = fd;
   27.14  	fsi->f_off = off;
   27.15  	fsi->f_data = NULL;
   27.16 +	fsi->f_bootstring = NULL;
   27.17  
   27.18  	pthread_mutex_lock(&fsi_lock);
   27.19  	err = find_plugin(fsi, path, options);
   27.20 @@ -140,3 +141,29 @@ ssize_t fsi_pread_file(fsi_file_t *ffi, 
   27.21  
   27.22  	return (ret);
   27.23  }
   27.24 +
   27.25 +char *
   27.26 +fsi_bootstring_alloc(fsi_t *fsi, size_t len)
   27.27 +{
   27.28 +	fsi->f_bootstring = malloc(len);
   27.29 +	if (fsi->f_bootstring == NULL)
   27.30 +		return (NULL);
   27.31 +
   27.32 +	bzero(fsi->f_bootstring, len);
   27.33 +	return (fsi->f_bootstring);
   27.34 +}
   27.35 +
   27.36 +void
   27.37 +fsi_bootstring_free(fsi_t *fsi)
   27.38 +{
   27.39 +	if (fsi->f_bootstring != NULL) {
   27.40 +		free(fsi->f_bootstring);
   27.41 +		fsi->f_bootstring = NULL;
   27.42 +	}
   27.43 +}
   27.44 +
   27.45 +char *
   27.46 +fsi_fs_bootstring(fsi_t *fsi)
   27.47 +{
   27.48 +	return (fsi->f_bootstring);
   27.49 +}
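
The three new exports form a small accessor API around the f_bootstring field added to struct fsi: a filesystem plugin allocates the string while mounting, and the consumer reads it back after opening the image. A minimal sketch follows, assuming the existing fsi_open_fsimage()/fsi_close_fsimage() entry points; the plugin-side helper and its string format are purely illustrative, not part of this changeset:

    #include <stdio.h>
    #include <fsimage.h>

    /* Hypothetical plugin-side helper: record a boot string while mounting. */
    int example_record_bootstring(fsi_t *fsi, const char *pool)
    {
        char *bs = fsi_bootstring_alloc(fsi, 64);

        if (bs == NULL)
            return -1;
        snprintf(bs, 64, "example-bootfs=%s", pool);  /* illustrative format */
        return 0;
    }

    /* Consumer side: open an image and print any recorded boot string. */
    int main(int argc, char **argv)
    {
        fsi_t *fsi;
        char *bs;

        if (argc < 2)
            return 1;

        /* Open a disk image; the matching filesystem plugin is probed. */
        fsi = fsi_open_fsimage(argv[1], 0, NULL);
        if (fsi == NULL)
            return 1;

        /* New in this changeset: the plugin may have recorded a boot string. */
        bs = fsi_fs_bootstring(fsi);
        printf("boot string: %s\n", bs ? bs : "(none)");

        fsi_close_fsimage(fsi);
        return 0;
    }
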
    28.1 --- a/tools/libfsimage/common/fsimage.h	Fri Apr 25 20:13:52 2008 +0900
    28.2 +++ b/tools/libfsimage/common/fsimage.h	Thu May 08 18:40:07 2008 +0900
    28.3 @@ -17,7 +17,7 @@
    28.4   * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    28.5   * DEALINGS IN THE SOFTWARE.
    28.6   *
    28.7 - * Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
    28.8 + * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
    28.9   * Use is subject to license terms.
   28.10   */
   28.11  
   28.12 @@ -45,6 +45,10 @@ int fsi_close_file(fsi_file_t *);
   28.13  ssize_t fsi_read_file(fsi_file_t *, void *, size_t);
   28.14  ssize_t fsi_pread_file(fsi_file_t *, void *, size_t, uint64_t);
   28.15  
   28.16 +char *fsi_bootstring_alloc(fsi_t *, size_t);
   28.17 +void fsi_bootstring_free(fsi_t *);
   28.18 +char *fsi_fs_bootstring(fsi_t *);
   28.19 +
   28.20  #ifdef __cplusplus
   28.21  };
   28.22  #endif
    29.1 --- a/tools/libfsimage/common/fsimage_grub.c	Fri Apr 25 20:13:52 2008 +0900
    29.2 +++ b/tools/libfsimage/common/fsimage_grub.c	Thu May 08 18:40:07 2008 +0900
    29.3 @@ -17,7 +17,7 @@
    29.4   * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    29.5   * DEALINGS IN THE SOFTWARE.
    29.6   *
    29.7 - * Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
    29.8 + * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
    29.9   * Use is subject to license terms.
   29.10   */
   29.11  
   29.12 @@ -286,6 +286,7 @@ fsig_mount(fsi_t *fsi, const char *path,
   29.13  
   29.14  	if (!ops->fpo_mount(ffi, options)) {
   29.15  		fsip_file_free(ffi);
   29.16 +		fsi_bootstring_free(fsi);
   29.17  		free(fsi->f_data);
   29.18  		fsi->f_data = NULL;
   29.19  		return (-1);
   29.20 @@ -299,6 +300,7 @@ fsig_mount(fsi_t *fsi, const char *path,
   29.21  static int
   29.22  fsig_umount(fsi_t *fsi)
   29.23  {
   29.24 +	fsi_bootstring_free(fsi);
   29.25  	free(fsi->f_data);
   29.26  	return (0);
   29.27  }
    30.1 --- a/tools/libfsimage/common/fsimage_grub.h	Fri Apr 25 20:13:52 2008 +0900
    30.2 +++ b/tools/libfsimage/common/fsimage_grub.h	Thu May 08 18:40:07 2008 +0900
    30.3 @@ -17,7 +17,7 @@
    30.4   * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    30.5   * DEALINGS IN THE SOFTWARE.
    30.6   *
    30.7 - * Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
    30.8 + * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
    30.9   * Use is subject to license terms.
   30.10   */
   30.11  
   30.12 @@ -72,6 +72,12 @@ unsigned long fsig_log2(unsigned long);
   30.13  #define	ERR_FILELENGTH 1
   30.14  #define	ERR_BAD_FILETYPE 1
   30.15  #define	ERR_FILE_NOT_FOUND 1
   30.16 +#define	ERR_BAD_ARGUMENT 1
   30.17 +#define	ERR_FILESYSTEM_NOT_FOUND 1
   30.18 +#define	ERR_NO_BOOTPATH 1
   30.19 +#define	ERR_DEV_VALUES 1
   30.20 +#define	ERR_WONT_FIT 1
   30.21 +#define	ERR_READ 1
   30.22  
   30.23  fsi_plugin_ops_t *fsig_init(fsi_plugin_t *, fsig_plugin_ops_t *);
   30.24  
    31.1 --- a/tools/libfsimage/common/fsimage_priv.h	Fri Apr 25 20:13:52 2008 +0900
    31.2 +++ b/tools/libfsimage/common/fsimage_priv.h	Thu May 08 18:40:07 2008 +0900
    31.3 @@ -17,7 +17,7 @@
    31.4   * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    31.5   * DEALINGS IN THE SOFTWARE.
    31.6   *
    31.7 - * Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
    31.8 + * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
    31.9   * Use is subject to license terms.
   31.10   */
   31.11  
   31.12 @@ -46,6 +46,7 @@ struct fsi {
   31.13  	uint64_t f_off;
   31.14  	void *f_data;
   31.15  	fsi_plugin_t *f_plugin;
   31.16 +	char *f_bootstring;
   31.17  };
   31.18  
   31.19  struct fsi_file {
    32.1 --- a/tools/libfsimage/common/mapfile-GNU	Fri Apr 25 20:13:52 2008 +0900
    32.2 +++ b/tools/libfsimage/common/mapfile-GNU	Thu May 08 18:40:07 2008 +0900
    32.3 @@ -8,6 +8,9 @@ VERSION {
    32.4  			fsi_close_file;
    32.5  			fsi_read_file;
    32.6  			fsi_pread_file;
    32.7 +			fsi_bootstring_alloc;
    32.8 +			fsi_bootstring_free;
    32.9 +			fsi_fs_bootstring;
   32.10  	
   32.11  			fsip_fs_set_data;
   32.12  			fsip_file_alloc;
    33.1 --- a/tools/libfsimage/common/mapfile-SunOS	Fri Apr 25 20:13:52 2008 +0900
    33.2 +++ b/tools/libfsimage/common/mapfile-SunOS	Thu May 08 18:40:07 2008 +0900
    33.3 @@ -7,6 +7,9 @@ libfsimage.so.1.0 {
    33.4  		fsi_close_file;
    33.5  		fsi_read_file;
    33.6  		fsi_pread_file;
    33.7 +		fsi_bootstring_alloc;
    33.8 +		fsi_bootstring_free;
    33.9 +		fsi_fs_bootstring;
   33.10  
   33.11  		fsip_fs_set_data;
   33.12  		fsip_file_alloc;
    34.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    34.2 +++ b/tools/libfsimage/zfs/Makefile	Thu May 08 18:40:07 2008 +0900
    34.3 @@ -0,0 +1,37 @@
    34.4 +#
    34.5 +#  GRUB  --  GRand Unified Bootloader
    34.6 +#  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    34.7 +#
    34.8 +#  This program is free software; you can redistribute it and/or modify
    34.9 +#  it under the terms of the GNU General Public License as published by
   34.10 +#  the Free Software Foundation; either version 2 of the License, or
   34.11 +#  (at your option) any later version.
   34.12 +#
   34.13 +#  This program is distributed in the hope that it will be useful,
   34.14 +#  but WITHOUT ANY WARRANTY; without even the implied warranty of
   34.15 +#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   34.16 +#  GNU General Public License for more details.
   34.17 +#
   34.18 +#  You should have received a copy of the GNU General Public License
   34.19 +#  along with this program; if not, write to the Free Software
   34.20 +#  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   34.21 +#
   34.22 +
   34.23 +# 
   34.24 +#  Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
   34.25 +#  Use is subject to license terms.
   34.26 +#
   34.27 +
   34.28 +XEN_ROOT = ../../..
   34.29 +
   34.30 +LIB_SRCS-y = fsys_zfs.c zfs_lzjb.c zfs_sha256.c zfs_fletcher.c
   34.31 +
   34.32 +FS = zfs
   34.33 +
   34.34 +.PHONY: all
   34.35 +all: fs-all
   34.36 +
   34.37 +.PHONY: install
   34.38 +install: fs-install
   34.39 +
   34.40 +include $(XEN_ROOT)/tools/libfsimage/Rules.mk
    35.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    35.2 +++ b/tools/libfsimage/zfs/fsys_zfs.c	Thu May 08 18:40:07 2008 +0900
    35.3 @@ -0,0 +1,1457 @@
    35.4 +/*
    35.5 + *  GRUB  --  GRand Unified Bootloader
    35.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    35.7 + *
    35.8 + *  This program is free software; you can redistribute it and/or modify
    35.9 + *  it under the terms of the GNU General Public License as published by
   35.10 + *  the Free Software Foundation; either version 2 of the License, or
   35.11 + *  (at your option) any later version.
   35.12 + *
   35.13 + *  This program is distributed in the hope that it will be useful,
   35.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   35.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   35.16 + *  GNU General Public License for more details.
   35.17 + *
   35.18 + *  You should have received a copy of the GNU General Public License
   35.19 + *  along with this program; if not, write to the Free Software
   35.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   35.21 + */
   35.22 +/*
   35.23 + * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
   35.24 + * Use is subject to license terms.
   35.25 + */
   35.26 +
   35.27 +/*
   35.28 + * All files in the zfs directory are derived from the OpenSolaris
   35.29 + * zfs grub files.  All files in the zfs-include directory were
   35.30 + * included without changes.
   35.31 + */
   35.32 +
   35.33 +/*
   35.34 + * The zfs plug-in routines for GRUB are:
   35.35 + *
   35.36 + * zfs_mount() - locates a valid uberblock of the root pool and reads
   35.37 + *		in its MOS at the memory address MOS.
   35.38 + *
   35.39 + * zfs_open() - locates a plain file object by following the MOS
   35.40 + *		and places its dnode at the memory address DNODE.
   35.41 + *
   35.42 + * zfs_read() - read in the data blocks pointed by the DNODE.
   35.43 + *
   35.44 + * ZFS_SCRATCH is used as a working area.
   35.45 + *
   35.46 + * (memory addr)   MOS      DNODE	ZFS_SCRATCH
   35.47 + *		    |         |          |
   35.48 + *	    +-------V---------V----------V---------------+
   35.49 + *   memory |       | dnode   | dnode    |  scratch      |
   35.50 + *	    |       | 512B    | 512B     |  area         |
   35.51 + *	    +--------------------------------------------+
   35.52 + */
   35.53 +
   35.54 +#include <stdio.h>
   35.55 +#include <strings.h>
   35.56 +
   35.57 +/* From "shared.h" */
   35.58 +#include "mb_info.h"
   35.59 +
   35.60 +/* Boot signature related defines for the findroot command */
   35.61 +#define	BOOTSIGN_DIR	"/boot/grub/bootsign"
   35.62 +#define	BOOTSIGN_BACKUP	"/etc/bootsign"
   35.63 +
   35.64 +/* Maybe redirect memory requests through grub_scratch_mem. */
   35.65 +#define	RAW_ADDR(x) (x)
   35.66 +#define	RAW_SEG(x) (x)
   35.67 +
    35.68 +/* ZFS will use the top 4MB of physical memory (below 4GB) for scratch */
   35.69 +#define	ZFS_SCRATCH_SIZE 0x400000
   35.70 +
   35.71 +#define	MIN(x, y) ((x) < (y) ? (x) : (y))
   35.72 +/* End from shared.h */
   35.73 +
   35.74 +#include "fsys_zfs.h"
   35.75 +
   35.76 +/* cache for a file block of the currently zfs_open()-ed file */
   35.77 +#define	file_buf zfs_ba->zfs_file_buf
   35.78 +#define	file_start zfs_ba->zfs_file_start
   35.79 +#define	file_end zfs_ba->zfs_file_end
   35.80 +
   35.81 +/* cache for a dnode block */
   35.82 +#define	dnode_buf zfs_ba->zfs_dnode_buf
   35.83 +#define	dnode_mdn zfs_ba->zfs_dnode_mdn
   35.84 +#define	dnode_start zfs_ba->zfs_dnode_start
   35.85 +#define	dnode_end zfs_ba->zfs_dnode_end
   35.86 +
   35.87 +#define	stackbase zfs_ba->zfs_stackbase
   35.88 +
   35.89 +decomp_entry_t decomp_table[ZIO_COMPRESS_FUNCTIONS] =
   35.90 +{
   35.91 +	{"noop", 0},
   35.92 +	{"on", lzjb_decompress}, 	/* ZIO_COMPRESS_ON */
   35.93 +	{"off", 0},
   35.94 +	{"lzjb", lzjb_decompress}	/* ZIO_COMPRESS_LZJB */
   35.95 +};
   35.96 +
   35.97 +/* From disk_io.c */
   35.98 +/* ZFS root filesystem for booting */
   35.99 +#define	current_bootpath zfs_ba->zfs_current_bootpath
  35.100 +#define	current_rootpool zfs_ba->zfs_current_rootpool
  35.101 +#define	current_bootfs zfs_ba->zfs_current_bootfs
  35.102 +#define	current_bootfs_obj zfs_ba->zfs_current_bootfs_obj
  35.103 +#define	is_zfs_mount (*fsig_int1(ffi))
  35.104 +/* End from disk_io.c */
  35.105 +
  35.106 +#define	is_zfs_open zfs_ba->zfs_open
  35.107 +
  35.108 +/*
  35.109 + * Our own version of bcmp().
  35.110 + */
  35.111 +static int
  35.112 +zfs_bcmp(const void *s1, const void *s2, size_t n)
  35.113 +{
  35.114 +	const unsigned char *ps1 = s1;
  35.115 +	const unsigned char *ps2 = s2;
  35.116 +
  35.117 +	if (s1 != s2 && n != 0) {
  35.118 +		do {
  35.119 +			if (*ps1++ != *ps2++)
  35.120 +				return (1);
  35.121 +		} while (--n != 0);
  35.122 +	}
  35.123 +
  35.124 +	return (0);
  35.125 +}
  35.126 +
  35.127 +/*
  35.128 + * Our own version of log2().  Same thing as highbit()-1.
  35.129 + */
  35.130 +static int
  35.131 +zfs_log2(uint64_t num)
  35.132 +{
  35.133 +	int i = 0;
  35.134 +
  35.135 +	while (num > 1) {
  35.136 +		i++;
  35.137 +		num = num >> 1;
  35.138 +	}
  35.139 +
  35.140 +	return (i);
  35.141 +}
  35.142 +
  35.143 +/* Checksum Functions */
  35.144 +static void
  35.145 +zio_checksum_off(const void *buf, uint64_t size, zio_cksum_t *zcp)
  35.146 +{
  35.147 +	ZIO_SET_CHECKSUM(zcp, 0, 0, 0, 0);
  35.148 +}
  35.149 +
  35.150 +/* Checksum Table and Values */
  35.151 +zio_checksum_info_t zio_checksum_table[ZIO_CHECKSUM_FUNCTIONS] = {
  35.152 +	{{NULL,			NULL},			0, 0,	"inherit"},
  35.153 +	{{NULL,			NULL},			0, 0,	"on"},
  35.154 +	{{zio_checksum_off,	zio_checksum_off},	0, 0,	"off"},
  35.155 +	{{zio_checksum_SHA256,	zio_checksum_SHA256},	1, 1,	"label"},
  35.156 +	{{zio_checksum_SHA256,	zio_checksum_SHA256},	1, 1,	"gang_header"},
  35.157 +	{{fletcher_2_native,	fletcher_2_byteswap},	0, 1,	"zilog"},
  35.158 +	{{fletcher_2_native,	fletcher_2_byteswap},	0, 0,	"fletcher2"},
  35.159 +	{{fletcher_4_native,	fletcher_4_byteswap},	1, 0,	"fletcher4"},
  35.160 +	{{zio_checksum_SHA256,	zio_checksum_SHA256},	1, 0,	"SHA256"}
  35.161 +};
  35.162 +
  35.163 +/*
  35.164 + * zio_checksum_verify: Provides support for checksum verification.
  35.165 + *
  35.166 + * Fletcher2, Fletcher4, and SHA256 are supported.
  35.167 + *
  35.168 + * Return:
  35.169 + * 	-1 = Failure
  35.170 + *	 0 = Success
  35.171 + */
  35.172 +static int
  35.173 +zio_checksum_verify(blkptr_t *bp, char *data, int size)
  35.174 +{
  35.175 +	zio_cksum_t zc = bp->blk_cksum;
  35.176 +	uint32_t checksum = BP_IS_GANG(bp) ? ZIO_CHECKSUM_GANG_HEADER :
  35.177 +	    BP_GET_CHECKSUM(bp);
  35.178 +	int byteswap = BP_SHOULD_BYTESWAP(bp);
  35.179 +	zio_block_tail_t *zbt = (zio_block_tail_t *)(data + size) - 1;
  35.180 +	zio_checksum_info_t *ci = &zio_checksum_table[checksum];
  35.181 +	zio_cksum_t actual_cksum, expected_cksum;
  35.182 +
  35.183 +	/* byteswap is not supported */
  35.184 +	if (byteswap)
  35.185 +		return (-1);
  35.186 +
  35.187 +	if (checksum >= ZIO_CHECKSUM_FUNCTIONS || ci->ci_func[0] == NULL)
  35.188 +		return (-1);
  35.189 +
  35.190 +	if (ci->ci_zbt) {
  35.191 +		if (checksum == ZIO_CHECKSUM_GANG_HEADER) {
  35.192 +			/*
   35.193 +			 * Gang blocks are not supported.
  35.194 +			 */
  35.195 +			return (-1);
  35.196 +		}
  35.197 +
  35.198 +		if (zbt->zbt_magic == BSWAP_64(ZBT_MAGIC)) {
  35.199 +			/* byte swapping is not supported */
  35.200 +			return (-1);
  35.201 +		} else {
  35.202 +			expected_cksum = zbt->zbt_cksum;
  35.203 +			zbt->zbt_cksum = zc;
  35.204 +			ci->ci_func[0](data, size, &actual_cksum);
  35.205 +			zbt->zbt_cksum = expected_cksum;
  35.206 +		}
  35.207 +		zc = expected_cksum;
  35.208 +
  35.209 +	} else {
  35.210 +		if (BP_IS_GANG(bp))
  35.211 +			return (-1);
  35.212 +		ci->ci_func[byteswap](data, size, &actual_cksum);
  35.213 +	}
  35.214 +
  35.215 +	if ((actual_cksum.zc_word[0] - zc.zc_word[0]) |
  35.216 +	    (actual_cksum.zc_word[1] - zc.zc_word[1]) |
  35.217 +	    (actual_cksum.zc_word[2] - zc.zc_word[2]) |
  35.218 +	    (actual_cksum.zc_word[3] - zc.zc_word[3]))
  35.219 +		return (-1);
  35.220 +
  35.221 +	return (0);
  35.222 +}
  35.223 +
  35.224 +/*
  35.225 + * vdev_label_offset takes "offset" (the offset within a vdev_label) and
  35.226 + * returns its physical disk offset (starting from the beginning of the vdev).
  35.227 + *
  35.228 + * Input:
  35.229 + *	psize	: Physical size of this vdev
  35.230 + *      l	: Label Number (0-3)
   35.231 + *	offset	: The offset within a vdev_label for which we want the physical
  35.232 + *		  address
  35.233 + * Return:
  35.234 + * 	Success : physical disk offset
  35.235 + * 	Failure : errnum = ERR_BAD_ARGUMENT, return value is meaningless
  35.236 + */
  35.237 +static uint64_t
  35.238 +vdev_label_offset(fsi_file_t *ffi, uint64_t psize, int l, uint64_t offset)
  35.239 +{
  35.240 +	/* XXX Need to add back label support! */
  35.241 +	if (l >= VDEV_LABELS/2 || offset > sizeof (vdev_label_t)) {
  35.242 +		errnum = ERR_BAD_ARGUMENT;
  35.243 +		return (0);
  35.244 +	}
  35.245 +
  35.246 +	return (offset + l * sizeof (vdev_label_t) + (l < VDEV_LABELS / 2 ?
  35.247 +	    0 : psize - VDEV_LABELS * sizeof (vdev_label_t)));
  35.248 +
  35.249 +}
  35.250 +
  35.251 +/*
  35.252 + * vdev_uberblock_compare takes two uberblock structures and returns an integer
  35.253 + * indicating the more recent of the two.
  35.254 + * 	Return Value = 1 if ub2 is more recent
  35.255 + * 	Return Value = -1 if ub1 is more recent
  35.256 + * The most recent uberblock is determined using its transaction number and
  35.257 + * timestamp.  The uberblock with the highest transaction number is
  35.258 + * considered "newer".  If the transaction numbers of the two blocks match, the
  35.259 + * timestamps are compared to determine the "newer" of the two.
  35.260 + */
  35.261 +static int
  35.262 +vdev_uberblock_compare(uberblock_t *ub1, uberblock_t *ub2)
  35.263 +{
  35.264 +	if (ub1->ub_txg < ub2->ub_txg)
  35.265 +		return (-1);
  35.266 +	if (ub1->ub_txg > ub2->ub_txg)
  35.267 +		return (1);
  35.268 +
  35.269 +	if (ub1->ub_timestamp < ub2->ub_timestamp)
  35.270 +		return (-1);
  35.271 +	if (ub1->ub_timestamp > ub2->ub_timestamp)
  35.272 +		return (1);
  35.273 +
  35.274 +	return (0);
  35.275 +}
  35.276 +
  35.277 +/*
  35.278 + * Three pieces of information are needed to verify an uberblock: the magic
  35.279 + * number, the version number, and the checksum.
  35.280 + *
  35.281 + * Currently Implemented: version number, magic number
  35.282 + * Need to Implement: checksum
  35.283 + *
  35.284 + * Return:
  35.285 + *     0 - Success
  35.286 + *    -1 - Failure
  35.287 + */
  35.288 +static int
  35.289 +uberblock_verify(uberblock_phys_t *ub, int offset)
  35.290 +{
  35.291 +
  35.292 +	uberblock_t *uber = &ub->ubp_uberblock;
  35.293 +	blkptr_t bp;
  35.294 +
  35.295 +	BP_ZERO(&bp);
  35.296 +	BP_SET_CHECKSUM(&bp, ZIO_CHECKSUM_LABEL);
  35.297 +	BP_SET_BYTEORDER(&bp, ZFS_HOST_BYTEORDER);
  35.298 +	ZIO_SET_CHECKSUM(&bp.blk_cksum, offset, 0, 0, 0);
  35.299 +
  35.300 +	if (zio_checksum_verify(&bp, (char *)ub, UBERBLOCK_SIZE) != 0)
  35.301 +		return (-1);
  35.302 +
  35.303 +	if (uber->ub_magic == UBERBLOCK_MAGIC &&
  35.304 +	    uber->ub_version >= SPA_VERSION_1 &&
  35.305 +	    uber->ub_version <= SPA_VERSION)
  35.306 +		return (0);
  35.307 +
  35.308 +	return (-1);
  35.309 +}
  35.310 +
  35.311 +/*
  35.312 + * Find the best uberblock.
  35.313 + * Return:
  35.314 + *    Success - Pointer to the best uberblock.
  35.315 + *    Failure - NULL
  35.316 + */
  35.317 +static uberblock_phys_t *
  35.318 +find_bestub(fsi_file_t *ffi, uberblock_phys_t *ub_array, int label)
  35.319 +{
  35.320 +	uberblock_phys_t *ubbest = NULL;
  35.321 +	int i, offset;
  35.322 +
  35.323 +	for (i = 0; i < (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT); i++) {
  35.324 +		offset = vdev_label_offset(ffi, 0, label,
  35.325 +		    VDEV_UBERBLOCK_OFFSET(i));
  35.326 +		if (errnum == ERR_BAD_ARGUMENT)
  35.327 +			return (NULL);
  35.328 +		if (uberblock_verify(&ub_array[i], offset) == 0) {
  35.329 +			if (ubbest == NULL) {
  35.330 +				ubbest = &ub_array[i];
  35.331 +			} else if (vdev_uberblock_compare(
  35.332 +			    &(ub_array[i].ubp_uberblock),
  35.333 +			    &(ubbest->ubp_uberblock)) > 0) {
  35.334 +				ubbest = &ub_array[i];
  35.335 +			}
  35.336 +		}
  35.337 +	}
  35.338 +
  35.339 +	return (ubbest);
  35.340 +}
  35.341 +
  35.342 +/*
  35.343 + * Read in a block and put its uncompressed data in buf.
  35.344 + *
  35.345 + * Return:
  35.346 + *	0 - success
  35.347 + *	errnum - failure
  35.348 + */
  35.349 +static int
  35.350 +zio_read(fsi_file_t *ffi, blkptr_t *bp, void *buf, char *stack)
  35.351 +{
  35.352 +	uint64_t offset, sector;
  35.353 +	int psize, lsize;
  35.354 +	int i, comp, cksum;
  35.355 +
  35.356 +	psize = BP_GET_PSIZE(bp);
  35.357 +	lsize = BP_GET_LSIZE(bp);
  35.358 +	comp = BP_GET_COMPRESS(bp);
  35.359 +	cksum = BP_GET_CHECKSUM(bp);
  35.360 +
  35.361 +	if ((unsigned int)comp >= ZIO_COMPRESS_FUNCTIONS ||
  35.362 +	    (comp != ZIO_COMPRESS_OFF &&
  35.363 +	    decomp_table[comp].decomp_func == NULL))
  35.364 +		return (ERR_FSYS_CORRUPT);
  35.365 +
  35.366 +	/* pick a good dva from the block pointer */
  35.367 +	for (i = 0; i < SPA_DVAS_PER_BP; i++) {
  35.368 +
  35.369 +		if (bp->blk_dva[i].dva_word[0] == 0 &&
  35.370 +		    bp->blk_dva[i].dva_word[1] == 0)
  35.371 +			continue;
  35.372 +
  35.373 +		/* read in a block */
  35.374 +		offset = DVA_GET_OFFSET(&bp->blk_dva[i]);
  35.375 +		sector =  DVA_OFFSET_TO_PHYS_SECTOR(offset);
  35.376 +
  35.377 +		if (comp != ZIO_COMPRESS_OFF) {
  35.378 +
  35.379 +			if (devread(ffi, sector, 0, psize, stack) == 0)
  35.380 +				continue;
  35.381 +			if (zio_checksum_verify(bp, stack, psize) != 0)
  35.382 +				continue;
  35.383 +			decomp_table[comp].decomp_func(stack, buf, psize,
  35.384 +			    lsize);
  35.385 +		} else {
  35.386 +			if (devread(ffi, sector, 0, psize, buf) == 0)
  35.387 +				continue;
  35.388 +			if (zio_checksum_verify(bp, buf, psize) != 0)
  35.389 +				continue;
  35.390 +		}
  35.391 +		return (0);
  35.392 +	}
  35.393 +
  35.394 +	return (ERR_FSYS_CORRUPT);
  35.395 +}
  35.396 +
  35.397 +/*
   35.398 + * Get the block for a given block id and read it into buf,
   35.399 + * using the stack area as scratch space.
  35.400 + *
  35.401 + * Return:
  35.402 + * 	0 - success
  35.403 + * 	errnum - failure
  35.404 + */
  35.405 +static int
  35.406 +dmu_read(fsi_file_t *ffi, dnode_phys_t *dn, uint64_t blkid, void *buf,
  35.407 +    char *stack)
  35.408 +{
  35.409 +	int idx, level;
  35.410 +	blkptr_t *bp_array = dn->dn_blkptr;
  35.411 +	int epbs = dn->dn_indblkshift - SPA_BLKPTRSHIFT;
  35.412 +	blkptr_t *bp, *tmpbuf;
  35.413 +
  35.414 +	bp = (blkptr_t *)stack;
  35.415 +	stack += sizeof (blkptr_t);
  35.416 +
  35.417 +	tmpbuf = (blkptr_t *)stack;
  35.418 +	stack += 1<<dn->dn_indblkshift;
  35.419 +
  35.420 +	for (level = dn->dn_nlevels - 1; level >= 0; level--) {
  35.421 +		idx = (blkid >> (epbs * level)) & ((1<<epbs)-1);
  35.422 +		*bp = bp_array[idx];
  35.423 +		if (level == 0)
  35.424 +			tmpbuf = buf;
  35.425 +		if (BP_IS_HOLE(bp)) {
  35.426 +			grub_memset(buf, 0,
  35.427 +			    dn->dn_datablkszsec << SPA_MINBLOCKSHIFT);
  35.428 +			break;
  35.429 +		} else if ((errnum = zio_read(ffi, bp, tmpbuf, stack))) {
  35.430 +			return (errnum);
  35.431 +		}
  35.432 +		bp_array = tmpbuf;
  35.433 +	}
  35.434 +
  35.435 +	return (0);
  35.436 +}
  35.437 +
  35.438 +/*
  35.439 + * mzap_lookup: Looks up property described by "name" and returns the value
  35.440 + * in "value".
  35.441 + *
  35.442 + * Return:
  35.443 + *	0 - success
  35.444 + *	errnum - failure
  35.445 + */
  35.446 +static int
  35.447 +mzap_lookup(mzap_phys_t *zapobj, int objsize, char *name,
  35.448 +	uint64_t *value)
  35.449 +{
  35.450 +	int i, chunks;
  35.451 +	mzap_ent_phys_t *mzap_ent = zapobj->mz_chunk;
  35.452 +
  35.453 +	chunks = objsize/MZAP_ENT_LEN - 1;
  35.454 +	for (i = 0; i < chunks; i++) {
  35.455 +		if (strcmp(mzap_ent[i].mze_name, name) == 0) {
  35.456 +			*value = mzap_ent[i].mze_value;
  35.457 +			return (0);
  35.458 +		}
  35.459 +	}
  35.460 +
  35.461 +	return (ERR_FSYS_CORRUPT);
  35.462 +}
  35.463 +
  35.464 +static uint64_t
  35.465 +zap_hash(fsi_file_t *ffi, uint64_t salt, const char *name)
  35.466 +{
  35.467 +	static uint64_t table[256];
  35.468 +	const uint8_t *cp;
  35.469 +	uint8_t c;
  35.470 +	uint64_t crc = salt;
  35.471 +
  35.472 +	if (table[128] == 0) {
  35.473 +		uint64_t *ct;
  35.474 +		int i, j;
  35.475 +		for (i = 0; i < 256; i++) {
  35.476 +			for (ct = table + i, *ct = i, j = 8; j > 0; j--)
  35.477 +				*ct = (*ct >> 1) ^ (-(*ct & 1) &
  35.478 +				    ZFS_CRC64_POLY);
  35.479 +		}
  35.480 +	}
  35.481 +
  35.482 +	if (crc == 0 || table[128] != ZFS_CRC64_POLY) {
  35.483 +		errnum = ERR_FSYS_CORRUPT;
  35.484 +		return (0);
  35.485 +	}
  35.486 +
  35.487 +	for (cp = (const uint8_t *)name; (c = *cp) != '\0'; cp++)
  35.488 +		crc = (crc >> 8) ^ table[(crc ^ c) & 0xFF];
  35.489 +
  35.490 +	/*
  35.491 +	 * Only use 28 bits, since we need 4 bits in the cookie for the
  35.492 +	 * collision differentiator.  We MUST use the high bits, since
   35.493 +	 * those are the ones that we first pay attention to when
   35.494 +	 * choosing the bucket.
  35.495 +	 */
  35.496 +	crc &= ~((1ULL << (64 - ZAP_HASHBITS)) - 1);
  35.497 +
  35.498 +	return (crc);
  35.499 +}
  35.500 +
  35.501 +/*
  35.502 + * Only to be used on 8-bit arrays.
  35.503 + * array_len is actual len in bytes (not encoded le_value_length).
  35.504 + * buf is null-terminated.
  35.505 + */
  35.506 +static int
  35.507 +zap_leaf_array_equal(zap_leaf_phys_t *l, int blksft, int chunk,
  35.508 +    int array_len, const char *buf)
  35.509 +{
  35.510 +	int bseen = 0;
  35.511 +
  35.512 +	while (bseen < array_len) {
  35.513 +		struct zap_leaf_array *la =
  35.514 +		    &ZAP_LEAF_CHUNK(l, blksft, chunk).l_array;
  35.515 +		int toread = MIN(array_len - bseen, ZAP_LEAF_ARRAY_BYTES);
  35.516 +
  35.517 +		if (chunk >= ZAP_LEAF_NUMCHUNKS(blksft))
  35.518 +			return (0);
  35.519 +
  35.520 +		if (zfs_bcmp(la->la_array, buf + bseen, toread) != 0)
  35.521 +			break;
  35.522 +		chunk = la->la_next;
  35.523 +		bseen += toread;
  35.524 +	}
  35.525 +	return (bseen == array_len);
  35.526 +}
  35.527 +
  35.528 +/*
   35.529 + * Given a zap_leaf_phys_t, walk through the zap leaf chunks to get the
  35.530 + * value for the property "name".
  35.531 + *
  35.532 + * Return:
  35.533 + *	0 - success
  35.534 + *	errnum - failure
  35.535 + */
  35.536 +static int
  35.537 +zap_leaf_lookup(zap_leaf_phys_t *l, int blksft, uint64_t h,
  35.538 +    const char *name, uint64_t *value)
  35.539 +{
  35.540 +	uint16_t chunk;
  35.541 +	struct zap_leaf_entry *le;
  35.542 +
  35.543 +	/* Verify if this is a valid leaf block */
  35.544 +	if (l->l_hdr.lh_block_type != ZBT_LEAF)
  35.545 +		return (ERR_FSYS_CORRUPT);
  35.546 +	if (l->l_hdr.lh_magic != ZAP_LEAF_MAGIC)
  35.547 +		return (ERR_FSYS_CORRUPT);
  35.548 +
  35.549 +	for (chunk = l->l_hash[LEAF_HASH(blksft, h)];
  35.550 +	    chunk != CHAIN_END; chunk = le->le_next) {
  35.551 +
  35.552 +		if (chunk >= ZAP_LEAF_NUMCHUNKS(blksft))
  35.553 +			return (ERR_FSYS_CORRUPT);
  35.554 +
  35.555 +		le = ZAP_LEAF_ENTRY(l, blksft, chunk);
  35.556 +
  35.557 +		/* Verify the chunk entry */
  35.558 +		if (le->le_type != ZAP_CHUNK_ENTRY)
  35.559 +			return (ERR_FSYS_CORRUPT);
  35.560 +
  35.561 +		if (le->le_hash != h)
  35.562 +			continue;
  35.563 +
  35.564 +		if (zap_leaf_array_equal(l, blksft, le->le_name_chunk,
  35.565 +		    le->le_name_length, name)) {
  35.566 +
  35.567 +			struct zap_leaf_array *la;
  35.568 +			uint8_t *ip;
  35.569 +
  35.570 +			if (le->le_int_size != 8 || le->le_value_length != 1)
  35.571 +				return (ERR_FSYS_CORRUPT);
  35.572 +
  35.573 +			/* get the uint64_t property value */
  35.574 +			la = &ZAP_LEAF_CHUNK(l, blksft,
  35.575 +			    le->le_value_chunk).l_array;
  35.576 +			ip = la->la_array;
  35.577 +
  35.578 +			*value = (uint64_t)ip[0] << 56 | (uint64_t)ip[1] << 48 |
  35.579 +			    (uint64_t)ip[2] << 40 | (uint64_t)ip[3] << 32 |
  35.580 +			    (uint64_t)ip[4] << 24 | (uint64_t)ip[5] << 16 |
  35.581 +			    (uint64_t)ip[6] << 8 | (uint64_t)ip[7];
  35.582 +
  35.583 +			return (0);
  35.584 +		}
  35.585 +	}
  35.586 +
  35.587 +	return (ERR_FSYS_CORRUPT);
  35.588 +}
  35.589 +
  35.590 +/*
  35.591 + * Fat ZAP lookup
  35.592 + *
  35.593 + * Return:
  35.594 + *	0 - success
  35.595 + *	errnum - failure
  35.596 + */
  35.597 +static int
  35.598 +fzap_lookup(fsi_file_t *ffi, dnode_phys_t *zap_dnode, zap_phys_t *zap,
  35.599 +    char *name, uint64_t *value, char *stack)
  35.600 +{
  35.601 +	zap_leaf_phys_t *l;
  35.602 +	uint64_t hash, idx, blkid;
  35.603 +	int blksft = zfs_log2(zap_dnode->dn_datablkszsec << DNODE_SHIFT);
  35.604 +
  35.605 +	/* Verify if this is a fat zap header block */
  35.606 +	if (zap->zap_magic != (uint64_t)ZAP_MAGIC)
  35.607 +		return (ERR_FSYS_CORRUPT);
  35.608 +
  35.609 +	hash = zap_hash(ffi, zap->zap_salt, name);
  35.610 +	if (errnum)
  35.611 +		return (errnum);
  35.612 +
  35.613 +	/* get block id from index */
  35.614 +	if (zap->zap_ptrtbl.zt_numblks != 0) {
  35.615 +		/* external pointer tables not supported */
  35.616 +		return (ERR_FSYS_CORRUPT);
  35.617 +	}
  35.618 +	idx = ZAP_HASH_IDX(hash, zap->zap_ptrtbl.zt_shift);
  35.619 +	blkid = ((uint64_t *)zap)[idx + (1<<(blksft-3-1))];
  35.620 +
  35.621 +	/* Get the leaf block */
  35.622 +	l = (zap_leaf_phys_t *)stack;
  35.623 +	stack += 1<<blksft;
  35.624 +	if ((errnum = dmu_read(ffi, zap_dnode, blkid, l, stack)))
  35.625 +		return (errnum);
  35.626 +
  35.627 +	return (zap_leaf_lookup(l, blksft, hash, name, value));
  35.628 +}
  35.629 +
  35.630 +/*
  35.631 + * Read in the data of a zap object and find the value for a matching
  35.632 + * property name.
  35.633 + *
  35.634 + * Return:
  35.635 + *	0 - success
  35.636 + *	errnum - failure
  35.637 + */
  35.638 +static int
  35.639 +zap_lookup(fsi_file_t *ffi, dnode_phys_t *zap_dnode, char *name,
  35.640 +    uint64_t *val, char *stack)
  35.641 +{
  35.642 +	uint64_t block_type;
  35.643 +	int size;
  35.644 +	void *zapbuf;
  35.645 +
  35.646 +	/* Read in the first block of the zap object data. */
  35.647 +	zapbuf = stack;
  35.648 +	size = zap_dnode->dn_datablkszsec << SPA_MINBLOCKSHIFT;
  35.649 +	stack += size;
  35.650 +	if ((errnum = dmu_read(ffi, zap_dnode, 0, zapbuf, stack)))
  35.651 +		return (errnum);
  35.652 +
  35.653 +	block_type = *((uint64_t *)zapbuf);
  35.654 +
  35.655 +	if (block_type == ZBT_MICRO) {
  35.656 +		return (mzap_lookup(zapbuf, size, name, val));
  35.657 +	} else if (block_type == ZBT_HEADER) {
  35.658 +		/* this is a fat zap */
  35.659 +		return (fzap_lookup(ffi, zap_dnode, zapbuf, name,
  35.660 +		    val, stack));
  35.661 +	}
  35.662 +
  35.663 +	return (ERR_FSYS_CORRUPT);
  35.664 +}
  35.665 +
  35.666 +/*
  35.667 + * Get the dnode of an object number from the metadnode of an object set.
  35.668 + *
  35.669 + * Input
  35.670 + *	mdn - metadnode to get the object dnode
  35.671 + *	objnum - object number for the object dnode
  35.672 + *	buf - data buffer that holds the returning dnode
  35.673 + *	stack - scratch area
  35.674 + *
  35.675 + * Return:
  35.676 + *	0 - success
  35.677 + *	errnum - failure
  35.678 + */
  35.679 +static int
  35.680 +dnode_get(fsi_file_t *ffi, dnode_phys_t *mdn, uint64_t objnum,
  35.681 +    uint8_t type, dnode_phys_t *buf, char *stack)
  35.682 +{
  35.683 +	uint64_t blkid, blksz; /* the block id this object dnode is in */
  35.684 +	int epbs; /* shift of number of dnodes in a block */
  35.685 +	int idx; /* index within a block */
  35.686 +	dnode_phys_t *dnbuf;
  35.687 +	zfs_bootarea_t *zfs_ba = (zfs_bootarea_t *)ffi->ff_fsi->f_data;
  35.688 +
  35.689 +	blksz = mdn->dn_datablkszsec << SPA_MINBLOCKSHIFT;
  35.690 +	epbs = zfs_log2(blksz) - DNODE_SHIFT;
  35.691 +	blkid = objnum >> epbs;
  35.692 +	idx = objnum & ((1<<epbs)-1);
  35.693 +
  35.694 +	if (dnode_buf != NULL && dnode_mdn == mdn &&
  35.695 +	    objnum >= dnode_start && objnum < dnode_end) {
  35.696 +		grub_memmove(buf, &dnode_buf[idx], DNODE_SIZE);
  35.697 +		VERIFY_DN_TYPE(buf, type);
  35.698 +		return (0);
  35.699 +	}
  35.700 +
  35.701 +	if (dnode_buf && blksz == 1<<DNODE_BLOCK_SHIFT) {
  35.702 +		dnbuf = dnode_buf;
  35.703 +		dnode_mdn = mdn;
  35.704 +		dnode_start = blkid << epbs;
  35.705 +		dnode_end = (blkid + 1) << epbs;
  35.706 +	} else {
  35.707 +		dnbuf = (dnode_phys_t *)stack;
  35.708 +		stack += blksz;
  35.709 +	}
  35.710 +
  35.711 +	if ((errnum = dmu_read(ffi, mdn, blkid, (char *)dnbuf, stack)))
  35.712 +		return (errnum);
  35.713 +
  35.714 +	grub_memmove(buf, &dnbuf[idx], DNODE_SIZE);
  35.715 +	VERIFY_DN_TYPE(buf, type);
  35.716 +
  35.717 +	return (0);
  35.718 +}
  35.719 +
  35.720 +/*
  35.721 + * Check if this is a special file that resides at the top
  35.722 + * dataset of the pool. Currently this is the GRUB menu,
  35.723 + * boot signature and boot signature backup.
  35.724 + * str starts with '/'.
  35.725 + */
  35.726 +static int
  35.727 +is_top_dataset_file(char *str)
  35.728 +{
  35.729 +	char *tptr;
  35.730 +
  35.731 +	if (((tptr = strstr(str, "menu.lst"))) &&
  35.732 +	    (tptr[8] == '\0' || tptr[8] == ' ') &&
  35.733 +	    *(tptr-1) == '/')
  35.734 +		return (1);
  35.735 +
  35.736 +	if (strncmp(str, BOOTSIGN_DIR"/",
  35.737 +	    strlen(BOOTSIGN_DIR) + 1) == 0)
  35.738 +		return (1);
  35.739 +
  35.740 +	if (strcmp(str, BOOTSIGN_BACKUP) == 0)
  35.741 +		return (1);
  35.742 +
  35.743 +	return (0);
  35.744 +}
  35.745 +
  35.746 +/*
  35.747 + * Get the file dnode for a given file name where mdn is the meta dnode
  35.748 + * for this ZFS object set. When found, place the file dnode in dn.
  35.749 + * The 'path' argument will be mangled.
  35.750 + *
  35.751 + * Return:
  35.752 + *	0 - success
  35.753 + *	errnum - failure
  35.754 + */
  35.755 +static int
  35.756 +dnode_get_path(fsi_file_t *ffi, dnode_phys_t *mdn, char *path,
  35.757 +    dnode_phys_t *dn, char *stack)
  35.758 +{
  35.759 +	uint64_t objnum, version;
  35.760 +	char *cname, ch;
  35.761 +
  35.762 +	if ((errnum = dnode_get(ffi, mdn, MASTER_NODE_OBJ, DMU_OT_MASTER_NODE,
  35.763 +	    dn, stack)))
  35.764 +		return (errnum);
  35.765 +
  35.766 +	if ((errnum = zap_lookup(ffi, dn, ZPL_VERSION_STR, &version, stack)))
  35.767 +		return (errnum);
  35.768 +	if (version > ZPL_VERSION)
  35.769 +		return (-1);
  35.770 +
  35.771 +	if ((errnum = zap_lookup(ffi, dn, ZFS_ROOT_OBJ, &objnum, stack)))
  35.772 +		return (errnum);
  35.773 +
  35.774 +	if ((errnum = dnode_get(ffi, mdn, objnum, DMU_OT_DIRECTORY_CONTENTS,
  35.775 +	    dn, stack)))
  35.776 +		return (errnum);
  35.777 +
  35.778 +	/* skip leading slashes */
  35.779 +	while (*path == '/')
  35.780 +		path++;
  35.781 +
  35.782 +	while (*path && !isspace(*path)) {
  35.783 +
  35.784 +		/* get the next component name */
  35.785 +		cname = path;
  35.786 +		while (*path && !isspace(*path) && *path != '/')
  35.787 +			path++;
  35.788 +		ch = *path;
  35.789 +		*path = 0;   /* ensure null termination */
  35.790 +
  35.791 +		if ((errnum = zap_lookup(ffi, dn, cname, &objnum, stack)))
  35.792 +			return (errnum);
  35.793 +
  35.794 +		objnum = ZFS_DIRENT_OBJ(objnum);
  35.795 +		if ((errnum = dnode_get(ffi, mdn, objnum, 0, dn, stack)))
  35.796 +			return (errnum);
  35.797 +
  35.798 +		*path = ch;
  35.799 +		while (*path == '/')
  35.800 +			path++;
  35.801 +	}
  35.802 +
  35.803 +	/* We found the dnode for this file. Verify if it is a plain file. */
  35.804 +	VERIFY_DN_TYPE(dn, DMU_OT_PLAIN_FILE_CONTENTS);
  35.805 +
  35.806 +	return (0);
  35.807 +}
  35.808 +
  35.809 +/*
  35.810 + * Get the default 'bootfs' property value from the rootpool.
  35.811 + *
  35.812 + * Return:
  35.813 + *	0 - success
   35.814 + *	errnum - failure
  35.815 + */
  35.816 +static int
  35.817 +get_default_bootfsobj(fsi_file_t *ffi, dnode_phys_t *mosmdn,
  35.818 +    uint64_t *obj, char *stack)
  35.819 +{
  35.820 +	uint64_t objnum = 0;
  35.821 +	dnode_phys_t *dn = (dnode_phys_t *)stack;
  35.822 +	stack += DNODE_SIZE;
  35.823 +
  35.824 +	if ((errnum = dnode_get(ffi, mosmdn, DMU_POOL_DIRECTORY_OBJECT,
  35.825 +	    DMU_OT_OBJECT_DIRECTORY, dn, stack)))
  35.826 +		return (errnum);
  35.827 +
  35.828 +	/*
  35.829 +	 * find the object number for 'pool_props', and get the dnode
  35.830 +	 * of the 'pool_props'.
  35.831 +	 */
  35.832 +	if (zap_lookup(ffi, dn, DMU_POOL_PROPS, &objnum, stack))
  35.833 +		return (ERR_FILESYSTEM_NOT_FOUND);
  35.834 +
  35.835 +	if ((errnum = dnode_get(ffi, mosmdn, objnum, DMU_OT_POOL_PROPS, dn,
  35.836 +	    stack)))
  35.837 +		return (errnum);
  35.838 +
  35.839 +	if (zap_lookup(ffi, dn, ZPOOL_PROP_BOOTFS, &objnum, stack))
  35.840 +		return (ERR_FILESYSTEM_NOT_FOUND);
  35.841 +
  35.842 +	if (!objnum)
  35.843 +		return (ERR_FILESYSTEM_NOT_FOUND);
  35.844 +
  35.845 +
  35.846 +	*obj = objnum;
  35.847 +	return (0);
  35.848 +}
  35.849 +
  35.850 +/*
  35.851 + * Given a MOS metadnode, get the metadnode of a given filesystem name (fsname),
  35.852 + * e.g. pool/rootfs, or a given object number (obj), e.g. the object number
  35.853 + * of pool/rootfs.
  35.854 + *
  35.855 + * If no fsname and no obj are given, return the DSL_DIR metadnode.
  35.856 + * If fsname is given, return its metadnode and its matching object number.
  35.857 + * If only obj is given, return the metadnode for this object number.
  35.858 + *
  35.859 + * Return:
  35.860 + *	0 - success
  35.861 + *	errnum - failure
  35.862 + */
  35.863 +static int
  35.864 +get_objset_mdn(fsi_file_t *ffi, dnode_phys_t *mosmdn, char *fsname,
  35.865 +    uint64_t *obj, dnode_phys_t *mdn, char *stack)
  35.866 +{
  35.867 +	uint64_t objnum, headobj;
  35.868 +	char *cname, ch;
  35.869 +	blkptr_t *bp;
  35.870 +	objset_phys_t *osp;
  35.871 +
  35.872 +	if (fsname == NULL && obj) {
  35.873 +		headobj = *obj;
  35.874 +		goto skip;
  35.875 +	}
  35.876 +
  35.877 +	if ((errnum = dnode_get(ffi, mosmdn, DMU_POOL_DIRECTORY_OBJECT,
  35.878 +	    DMU_OT_OBJECT_DIRECTORY, mdn, stack)))
  35.879 +		return (errnum);
  35.880 +
  35.881 +	if ((errnum = zap_lookup(ffi, mdn, DMU_POOL_ROOT_DATASET, &objnum,
  35.882 +	    stack)))
  35.883 +		return (errnum);
  35.884 +
  35.885 +	if ((errnum = dnode_get(ffi, mosmdn, objnum, DMU_OT_DSL_DIR, mdn,
  35.886 +	    stack)))
  35.887 +		return (errnum);
  35.888 +
  35.889 +	if (fsname == NULL) {
  35.890 +		headobj =
  35.891 +		    ((dsl_dir_phys_t *)DN_BONUS(mdn))->dd_head_dataset_obj;
  35.892 +		goto skip;
  35.893 +	}
  35.894 +
  35.895 +	/* take out the pool name */
  35.896 +	while (*fsname && !isspace(*fsname) && *fsname != '/')
  35.897 +		fsname++;
  35.898 +
  35.899 +	while (*fsname && !isspace(*fsname)) {
  35.900 +		uint64_t childobj;
  35.901 +
  35.902 +		while (*fsname == '/')
  35.903 +			fsname++;
  35.904 +
  35.905 +		cname = fsname;
  35.906 +		while (*fsname && !isspace(*fsname) && *fsname != '/')
  35.907 +			fsname++;
  35.908 +		ch = *fsname;
  35.909 +		*fsname = 0;
  35.910 +
  35.911 +		childobj =
  35.912 +		    ((dsl_dir_phys_t *)DN_BONUS(mdn))->dd_child_dir_zapobj;
  35.913 +		if ((errnum = dnode_get(ffi, mosmdn, childobj,
  35.914 +		    DMU_OT_DSL_DIR_CHILD_MAP, mdn, stack)))
  35.915 +			return (errnum);
  35.916 +
  35.917 +		if (zap_lookup(ffi, mdn, cname, &objnum, stack))
  35.918 +			return (ERR_FILESYSTEM_NOT_FOUND);
  35.919 +
  35.920 +		if ((errnum = dnode_get(ffi, mosmdn, objnum, DMU_OT_DSL_DIR,
  35.921 +		    mdn, stack)))
  35.922 +			return (errnum);
  35.923 +
  35.924 +		*fsname = ch;
  35.925 +	}
  35.926 +	headobj = ((dsl_dir_phys_t *)DN_BONUS(mdn))->dd_head_dataset_obj;
  35.927 +	if (obj)
  35.928 +		*obj = headobj;
  35.929 +
  35.930 +skip:
  35.931 +	if ((errnum = dnode_get(ffi, mosmdn, headobj, DMU_OT_DSL_DATASET, mdn,
  35.932 +	    stack)))
  35.933 +		return (errnum);
  35.934 +
  35.935 +	/* TODO: Add snapshot support here - for fsname=snapshot-name */
  35.936 +
  35.937 +	bp = &((dsl_dataset_phys_t *)DN_BONUS(mdn))->ds_bp;
  35.938 +	osp = (objset_phys_t *)stack;
  35.939 +	stack += sizeof (objset_phys_t);
  35.940 +	if ((errnum = zio_read(ffi, bp, osp, stack)))
  35.941 +		return (errnum);
  35.942 +
  35.943 +	grub_memmove((char *)mdn, (char *)&osp->os_meta_dnode, DNODE_SIZE);
  35.944 +
  35.945 +	return (0);
  35.946 +}
  35.947 +
  35.948 +/*
  35.949 + * For a given XDR packed nvlist, verify the first 4 bytes and move on.
  35.950 + *
   35.951 + * An XDR packed nvlist is encoded as follows (comments from nvs_xdr_create):
  35.952 + *
  35.953 + *      encoding method/host endian     (4 bytes)
  35.954 + *      nvl_version                     (4 bytes)
  35.955 + *      nvl_nvflag                      (4 bytes)
  35.956 + *	encoded nvpairs:
  35.957 + *		encoded size of the nvpair      (4 bytes)
  35.958 + *		decoded size of the nvpair      (4 bytes)
  35.959 + *		name string size                (4 bytes)
   35.960 + *		name string data                (sizeof(NV_ALIGN4(string)))
  35.961 + *		data type                       (4 bytes)
  35.962 + *		# of elements in the nvpair     (4 bytes)
  35.963 + *		data
   35.964 + *      two zeros for the last nvpair
  35.965 + *		(end of the entire list)	(8 bytes)
  35.966 + *
  35.967 + * Return:
  35.968 + *	0 - success
  35.969 + *	1 - failure
  35.970 + */
  35.971 +static int
  35.972 +nvlist_unpack(char *nvlist, char **out)
  35.973 +{
   35.974 +	/* Verify that the first and second bytes of the nvlist are valid. */
  35.975 +	if (nvlist[0] != NV_ENCODE_XDR || nvlist[1] != HOST_ENDIAN)
  35.976 +		return (1);
  35.977 +
  35.978 +	nvlist += 4;
  35.979 +	*out = nvlist;
  35.980 +	return (0);
  35.981 +}
  35.982 +
  35.983 +static char *
  35.984 +nvlist_array(char *nvlist, int index)
  35.985 +{
  35.986 +	int i, encode_size;
  35.987 +
  35.988 +	for (i = 0; i < index; i++) {
  35.989 +		/* skip the header, nvl_version, and nvl_nvflag */
  35.990 +		nvlist = nvlist + 4 * 2;
  35.991 +
  35.992 +		while ((encode_size = BSWAP_32(*(uint32_t *)nvlist)))
   35.993 +			nvlist += encode_size; /* go to the next nvpair */
  35.994 +
  35.995 +		nvlist = nvlist + 4 * 2; /* skip the ending 2 zeros - 8 bytes */
  35.996 +	}
  35.997 +
  35.998 +	return (nvlist);
  35.999 +}
 35.1000 +
 35.1001 +static int
 35.1002 +nvlist_lookup_value(char *nvlist, char *name, void *val, int valtype,
 35.1003 +    int *nelmp)
 35.1004 +{
 35.1005 +	int name_len, type, slen, encode_size;
 35.1006 +	char *nvpair, *nvp_name, *strval = val;
 35.1007 +	uint64_t *intval = val;
 35.1008 +
 35.1009 +	/* skip the header, nvl_version, and nvl_nvflag */
 35.1010 +	nvlist = nvlist + 4 * 2;
 35.1011 +
 35.1012 +	/*
  35.1013 +	 * Loop through the nvpair list.
 35.1014 +	 * The XDR representation of an integer is in big-endian byte order.
 35.1015 +	 */
 35.1016 +	while ((encode_size = BSWAP_32(*(uint32_t *)nvlist)))  {
 35.1017 +
 35.1018 +		nvpair = nvlist + 4 * 2; /* skip the encode/decode size */
 35.1019 +
 35.1020 +		name_len = BSWAP_32(*(uint32_t *)nvpair);
 35.1021 +		nvpair += 4;
 35.1022 +
 35.1023 +		nvp_name = nvpair;
 35.1024 +		nvpair = nvpair + ((name_len + 3) & ~3); /* align */
 35.1025 +
 35.1026 +		type = BSWAP_32(*(uint32_t *)nvpair);
 35.1027 +		nvpair += 4;
 35.1028 +
 35.1029 +		if (((strncmp(nvp_name, name, name_len) == 0) &&
 35.1030 +		    type == valtype)) {
 35.1031 +			int nelm;
 35.1032 +
 35.1033 +			if (((nelm = BSWAP_32(*(uint32_t *)nvpair)) < 1))
 35.1034 +				return (1);
 35.1035 +			nvpair += 4;
 35.1036 +
 35.1037 +			switch (valtype) {
 35.1038 +			case DATA_TYPE_STRING:
 35.1039 +				slen = BSWAP_32(*(uint32_t *)nvpair);
 35.1040 +				nvpair += 4;
 35.1041 +				grub_memmove(strval, nvpair, slen);
 35.1042 +				strval[slen] = '\0';
 35.1043 +				return (0);
 35.1044 +
 35.1045 +			case DATA_TYPE_UINT64:
 35.1046 +				*intval = BSWAP_64(*(uint64_t *)nvpair);
 35.1047 +				return (0);
 35.1048 +
 35.1049 +			case DATA_TYPE_NVLIST:
 35.1050 +				*(void **)val = (void *)nvpair;
 35.1051 +				return (0);
 35.1052 +
 35.1053 +			case DATA_TYPE_NVLIST_ARRAY:
 35.1054 +				*(void **)val = (void *)nvpair;
 35.1055 +				if (nelmp)
 35.1056 +					*nelmp = nelm;
 35.1057 +				return (0);
 35.1058 +			}
 35.1059 +		}
 35.1060 +
  35.1061 +		nvlist += encode_size; /* go to the next nvpair */
 35.1062 +	}
 35.1063 +
 35.1064 +	return (1);
 35.1065 +}
 35.1066 +
 35.1067 +/*
  35.1068 + * Check that this vdev is online and in a good state.
 35.1069 + */
 35.1070 +static int
 35.1071 +vdev_validate(char *nv)
 35.1072 +{
 35.1073 +	uint64_t ival;
 35.1074 +
 35.1075 +	if (nvlist_lookup_value(nv, ZPOOL_CONFIG_OFFLINE, &ival,
 35.1076 +	    DATA_TYPE_UINT64, NULL) == 0 ||
 35.1077 +	    nvlist_lookup_value(nv, ZPOOL_CONFIG_FAULTED, &ival,
 35.1078 +	    DATA_TYPE_UINT64, NULL) == 0 ||
 35.1079 +	    nvlist_lookup_value(nv, ZPOOL_CONFIG_DEGRADED, &ival,
 35.1080 +	    DATA_TYPE_UINT64, NULL) == 0 ||
 35.1081 +	    nvlist_lookup_value(nv, ZPOOL_CONFIG_REMOVED, &ival,
 35.1082 +	    DATA_TYPE_UINT64, NULL) == 0)
 35.1083 +		return (ERR_DEV_VALUES);
 35.1084 +
 35.1085 +	return (0);
 35.1086 +}
 35.1087 +
 35.1088 +/*
  35.1089 + * Get a list of valid vdev pathnames from the boot device.
  35.1090 + * The caller must have already allocated MAXNAMELEN bytes for bootpath.
 35.1091 + */
 35.1092 +static int
 35.1093 +vdev_get_bootpath(char *nv, char *bootpath)
 35.1094 +{
 35.1095 +	char type[16];
 35.1096 +
 35.1097 +	bootpath[0] = '\0';
 35.1098 +	if (nvlist_lookup_value(nv, ZPOOL_CONFIG_TYPE, &type, DATA_TYPE_STRING,
 35.1099 +	    NULL))
 35.1100 +		return (ERR_FSYS_CORRUPT);
 35.1101 +
 35.1102 +	if (strcmp(type, VDEV_TYPE_DISK) == 0) {
 35.1103 +		if (vdev_validate(nv) != 0 ||
 35.1104 +		    nvlist_lookup_value(nv, ZPOOL_CONFIG_PHYS_PATH, bootpath,
 35.1105 +		    DATA_TYPE_STRING, NULL) != 0)
 35.1106 +			return (ERR_NO_BOOTPATH);
 35.1107 +
 35.1108 +	} else if (strcmp(type, VDEV_TYPE_MIRROR) == 0) {
 35.1109 +		int nelm, i;
 35.1110 +		char *child;
 35.1111 +
 35.1112 +		if (nvlist_lookup_value(nv, ZPOOL_CONFIG_CHILDREN, &child,
 35.1113 +		    DATA_TYPE_NVLIST_ARRAY, &nelm))
 35.1114 +			return (ERR_FSYS_CORRUPT);
 35.1115 +
 35.1116 +		for (i = 0; i < nelm; i++) {
 35.1117 +			char tmp_path[MAXNAMELEN];
 35.1118 +			char *child_i;
 35.1119 +
 35.1120 +			child_i = nvlist_array(child, i);
 35.1121 +			if (vdev_validate(child_i) != 0)
 35.1122 +				continue;
 35.1123 +
 35.1124 +			if (nvlist_lookup_value(child_i, ZPOOL_CONFIG_PHYS_PATH,
 35.1125 +			    tmp_path, DATA_TYPE_STRING, NULL) != 0)
 35.1126 +				return (ERR_NO_BOOTPATH);
 35.1127 +
 35.1128 +			if ((strlen(bootpath) + strlen(tmp_path)) > MAXNAMELEN)
 35.1129 +				return (ERR_WONT_FIT);
 35.1130 +
 35.1131 +			if (strlen(bootpath) == 0)
 35.1132 +				sprintf(bootpath, "%s", tmp_path);
 35.1133 +			else
 35.1134 +				sprintf(bootpath, "%s %s", bootpath, tmp_path);
 35.1135 +		}
 35.1136 +	}
 35.1137 +
 35.1138 +	return (strlen(bootpath) > 0 ? 0 : ERR_NO_BOOTPATH);
 35.1139 +}
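To illustrate the result (the device paths below are made-up examples): for a plain disk vdev the single physical path is copied into bootpath, e.g. "/pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@0,0:a"; for a two-way mirror the valid children are concatenated with a space separator, e.g. "/pci@.../sd@0,0:a /pci@.../sd@1,0:a", so the consumer can try each path in turn.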
 35.1140 +
 35.1141 +/*
 35.1142 + * Check the disk label information and retrieve needed vdev name-value pairs.
 35.1143 + *
 35.1144 + * Return:
 35.1145 + *	0 - success
 35.1146 + *	ERR_* - failure
 35.1147 + */
 35.1148 +static int
 35.1149 +check_pool_label(fsi_file_t *ffi, int label, char *stack)
 35.1150 +{
 35.1151 +	vdev_phys_t *vdev;
 35.1152 +	uint64_t sector, pool_state, txg = 0;
 35.1153 +	char *nvlist, *nv;
 35.1154 +	zfs_bootarea_t *zfs_ba = (zfs_bootarea_t *)ffi->ff_fsi->f_data;
 35.1155 +
 35.1156 +	sector = (label * sizeof (vdev_label_t) + VDEV_SKIP_SIZE +
 35.1157 +	    VDEV_BOOT_HEADER_SIZE) >> SPA_MINBLOCKSHIFT;
 35.1158 +
 35.1159 +	/* Read in the vdev name-value pair list (112K). */
 35.1160 +	if (devread(ffi, sector, 0, VDEV_PHYS_SIZE, stack) == 0)
 35.1161 +		return (ERR_READ);
 35.1162 +
 35.1163 +	vdev = (vdev_phys_t *)stack;
 35.1164 +
 35.1165 +	if (nvlist_unpack(vdev->vp_nvlist, &nvlist))
 35.1166 +		return (ERR_FSYS_CORRUPT);
 35.1167 +
 35.1168 +	if (nvlist_lookup_value(nvlist, ZPOOL_CONFIG_POOL_STATE, &pool_state,
 35.1169 +	    DATA_TYPE_UINT64, NULL))
 35.1170 +		return (ERR_FSYS_CORRUPT);
 35.1171 +
 35.1172 +	if (pool_state == POOL_STATE_DESTROYED)
 35.1173 +		return (ERR_FILESYSTEM_NOT_FOUND);
 35.1174 +
 35.1175 +	if (nvlist_lookup_value(nvlist, ZPOOL_CONFIG_POOL_NAME,
 35.1176 +	    current_rootpool, DATA_TYPE_STRING, NULL))
 35.1177 +		return (ERR_FSYS_CORRUPT);
 35.1178 +
 35.1179 +	if (nvlist_lookup_value(nvlist, ZPOOL_CONFIG_POOL_TXG, &txg,
 35.1180 +	    DATA_TYPE_UINT64, NULL))
 35.1181 +		return (ERR_FSYS_CORRUPT);
 35.1182 +
 35.1183 +	/* not an active device */
 35.1184 +	if (txg == 0)
 35.1185 +		return (ERR_NO_BOOTPATH);
 35.1186 +
 35.1187 +	if (nvlist_lookup_value(nvlist, ZPOOL_CONFIG_VDEV_TREE, &nv,
 35.1188 +	    DATA_TYPE_NVLIST, NULL))
 35.1189 +		return (ERR_FSYS_CORRUPT);
 35.1190 +
 35.1191 +	if (vdev_get_bootpath(nv, current_bootpath))
 35.1192 +		return (ERR_NO_BOOTPATH);
 35.1193 +
 35.1194 +	return (0);
 35.1195 +}
 35.1196 +
 35.1197 +/*
  35.1198 + * zfs_mount() locates a valid uberblock of the root pool and reads its MOS
  35.1199 + * into the memory address MOS.
 35.1200 + *
 35.1201 + * Return:
 35.1202 + *	1 - success
 35.1203 + *	0 - failure
 35.1204 + */
 35.1205 +int
 35.1206 +zfs_mount(fsi_file_t *ffi, const char *options)
 35.1207 +{
 35.1208 +	char *stack;
 35.1209 +	int label = 0;
 35.1210 +	uberblock_phys_t *ub_array, *ubbest = NULL;
 35.1211 +	objset_phys_t *osp;
 35.1212 +	zfs_bootarea_t *zfs_ba;
 35.1213 +
 35.1214 +	/* if zfs is already mounted, don't do it again */
 35.1215 +	if (is_zfs_mount == 1)
 35.1216 +		return (1);
 35.1217 +
  35.1218 +	/* get a much bigger data area for zfs */
 35.1219 +	if (((zfs_ba = malloc(sizeof (zfs_bootarea_t))) == NULL)) {
 35.1220 +		return (1);
 35.1221 +	}
 35.1222 +	bzero(zfs_ba, sizeof (zfs_bootarea_t));
 35.1223 +
 35.1224 +	/* replace small data area in fsi with big one */
 35.1225 +	free(ffi->ff_fsi->f_data);
 35.1226 +	ffi->ff_fsi->f_data = (void *)zfs_ba;
 35.1227 +
  35.1228 +	/* If a boot filesystem is passed in, copy it into current_bootfs */
 35.1229 +	if (options != NULL) {
 35.1230 +		if (strlen(options) < MAXNAMELEN) {
 35.1231 +			strcpy(current_bootfs, options);
 35.1232 +		}
 35.1233 +	}
 35.1234 +
 35.1235 +	stackbase = ZFS_SCRATCH;
 35.1236 +	stack = stackbase;
 35.1237 +	ub_array = (uberblock_phys_t *)stack;
 35.1238 +	stack += VDEV_UBERBLOCK_RING;
 35.1239 +
 35.1240 +	osp = (objset_phys_t *)stack;
 35.1241 +	stack += sizeof (objset_phys_t);
 35.1242 +
 35.1243 +	/* XXX add back labels support? */
 35.1244 +	for (label = 0; ubbest == NULL && label < (VDEV_LABELS/2); label++) {
 35.1245 +		uint64_t sector = (label * sizeof (vdev_label_t) +
 35.1246 +		    VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE +
 35.1247 +		    VDEV_PHYS_SIZE) >> SPA_MINBLOCKSHIFT;
 35.1248 +
 35.1249 +
 35.1250 +		/* Read in the uberblock ring (128K). */
 35.1251 +		if (devread(ffi, sector, 0, VDEV_UBERBLOCK_RING,
 35.1252 +		    (char *)ub_array) == 0)
 35.1253 +			continue;
 35.1254 +
 35.1255 +		if ((ubbest = find_bestub(ffi, ub_array, label)) != NULL &&
 35.1256 +		    zio_read(ffi, &ubbest->ubp_uberblock.ub_rootbp, osp, stack)
 35.1257 +		    == 0) {
 35.1258 +
 35.1259 +			VERIFY_OS_TYPE(osp, DMU_OST_META);
 35.1260 +
 35.1261 +			/* Got the MOS. Save it at the memory addr MOS. */
 35.1262 +			grub_memmove(MOS, &osp->os_meta_dnode, DNODE_SIZE);
 35.1263 +
 35.1264 +			if (check_pool_label(ffi, label, stack))
 35.1265 +				return (0);
 35.1266 +
 35.1267 +			/*
 35.1268 +			 * Copy fsi->f_data to ffi->ff_data since
 35.1269 +			 * fsig_mount copies from ff_data to f_data
 35.1270 +			 * overwriting fsi->f_data.
 35.1271 +			 */
 35.1272 +			bcopy(zfs_ba, fsig_file_buf(ffi), FSYS_BUFLEN);
 35.1273 +
 35.1274 +			is_zfs_mount = 1;
 35.1275 +			return (1);
 35.1276 +		}
 35.1277 +	}
 35.1278 +
 35.1279 +	return (0);
 35.1280 +}
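The sector arithmetic in zfs_mount() and check_pool_label() follows the on-disk vdev label layout (assuming the usual sizes from vdev_impl.h: 8K blank space, 8K boot block header, 112K packed nvlist and 128K uberblock ring per 256K label). For label 0 this works out to:

	nvlist area:      (8K + 8K) >> SPA_MINBLOCKSHIFT          = sector 32    (read by check_pool_label)
	uberblock ring:   (8K + 8K + 112K) >> SPA_MINBLOCKSHIFT   = sector 256   (read by zfs_mount)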
 35.1281 +
 35.1282 +/*
 35.1283 + * zfs_open() locates a file in the rootpool by following the
  35.1284 + * MOS and places the dnode of the file at the memory address DNODE.
 35.1285 + *
 35.1286 + * Return:
 35.1287 + *	1 - success
 35.1288 + *	0 - failure
 35.1289 + */
 35.1290 +int
 35.1291 +zfs_open(fsi_file_t *ffi, char *filename)
 35.1292 +{
 35.1293 +	char *stack;
 35.1294 +	dnode_phys_t *mdn;
 35.1295 +	char *bootstring;
 35.1296 +	zfs_bootarea_t *zfs_ba = (zfs_bootarea_t *)ffi->ff_fsi->f_data;
 35.1297 +
 35.1298 +	file_buf = NULL;
 35.1299 +	stackbase = ZFS_SCRATCH;
 35.1300 +	stack = stackbase;
 35.1301 +
 35.1302 +	mdn = (dnode_phys_t *)stack;
 35.1303 +	stack += sizeof (dnode_phys_t);
 35.1304 +
 35.1305 +	dnode_mdn = NULL;
 35.1306 +	dnode_buf = (dnode_phys_t *)stack;
 35.1307 +	stack += 1<<DNODE_BLOCK_SHIFT;
 35.1308 +
 35.1309 +	/*
  35.1310 +	 * menu.lst is placed at the root pool filesystem level;
  35.1311 +	 * do not descend into 'current_bootfs'.
 35.1312 +	 */
 35.1313 +	if (is_top_dataset_file(filename)) {
 35.1314 +		if ((errnum = get_objset_mdn(ffi, MOS, NULL, NULL, mdn, stack)))
 35.1315 +			return (0);
 35.1316 +
 35.1317 +		current_bootfs_obj = 0;
 35.1318 +	} else {
 35.1319 +		if (current_bootfs[0] == '\0') {
 35.1320 +			/* Get the default root filesystem object number */
 35.1321 +			if ((errnum = get_default_bootfsobj(ffi, MOS,
 35.1322 +			    &current_bootfs_obj, stack)))
 35.1323 +				return (0);
 35.1324 +			if ((errnum = get_objset_mdn(ffi, MOS, NULL,
 35.1325 +			    &current_bootfs_obj, mdn, stack)))
 35.1326 +				return (0);
 35.1327 +		} else {
 35.1328 +			if ((errnum = get_objset_mdn(ffi, MOS,
 35.1329 +			    current_bootfs, &current_bootfs_obj, mdn, stack)))
 35.1330 +				return (0);
 35.1331 +		}
 35.1332 +
 35.1333 +		/*
 35.1334 +		 * Put zfs rootpool and boot obj number into bootstring.
 35.1335 +		 */
 35.1336 +		if (is_zfs_open == 0) {
 35.1337 +			char temp[25];		/* needs to hold long long */
 35.1338 +			int alloc_size;
 35.1339 +			char zfs_bootstr[] = "zfs-bootfs=";
 35.1340 +			char zfs_bootpath[] = ",bootpath='";
 35.1341 +
 35.1342 +			sprintf(temp, "%llu", (unsigned long long)
 35.1343 +			    current_bootfs_obj);
 35.1344 +			alloc_size = strlen(zfs_bootstr) +
 35.1345 +			    strlen(current_rootpool) +
 35.1346 +			    strlen(temp) + strlen(zfs_bootpath) +
 35.1347 +			    strlen(current_bootpath) + 3;
 35.1348 +			bootstring = fsi_bootstring_alloc(ffi->ff_fsi,
 35.1349 +			    alloc_size);
 35.1350 +			if (bootstring != NULL) {
 35.1351 +				strcpy(bootstring, zfs_bootstr);
 35.1352 +				strcat(bootstring, current_rootpool);
 35.1353 +				strcat(bootstring, "/");
 35.1354 +				strcat(bootstring, temp);
 35.1355 +				strcat(bootstring, zfs_bootpath);
 35.1356 +				strcat(bootstring, current_bootpath);
 35.1357 +				strcat(bootstring, "'");
 35.1358 +				is_zfs_open = 1;
 35.1359 +			}
 35.1360 +		}
 35.1361 +	}
 35.1362 +
 35.1363 +	if (dnode_get_path(ffi, mdn, filename, DNODE, stack)) {
 35.1364 +		errnum = ERR_FILE_NOT_FOUND;
 35.1365 +		return (0);
 35.1366 +	}
 35.1367 +
 35.1368 +	/* get the file size and set the file position to 0 */
 35.1369 +	filemax = ((znode_phys_t *)DN_BONUS(DNODE))->zp_size;
 35.1370 +	filepos = 0;
 35.1371 +
 35.1372 +	dnode_buf = NULL;
 35.1373 +	return (1);
 35.1374 +}
 35.1375 +
 35.1376 +/*
  35.1377 + * zfs_read reads in the data blocks pointed to by the DNODE.
 35.1378 + *
 35.1379 + * Return:
  35.1380 + *	len - the length successfully read into the buffer
 35.1381 + *	0   - failure
 35.1382 + */
 35.1383 +int
 35.1384 +zfs_read(fsi_file_t *ffi, char *buf, int len)
 35.1385 +{
 35.1386 +	char *stack;
 35.1387 +	int blksz, length, movesize;
 35.1388 +	zfs_bootarea_t *zfs_ba = (zfs_bootarea_t *)ffi->ff_fsi->f_data;
 35.1389 +
 35.1390 +	if (file_buf == NULL) {
 35.1391 +		file_buf = stackbase;
 35.1392 +		stackbase += SPA_MAXBLOCKSIZE;
 35.1393 +		file_start = file_end = 0;
 35.1394 +	}
 35.1395 +	stack = stackbase;
 35.1396 +
 35.1397 +	/*
 35.1398 +	 * If offset is in memory, move it into the buffer provided and return.
 35.1399 +	 */
 35.1400 +	if (filepos >= file_start && filepos+len <= file_end) {
 35.1401 +		grub_memmove(buf, file_buf + filepos - file_start, len);
 35.1402 +		filepos += len;
 35.1403 +		return (len);
 35.1404 +	}
 35.1405 +
 35.1406 +	blksz = DNODE->dn_datablkszsec << SPA_MINBLOCKSHIFT;
 35.1407 +
 35.1408 +	/*
  35.1409 +	 * The requested range is not fully cached in file_buf, so read
  35.1410 +	 * it in chunks.  This could be optimized to read in as large a
  35.1411 +	 * chunk as there is space available, but for now this only
  35.1412 +	 * reads in one data block at a time.
 35.1413 +	 */
 35.1414 +	length = len;
 35.1415 +	while (length) {
 35.1416 +		/*
 35.1417 +		 * Find requested blkid and the offset within that block.
 35.1418 +		 */
 35.1419 +		uint64_t blkid = filepos / blksz;
 35.1420 +
 35.1421 +		if ((errnum = dmu_read(ffi, DNODE, blkid, file_buf, stack)))
 35.1422 +			return (0);
 35.1423 +
 35.1424 +		file_start = blkid * blksz;
 35.1425 +		file_end = file_start + blksz;
 35.1426 +
 35.1427 +		movesize = MIN(length, file_end - filepos);
 35.1428 +
 35.1429 +		grub_memmove(buf, file_buf + filepos - file_start,
 35.1430 +		    movesize);
 35.1431 +		buf += movesize;
 35.1432 +		length -= movesize;
 35.1433 +		filepos += movesize;
 35.1434 +	}
 35.1435 +
 35.1436 +	return (len);
 35.1437 +}
 35.1438 +
 35.1439 +/*
 35.1440 + * No-Op
 35.1441 + */
 35.1442 +int
 35.1443 +zfs_embed(int *start_sector, int needed_sectors)
 35.1444 +{
 35.1445 +	return (1);
 35.1446 +}
 35.1447 +
 35.1448 +fsi_plugin_ops_t *
 35.1449 +fsi_init_plugin(int version, fsi_plugin_t *fp, const char **name)
 35.1450 +{
 35.1451 +	static fsig_plugin_ops_t ops = {
 35.1452 +		FSIMAGE_PLUGIN_VERSION,
 35.1453 +		.fpo_mount = zfs_mount,
 35.1454 +		.fpo_dir = zfs_open,
 35.1455 +		.fpo_read = zfs_read
 35.1456 +	};
 35.1457 +
 35.1458 +	*name = "zfs";
 35.1459 +	return (fsig_init(fp, &ops));
 35.1460 +}
    36.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    36.2 +++ b/tools/libfsimage/zfs/fsys_zfs.h	Thu May 08 18:40:07 2008 +0900
    36.3 @@ -0,0 +1,203 @@
    36.4 +/*
    36.5 + *  GRUB  --  GRand Unified Bootloader
    36.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    36.7 + *
    36.8 + *  This program is free software; you can redistribute it and/or modify
    36.9 + *  it under the terms of the GNU General Public License as published by
   36.10 + *  the Free Software Foundation; either version 2 of the License, or
   36.11 + *  (at your option) any later version.
   36.12 + *
   36.13 + *  This program is distributed in the hope that it will be useful,
   36.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   36.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   36.16 + *  GNU General Public License for more details.
   36.17 + *
   36.18 + *  You should have received a copy of the GNU General Public License
   36.19 + *  along with this program; if not, write to the Free Software
   36.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   36.21 + */
   36.22 +/*
   36.23 + * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
   36.24 + * Use is subject to license terms.
   36.25 + */
   36.26 +#ifndef _FSYS_ZFS_H
   36.27 +#define	_FSYS_ZFS_H
   36.28 +
   36.29 +#include <fsimage_grub.h>
   36.30 +#include <fsimage_priv.h>
   36.31 +
   36.32 +#include "zfs-include/zfs.h"
   36.33 +#include "zfs-include/dmu.h"
   36.34 +#include "zfs-include/spa.h"
   36.35 +#include "zfs-include/zio.h"
   36.36 +#include "zfs-include/zio_checksum.h"
   36.37 +#include "zfs-include/vdev_impl.h"
   36.38 +#include "zfs-include/zap_impl.h"
   36.39 +#include "zfs-include/zap_leaf.h"
   36.40 +#include "zfs-include/uberblock_impl.h"
   36.41 +#include "zfs-include/dnode.h"
   36.42 +#include "zfs-include/dsl_dir.h"
   36.43 +#include "zfs-include/zfs_acl.h"
   36.44 +#include "zfs-include/zfs_znode.h"
   36.45 +#include "zfs-include/dsl_dataset.h"
   36.46 +#include "zfs-include/zil.h"
   36.47 +#include "zfs-include/dmu_objset.h"
   36.48 +
   36.49 +/*
   36.50 + * Global Memory addresses to store MOS and DNODE data
   36.51 + */
   36.52 +#define	MOS		((dnode_phys_t *)(((zfs_bootarea_t *) \
   36.53 +			    (ffi->ff_fsi->f_data))->zfs_data))
   36.54 +#define	DNODE		(MOS+1) /* move sizeof(dnode_phys_t) bytes */
   36.55 +#define	ZFS_SCRATCH	((char *)(DNODE+1))
   36.56 +
   36.57 +#define	MAXNAMELEN	256
   36.58 +
   36.59 +typedef struct zfs_bootarea {
   36.60 +	char zfs_current_bootpath[MAXNAMELEN];
   36.61 +	char zfs_current_rootpool[MAXNAMELEN];
   36.62 +	char zfs_current_bootfs[MAXNAMELEN];
   36.63 +	uint64_t zfs_current_bootfs_obj;
   36.64 +	int zfs_open;
   36.65 +
   36.66 +	/* cache for a file block of the currently zfs_open()-ed file */
   36.67 +	void *zfs_file_buf;
   36.68 +	uint64_t zfs_file_start;
   36.69 +	uint64_t zfs_file_end;
   36.70 +
   36.71 +	/* cache for a dnode block */
   36.72 +	dnode_phys_t *zfs_dnode_buf;
   36.73 +	dnode_phys_t *zfs_dnode_mdn;
   36.74 +	uint64_t zfs_dnode_start;
   36.75 +	uint64_t zfs_dnode_end;
   36.76 +
   36.77 +	char *zfs_stackbase;
   36.78 +	char zfs_data[0x400000];
   36.79 +} zfs_bootarea_t;
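Since dnode_phys_t is 512 bytes (DNODE_SIZE, see dnode.h below), the MOS/DNODE/ZFS_SCRATCH macros above lay out the start of the zfs_data scratch area as: MOS at offset 0, DNODE at offset 512, and ZFS_SCRATCH at offset 1024, with the remainder of the 4MB area used as working space by the callers.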
   36.80 +
   36.81 +/*
   36.82 + * Verify dnode type.
   36.83 + * Can only be used in functions returning non-0 for failure.
   36.84 + */
   36.85 +#define	VERIFY_DN_TYPE(dnp, type) \
   36.86 +	if (type && (dnp)->dn_type != type) { \
   36.87 +		return (ERR_FSYS_CORRUPT); \
   36.88 +	}
   36.89 +
   36.90 +/*
   36.91 + * Verify object set type.
   36.92 + * Can only be used in functions returning 0 for failure.
   36.93 + */
   36.94 +#define	VERIFY_OS_TYPE(osp, type) \
   36.95 +	if (type && (osp)->os_type != type) { \
   36.96 +		errnum = ERR_FSYS_CORRUPT; \
   36.97 +		return (0); \
   36.98 +	}
   36.99 +
  36.100 +#define	ZPOOL_PROP_BOOTFS		"bootfs"
  36.101 +
  36.102 +/* General macros */
  36.103 +#define	BSWAP_8(x)	((x) & 0xff)
  36.104 +#define	BSWAP_16(x)	((BSWAP_8(x) << 8) | BSWAP_8((x) >> 8))
  36.105 +#define	BSWAP_32(x)	((BSWAP_16(x) << 16) | BSWAP_16((x) >> 16))
  36.106 +#define	BSWAP_64(x)	((BSWAP_32(x) << 32) | BSWAP_32((x) >> 32))
  36.107 +#define	P2ROUNDUP(x, align)	(-(-(x) & -(align)))
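As a quick worked example of the general-purpose macros above: BSWAP_32(0x12345678) evaluates to 0x78563412 (the four bytes mirrored end-for-end), and P2ROUNDUP(5, 4) evaluates to -(-5 & -4) == -(-8) == 8, i.e. 5 rounded up to the next multiple of 4.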
  36.108 +
  36.109 +/*
   36.110 + * XXX Match these macros up with real zfs once we have nvlist support so that we
  36.111 + * can support large sector disks.
  36.112 + */
  36.113 +#define	UBERBLOCK_SIZE		(1ULL << UBERBLOCK_SHIFT)
  36.114 +#undef	offsetof
  36.115 +#define	offsetof(t, m)   (size_t)(&(((t *)0)->m))
  36.116 +#define	VDEV_UBERBLOCK_SHIFT	UBERBLOCK_SHIFT
  36.117 +#define	VDEV_UBERBLOCK_OFFSET(n) \
  36.118 +offsetof(vdev_label_t, vl_uberblock[(n) << VDEV_UBERBLOCK_SHIFT])
  36.119 +
  36.120 +typedef struct uberblock uberblock_t;
  36.121 +
  36.122 +/* XXX Uberblock_phys_t is no longer in the kernel zfs */
  36.123 +typedef struct uberblock_phys {
  36.124 +	uberblock_t	ubp_uberblock;
  36.125 +	char		ubp_pad[UBERBLOCK_SIZE - sizeof (uberblock_t) -
  36.126 +				sizeof (zio_block_tail_t)];
  36.127 +	zio_block_tail_t ubp_zbt;
  36.128 +} uberblock_phys_t;
  36.129 +
  36.130 +/*
  36.131 + * Macros to get fields in a bp or DVA.
  36.132 + */
  36.133 +#define	P2PHASE(x, align)		((x) & ((align) - 1))
  36.134 +#define	DVA_OFFSET_TO_PHYS_SECTOR(offset) \
  36.135 +	((offset + VDEV_LABEL_START_SIZE) >> SPA_MINBLOCKSHIFT)
  36.136 +
  36.137 +/*
  36.138 + * For nvlist manipulation. (from nvpair.h)
  36.139 + */
  36.140 +#define	NV_ENCODE_NATIVE	0
  36.141 +#define	NV_ENCODE_XDR		1
  36.142 +#define	HOST_ENDIAN		1	/* for x86 machine */
  36.143 +#define	DATA_TYPE_UINT64	8
  36.144 +#define	DATA_TYPE_STRING	9
  36.145 +#define	DATA_TYPE_NVLIST	19
  36.146 +#define	DATA_TYPE_NVLIST_ARRAY	20
  36.147 +
  36.148 +/*
  36.149 + * Decompression Entry - lzjb
  36.150 + */
  36.151 +#ifndef	NBBY
  36.152 +#define	NBBY	8
  36.153 +#endif
  36.154 +
  36.155 +typedef int zfs_decomp_func_t(void *s_start, void *d_start, size_t s_len,
  36.156 +			size_t d_len);
  36.157 +typedef struct decomp_entry {
  36.158 +	char *name;
  36.159 +	zfs_decomp_func_t *decomp_func;
  36.160 +} decomp_entry_t;
  36.161 +
  36.162 +/*
  36.163 + * FAT ZAP data structures
  36.164 + */
  36.165 +#define	ZFS_CRC64_POLY 0xC96C5795D7870F42ULL /* ECMA-182, reflected form */
  36.166 +#define	ZAP_HASH_IDX(hash, n)	(((n) == 0) ? 0 : ((hash) >> (64 - (n))))
  36.167 +#define	CHAIN_END	0xffff	/* end of the chunk chain */
  36.168 +
  36.169 +/*
  36.170 + * The amount of space within the chunk available for the array is:
  36.171 + * chunk size - space for type (1) - space for next pointer (2)
  36.172 + */
  36.173 +#define	ZAP_LEAF_ARRAY_BYTES (ZAP_LEAF_CHUNKSIZE - 3)
  36.174 +
  36.175 +#define	ZAP_LEAF_HASH_SHIFT(bs)	(bs - 5)
  36.176 +#define	ZAP_LEAF_HASH_NUMENTRIES(bs) (1 << ZAP_LEAF_HASH_SHIFT(bs))
  36.177 +#define	LEAF_HASH(bs, h) \
  36.178 +	((ZAP_LEAF_HASH_NUMENTRIES(bs)-1) & \
  36.179 +	((h) >> (64 - ZAP_LEAF_HASH_SHIFT(bs)-l->l_hdr.lh_prefix_len)))
  36.180 +
  36.181 +/*
  36.182 + * The amount of space available for chunks is:
   36.183 + * block size (1<<bs) - hash entry size (2) * number of hash
  36.184 + * entries - header space (2*chunksize)
  36.185 + */
  36.186 +#define	ZAP_LEAF_NUMCHUNKS(bs) \
  36.187 +	(((1<<bs) - 2*ZAP_LEAF_HASH_NUMENTRIES(bs)) / \
  36.188 +	ZAP_LEAF_CHUNKSIZE - 2)
  36.189 +
  36.190 +/*
  36.191 + * The chunks start immediately after the hash table.  The end of the
  36.192 + * hash table is at l_hash + HASH_NUMENTRIES, which we simply cast to a
  36.193 + * chunk_t.
  36.194 + */
  36.195 +#define	ZAP_LEAF_CHUNK(l, bs, idx) \
  36.196 +	((zap_leaf_chunk_t *)(l->l_hash + ZAP_LEAF_HASH_NUMENTRIES(bs)))[idx]
  36.197 +#define	ZAP_LEAF_ENTRY(l, bs, idx) (&ZAP_LEAF_CHUNK(l, bs, idx).l_entry)
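To put concrete numbers on the FAT ZAP leaf layout (assuming ZAP_LEAF_CHUNKSIZE is the usual 24 bytes, defined in zap_leaf.h): for a 16K leaf block (bs = 14), ZAP_LEAF_HASH_SHIFT(bs) is 9, so the hash table holds 512 two-byte entries; ZAP_LEAF_NUMCHUNKS(bs) works out to (16384 - 1024) / 24 - 2 = 638 chunks; and each chunk's array portion carries ZAP_LEAF_ARRAY_BYTES = 21 bytes of name or value data.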
  36.198 +
  36.199 +extern void fletcher_2_native(const void *, uint64_t, zio_cksum_t *);
  36.200 +extern void fletcher_2_byteswap(const void *, uint64_t, zio_cksum_t *);
  36.201 +extern void fletcher_4_native(const void *, uint64_t, zio_cksum_t *);
  36.202 +extern void fletcher_4_byteswap(const void *, uint64_t, zio_cksum_t *);
  36.203 +extern void zio_checksum_SHA256(const void *, uint64_t, zio_cksum_t *);
  36.204 +extern int lzjb_decompress(void *, void *, size_t, size_t);
  36.205 +
  36.206 +#endif /* !_FSYS_ZFS_H */
    37.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    37.2 +++ b/tools/libfsimage/zfs/mb_info.h	Thu May 08 18:40:07 2008 +0900
    37.3 @@ -0,0 +1,217 @@
    37.4 +/*
    37.5 + *  GRUB  --  GRand Unified Bootloader
    37.6 + *  Copyright (C) 2000,2003  Free Software Foundation, Inc.
    37.7 + *
    37.8 + *  This program is free software; you can redistribute it and/or modify
    37.9 + *  it under the terms of the GNU General Public License as published by
   37.10 + *  the Free Software Foundation; either version 2 of the License, or
   37.11 + *  (at your option) any later version.
   37.12 + *
   37.13 + *  This program is distributed in the hope that it will be useful,
   37.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   37.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   37.16 + *  GNU General Public License for more details.
   37.17 + *
   37.18 + *  You should have received a copy of the GNU General Public License
   37.19 + *  along with this program; if not, write to the Free Software
   37.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   37.21 + */
   37.22 +
   37.23 +/*
   37.24 + *  The structure type "mod_list" is used by the "multiboot_info" structure.
   37.25 + */
   37.26 +
   37.27 +struct mod_list
   37.28 +{
   37.29 +  /* the memory used goes from bytes 'mod_start' to 'mod_end-1' inclusive */
   37.30 +  unsigned long mod_start;
   37.31 +  unsigned long mod_end;
   37.32 +  
   37.33 +  /* Module command line */
   37.34 +  unsigned long cmdline;
   37.35 +  
   37.36 +  /* padding to take it to 16 bytes (must be zero) */
   37.37 +  unsigned long pad;
   37.38 +};
   37.39 +
   37.40 +
   37.41 +/*
   37.42 + *  INT-15, AX=E820 style "AddressRangeDescriptor"
   37.43 + *  ...with a "size" parameter on the front which is the structure size - 4,
   37.44 + *  pointing to the next one, up until the full buffer length of the memory
   37.45 + *  map has been reached.
   37.46 + */
   37.47 +
   37.48 +struct AddrRangeDesc
   37.49 +{
   37.50 +  unsigned long size;
   37.51 +  unsigned long long BaseAddr;
   37.52 +  unsigned long long Length;
   37.53 +  unsigned long Type;
   37.54 +  
   37.55 +  /* unspecified optional padding... */
   37.56 +} __attribute__ ((packed));
   37.57 +
   37.58 +/* usable memory "Type", all others are reserved.  */
   37.59 +#define MB_ARD_MEMORY		1
   37.60 +
   37.61 +
   37.62 +/* Drive Info structure.  */
   37.63 +struct drive_info
   37.64 +{
   37.65 +  /* The size of this structure.  */
   37.66 +  unsigned long size;
   37.67 +
   37.68 +  /* The BIOS drive number.  */
   37.69 +  unsigned char drive_number;
   37.70 +
   37.71 +  /* The access mode (see below).  */
   37.72 +  unsigned char drive_mode;
   37.73 +
   37.74 +  /* The BIOS geometry.  */
   37.75 +  unsigned short drive_cylinders;
   37.76 +  unsigned char drive_heads;
   37.77 +  unsigned char drive_sectors;
   37.78 +
   37.79 +  /* The array of I/O ports used for the drive.  */
   37.80 +  unsigned short drive_ports[0];
   37.81 +};
   37.82 +
   37.83 +/* Drive Mode.  */
   37.84 +#define MB_DI_CHS_MODE		0
   37.85 +#define MB_DI_LBA_MODE		1
   37.86 +
   37.87 +
   37.88 +/* APM BIOS info.  */
   37.89 +struct apm_info
   37.90 +{
   37.91 +  unsigned short version;
   37.92 +  unsigned short cseg;
   37.93 +  unsigned long offset;
   37.94 +  unsigned short cseg_16;
   37.95 +  unsigned short dseg_16;
   37.96 +  unsigned short cseg_len;
   37.97 +  unsigned short cseg_16_len;
   37.98 +  unsigned short dseg_16_len;
   37.99 +};
  37.100 +
  37.101 +
  37.102 +/*
  37.103 + *  MultiBoot Info description
  37.104 + *
  37.105 + *  This is the struct passed to the boot image.  This is done by placing
  37.106 + *  its address in the EAX register.
  37.107 + */
  37.108 +
  37.109 +struct multiboot_info
  37.110 +{
  37.111 +  /* MultiBoot info version number */
  37.112 +  unsigned long flags;
  37.113 +  
  37.114 +  /* Available memory from BIOS */
  37.115 +  unsigned long mem_lower;
  37.116 +  unsigned long mem_upper;
  37.117 +  
  37.118 +  /* "root" partition */
  37.119 +  unsigned long boot_device;
  37.120 +  
  37.121 +  /* Kernel command line */
  37.122 +  unsigned long cmdline;
  37.123 +  
  37.124 +  /* Boot-Module list */
  37.125 +  unsigned long mods_count;
  37.126 +  unsigned long mods_addr;
  37.127 +  
  37.128 +  union
  37.129 +  {
  37.130 +    struct
  37.131 +    {
  37.132 +      /* (a.out) Kernel symbol table info */
  37.133 +      unsigned long tabsize;
  37.134 +      unsigned long strsize;
  37.135 +      unsigned long addr;
  37.136 +      unsigned long pad;
  37.137 +    }
  37.138 +    a;
  37.139 +    
  37.140 +    struct
  37.141 +    {
  37.142 +      /* (ELF) Kernel section header table */
  37.143 +      unsigned long num;
  37.144 +      unsigned long size;
  37.145 +      unsigned long addr;
  37.146 +      unsigned long shndx;
  37.147 +    }
  37.148 +    e;
  37.149 +  }
  37.150 +  syms;
  37.151 +  
  37.152 +  /* Memory Mapping buffer */
  37.153 +  unsigned long mmap_length;
  37.154 +  unsigned long mmap_addr;
  37.155 +  
  37.156 +  /* Drive Info buffer */
  37.157 +  unsigned long drives_length;
  37.158 +  unsigned long drives_addr;
  37.159 +  
  37.160 +  /* ROM configuration table */
  37.161 +  unsigned long config_table;
  37.162 +  
  37.163 +  /* Boot Loader Name */
  37.164 +  unsigned long boot_loader_name;
  37.165 +
  37.166 +  /* APM table */
  37.167 +  unsigned long apm_table;
  37.168 +
  37.169 +  /* Video */
  37.170 +  unsigned long vbe_control_info;
  37.171 +  unsigned long vbe_mode_info;
  37.172 +  unsigned short vbe_mode;
  37.173 +  unsigned short vbe_interface_seg;
  37.174 +  unsigned short vbe_interface_off;
  37.175 +  unsigned short vbe_interface_len;
  37.176 +};
  37.177 +
  37.178 +/*
  37.179 + *  Flags to be set in the 'flags' parameter above
  37.180 + */
  37.181 +
  37.182 +/* is there basic lower/upper memory information? */
  37.183 +#define MB_INFO_MEMORY			0x00000001
  37.184 +/* is there a boot device set? */
  37.185 +#define MB_INFO_BOOTDEV			0x00000002
  37.186 +/* is the command-line defined? */
  37.187 +#define MB_INFO_CMDLINE			0x00000004
  37.188 +/* are there modules to do something with? */
  37.189 +#define MB_INFO_MODS			0x00000008
  37.190 +
  37.191 +/* These next two are mutually exclusive */
  37.192 +
  37.193 +/* is there a symbol table loaded? */
  37.194 +#define MB_INFO_AOUT_SYMS		0x00000010
  37.195 +/* is there an ELF section header table? */
  37.196 +#define MB_INFO_ELF_SHDR		0x00000020
  37.197 +
  37.198 +/* is there a full memory map? */
  37.199 +#define MB_INFO_MEM_MAP			0x00000040
  37.200 +
  37.201 +/* Is there drive info?  */
  37.202 +#define MB_INFO_DRIVE_INFO		0x00000080
  37.203 +
  37.204 +/* Is there a config table?  */
  37.205 +#define MB_INFO_CONFIG_TABLE		0x00000100
  37.206 +
  37.207 +/* Is there a boot loader name?  */
  37.208 +#define MB_INFO_BOOT_LOADER_NAME	0x00000200
  37.209 +
  37.210 +/* Is there a APM table?  */
  37.211 +#define MB_INFO_APM_TABLE		0x00000400
  37.212 +
  37.213 +/* Is there video information?  */
  37.214 +#define MB_INFO_VIDEO_INFO		0x00000800
  37.215 +
  37.216 +/*
  37.217 + *  The following value must be present in the EAX register.
  37.218 + */
  37.219 +
  37.220 +#define MULTIBOOT_VALID			0x2BADB002
    38.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    38.2 +++ b/tools/libfsimage/zfs/zfs-include/dmu.h	Thu May 08 18:40:07 2008 +0900
    38.3 @@ -0,0 +1,105 @@
    38.4 +/*
    38.5 + *  GRUB  --  GRand Unified Bootloader
    38.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    38.7 + *
    38.8 + *  This program is free software; you can redistribute it and/or modify
    38.9 + *  it under the terms of the GNU General Public License as published by
   38.10 + *  the Free Software Foundation; either version 2 of the License, or
   38.11 + *  (at your option) any later version.
   38.12 + *
   38.13 + *  This program is distributed in the hope that it will be useful,
   38.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   38.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   38.16 + *  GNU General Public License for more details.
   38.17 + *
   38.18 + *  You should have received a copy of the GNU General Public License
   38.19 + *  along with this program; if not, write to the Free Software
   38.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   38.21 + */
   38.22 +/*
   38.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   38.24 + * Use is subject to license terms.
   38.25 + */
   38.26 +
   38.27 +#ifndef	_SYS_DMU_H
   38.28 +#define	_SYS_DMU_H
   38.29 +
   38.30 +/*
   38.31 + * This file describes the interface that the DMU provides for its
   38.32 + * consumers.
   38.33 + *
   38.34 + * The DMU also interacts with the SPA.  That interface is described in
   38.35 + * dmu_spa.h.
   38.36 + */
   38.37 +typedef enum dmu_object_type {
   38.38 +	DMU_OT_NONE,
   38.39 +	/* general: */
   38.40 +	DMU_OT_OBJECT_DIRECTORY,	/* ZAP */
   38.41 +	DMU_OT_OBJECT_ARRAY,		/* UINT64 */
   38.42 +	DMU_OT_PACKED_NVLIST,		/* UINT8 (XDR by nvlist_pack/unpack) */
   38.43 +	DMU_OT_PACKED_NVLIST_SIZE,	/* UINT64 */
   38.44 +	DMU_OT_BPLIST,			/* UINT64 */
   38.45 +	DMU_OT_BPLIST_HDR,		/* UINT64 */
   38.46 +	/* spa: */
   38.47 +	DMU_OT_SPACE_MAP_HEADER,	/* UINT64 */
   38.48 +	DMU_OT_SPACE_MAP,		/* UINT64 */
   38.49 +	/* zil: */
   38.50 +	DMU_OT_INTENT_LOG,		/* UINT64 */
   38.51 +	/* dmu: */
   38.52 +	DMU_OT_DNODE,			/* DNODE */
   38.53 +	DMU_OT_OBJSET,			/* OBJSET */
   38.54 +	/* dsl: */
   38.55 +	DMU_OT_DSL_DIR,			/* UINT64 */
   38.56 +	DMU_OT_DSL_DIR_CHILD_MAP,	/* ZAP */
   38.57 +	DMU_OT_DSL_DS_SNAP_MAP,		/* ZAP */
   38.58 +	DMU_OT_DSL_PROPS,		/* ZAP */
   38.59 +	DMU_OT_DSL_DATASET,		/* UINT64 */
   38.60 +	/* zpl: */
   38.61 +	DMU_OT_ZNODE,			/* ZNODE */
   38.62 +	DMU_OT_ACL,			/* ACL */
   38.63 +	DMU_OT_PLAIN_FILE_CONTENTS,	/* UINT8 */
   38.64 +	DMU_OT_DIRECTORY_CONTENTS,	/* ZAP */
   38.65 +	DMU_OT_MASTER_NODE,		/* ZAP */
   38.66 +	DMU_OT_UNLINKED_SET,		/* ZAP */
   38.67 +	/* zvol: */
   38.68 +	DMU_OT_ZVOL,			/* UINT8 */
   38.69 +	DMU_OT_ZVOL_PROP,		/* ZAP */
   38.70 +	/* other; for testing only! */
   38.71 +	DMU_OT_PLAIN_OTHER,		/* UINT8 */
   38.72 +	DMU_OT_UINT64_OTHER,		/* UINT64 */
   38.73 +	DMU_OT_ZAP_OTHER,		/* ZAP */
   38.74 +	/* new object types: */
   38.75 +	DMU_OT_ERROR_LOG,		/* ZAP */
   38.76 +	DMU_OT_SPA_HISTORY,		/* UINT8 */
   38.77 +	DMU_OT_SPA_HISTORY_OFFSETS,	/* spa_his_phys_t */
   38.78 +	DMU_OT_POOL_PROPS,		/* ZAP */
   38.79 +
   38.80 +	DMU_OT_NUMTYPES
   38.81 +} dmu_object_type_t;
   38.82 +
   38.83 +typedef enum dmu_objset_type {
   38.84 +	DMU_OST_NONE,
   38.85 +	DMU_OST_META,
   38.86 +	DMU_OST_ZFS,
   38.87 +	DMU_OST_ZVOL,
   38.88 +	DMU_OST_OTHER,			/* For testing only! */
   38.89 +	DMU_OST_ANY,			/* Be careful! */
   38.90 +	DMU_OST_NUMTYPES
   38.91 +} dmu_objset_type_t;
   38.92 +
   38.93 +/*
   38.94 + * The names of zap entries in the DIRECTORY_OBJECT of the MOS.
   38.95 + */
   38.96 +#define	DMU_POOL_DIRECTORY_OBJECT	1
   38.97 +#define	DMU_POOL_CONFIG			"config"
   38.98 +#define	DMU_POOL_ROOT_DATASET		"root_dataset"
   38.99 +#define	DMU_POOL_SYNC_BPLIST		"sync_bplist"
  38.100 +#define	DMU_POOL_ERRLOG_SCRUB		"errlog_scrub"
  38.101 +#define	DMU_POOL_ERRLOG_LAST		"errlog_last"
  38.102 +#define	DMU_POOL_SPARES			"spares"
  38.103 +#define	DMU_POOL_DEFLATE		"deflate"
  38.104 +#define	DMU_POOL_HISTORY		"history"
  38.105 +#define	DMU_POOL_PROPS			"pool_props"
  38.106 +#define	DMU_POOL_L2CACHE		"l2cache"
  38.107 +
  38.108 +#endif	/* _SYS_DMU_H */
    39.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    39.2 +++ b/tools/libfsimage/zfs/zfs-include/dmu_objset.h	Thu May 08 18:40:07 2008 +0900
    39.3 @@ -0,0 +1,35 @@
    39.4 +/*
    39.5 + *  GRUB  --  GRand Unified Bootloader
    39.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    39.7 + *
    39.8 + *  This program is free software; you can redistribute it and/or modify
    39.9 + *  it under the terms of the GNU General Public License as published by
   39.10 + *  the Free Software Foundation; either version 2 of the License, or
   39.11 + *  (at your option) any later version.
   39.12 + *
   39.13 + *  This program is distributed in the hope that it will be useful,
   39.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   39.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   39.16 + *  GNU General Public License for more details.
   39.17 + *
   39.18 + *  You should have received a copy of the GNU General Public License
   39.19 + *  along with this program; if not, write to the Free Software
   39.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   39.21 + */
   39.22 +/*
   39.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   39.24 + * Use is subject to license terms.
   39.25 + */
   39.26 +
   39.27 +#ifndef	_SYS_DMU_OBJSET_H
   39.28 +#define	_SYS_DMU_OBJSET_H
   39.29 +
   39.30 +typedef struct objset_phys {
   39.31 +	dnode_phys_t os_meta_dnode;
   39.32 +	zil_header_t os_zil_header;
   39.33 +	uint64_t os_type;
   39.34 +	char os_pad[1024 - sizeof (dnode_phys_t) - sizeof (zil_header_t) -
   39.35 +	    sizeof (uint64_t)];
   39.36 +} objset_phys_t;
   39.37 +
   39.38 +#endif /* _SYS_DMU_OBJSET_H */
    40.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    40.2 +++ b/tools/libfsimage/zfs/zfs-include/dnode.h	Thu May 08 18:40:07 2008 +0900
    40.3 @@ -0,0 +1,76 @@
    40.4 +/*
    40.5 + *  GRUB  --  GRand Unified Bootloader
    40.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    40.7 + *
    40.8 + *  This program is free software; you can redistribute it and/or modify
    40.9 + *  it under the terms of the GNU General Public License as published by
   40.10 + *  the Free Software Foundation; either version 2 of the License, or
   40.11 + *  (at your option) any later version.
   40.12 + *
   40.13 + *  This program is distributed in the hope that it will be useful,
   40.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   40.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   40.16 + *  GNU General Public License for more details.
   40.17 + *
   40.18 + *  You should have received a copy of the GNU General Public License
   40.19 + *  along with this program; if not, write to the Free Software
   40.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   40.21 + */
   40.22 +/*
   40.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   40.24 + * Use is subject to license terms.
   40.25 + */
   40.26 +
   40.27 +#ifndef	_SYS_DNODE_H
   40.28 +#define	_SYS_DNODE_H
   40.29 +
   40.30 +/*
   40.31 + * Fixed constants.
   40.32 + */
   40.33 +#define	DNODE_SHIFT		9	/* 512 bytes */
   40.34 +#define	DN_MIN_INDBLKSHIFT	10	/* 1k */
   40.35 +#define	DN_MAX_INDBLKSHIFT	14	/* 16k */
   40.36 +#define	DNODE_BLOCK_SHIFT	14	/* 16k */
   40.37 +#define	DNODE_CORE_SIZE		64	/* 64 bytes for dnode sans blkptrs */
   40.38 +#define	DN_MAX_OBJECT_SHIFT	48	/* 256 trillion (zfs_fid_t limit) */
   40.39 +#define	DN_MAX_OFFSET_SHIFT	64	/* 2^64 bytes in a dnode */
   40.40 +
   40.41 +/*
   40.42 + * Derived constants.
   40.43 + */
   40.44 +#define	DNODE_SIZE	(1 << DNODE_SHIFT)
   40.45 +#define	DN_MAX_NBLKPTR	((DNODE_SIZE - DNODE_CORE_SIZE) >> SPA_BLKPTRSHIFT)
   40.46 +#define	DN_MAX_BONUSLEN	(DNODE_SIZE - DNODE_CORE_SIZE - (1 << SPA_BLKPTRSHIFT))
   40.47 +#define	DN_MAX_OBJECT	(1ULL << DN_MAX_OBJECT_SHIFT)
   40.48 +
   40.49 +#define	DNODES_PER_BLOCK_SHIFT	(DNODE_BLOCK_SHIFT - DNODE_SHIFT)
   40.50 +#define	DNODES_PER_BLOCK	(1ULL << DNODES_PER_BLOCK_SHIFT)
   40.51 +#define	DNODES_PER_LEVEL_SHIFT	(DN_MAX_INDBLKSHIFT - SPA_BLKPTRSHIFT)
   40.52 +
   40.53 +#define	DN_BONUS(dnp)	((void*)((dnp)->dn_bonus + \
   40.54 +	(((dnp)->dn_nblkptr - 1) * sizeof (blkptr_t))))
   40.55 +
   40.56 +typedef struct dnode_phys {
   40.57 +	uint8_t dn_type;		/* dmu_object_type_t */
   40.58 +	uint8_t dn_indblkshift;		/* ln2(indirect block size) */
   40.59 +	uint8_t dn_nlevels;		/* 1=dn_blkptr->data blocks */
   40.60 +	uint8_t dn_nblkptr;		/* length of dn_blkptr */
   40.61 +	uint8_t dn_bonustype;		/* type of data in bonus buffer */
   40.62 +	uint8_t	dn_checksum;		/* ZIO_CHECKSUM type */
   40.63 +	uint8_t	dn_compress;		/* ZIO_COMPRESS type */
   40.64 +	uint8_t dn_flags;		/* DNODE_FLAG_* */
   40.65 +	uint16_t dn_datablkszsec;	/* data block size in 512b sectors */
   40.66 +	uint16_t dn_bonuslen;		/* length of dn_bonus */
   40.67 +	uint8_t dn_pad2[4];
   40.68 +
   40.69 +	/* accounting is protected by dn_dirty_mtx */
   40.70 +	uint64_t dn_maxblkid;		/* largest allocated block ID */
   40.71 +	uint64_t dn_used;		/* bytes (or sectors) of disk space */
   40.72 +
   40.73 +	uint64_t dn_pad3[4];
   40.74 +
   40.75 +	blkptr_t dn_blkptr[1];
   40.76 +	uint8_t dn_bonus[DN_MAX_BONUSLEN];
   40.77 +} dnode_phys_t;
   40.78 +
   40.79 +#endif	/* _SYS_DNODE_H */
    41.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    41.2 +++ b/tools/libfsimage/zfs/zfs-include/dsl_dataset.h	Thu May 08 18:40:07 2008 +0900
    41.3 @@ -0,0 +1,53 @@
    41.4 +/*
    41.5 + *  GRUB  --  GRand Unified Bootloader
    41.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    41.7 + *
    41.8 + *  This program is free software; you can redistribute it and/or modify
    41.9 + *  it under the terms of the GNU General Public License as published by
   41.10 + *  the Free Software Foundation; either version 2 of the License, or
   41.11 + *  (at your option) any later version.
   41.12 + *
   41.13 + *  This program is distributed in the hope that it will be useful,
   41.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   41.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   41.16 + *  GNU General Public License for more details.
   41.17 + *
   41.18 + *  You should have received a copy of the GNU General Public License
   41.19 + *  along with this program; if not, write to the Free Software
   41.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   41.21 + */
   41.22 +/*
   41.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   41.24 + * Use is subject to license terms.
   41.25 + */
   41.26 +
   41.27 +#ifndef	_SYS_DSL_DATASET_H
   41.28 +#define	_SYS_DSL_DATASET_H
   41.29 +
   41.30 +typedef struct dsl_dataset_phys {
   41.31 +	uint64_t ds_dir_obj;
   41.32 +	uint64_t ds_prev_snap_obj;
   41.33 +	uint64_t ds_prev_snap_txg;
   41.34 +	uint64_t ds_next_snap_obj;
   41.35 +	uint64_t ds_snapnames_zapobj;	/* zap obj of snaps; ==0 for snaps */
   41.36 +	uint64_t ds_num_children;	/* clone/snap children; ==0 for head */
   41.37 +	uint64_t ds_creation_time;	/* seconds since 1970 */
   41.38 +	uint64_t ds_creation_txg;
   41.39 +	uint64_t ds_deadlist_obj;
   41.40 +	uint64_t ds_used_bytes;
   41.41 +	uint64_t ds_compressed_bytes;
   41.42 +	uint64_t ds_uncompressed_bytes;
   41.43 +	uint64_t ds_unique_bytes;	/* only relevant to snapshots */
   41.44 +	/*
   41.45 +	 * The ds_fsid_guid is a 56-bit ID that can change to avoid
   41.46 +	 * collisions.  The ds_guid is a 64-bit ID that will never
   41.47 +	 * change, so there is a small probability that it will collide.
   41.48 +	 */
   41.49 +	uint64_t ds_fsid_guid;
   41.50 +	uint64_t ds_guid;
   41.51 +	uint64_t ds_flags;
   41.52 +	blkptr_t ds_bp;
   41.53 +	uint64_t ds_pad[8]; /* pad out to 320 bytes for good measure */
   41.54 +} dsl_dataset_phys_t;
   41.55 +
   41.56 +#endif /* _SYS_DSL_DATASET_H */
    42.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    42.2 +++ b/tools/libfsimage/zfs/zfs-include/dsl_dir.h	Thu May 08 18:40:07 2008 +0900
    42.3 @@ -0,0 +1,49 @@
    42.4 +/*
    42.5 + *  GRUB  --  GRand Unified Bootloader
    42.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    42.7 + *
    42.8 + *  This program is free software; you can redistribute it and/or modify
    42.9 + *  it under the terms of the GNU General Public License as published by
   42.10 + *  the Free Software Foundation; either version 2 of the License, or
   42.11 + *  (at your option) any later version.
   42.12 + *
   42.13 + *  This program is distributed in the hope that it will be useful,
   42.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   42.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   42.16 + *  GNU General Public License for more details.
   42.17 + *
   42.18 + *  You should have received a copy of the GNU General Public License
   42.19 + *  along with this program; if not, write to the Free Software
   42.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   42.21 + */
   42.22 +/*
   42.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   42.24 + * Use is subject to license terms.
   42.25 + */
   42.26 +
   42.27 +#ifndef	_SYS_DSL_DIR_H
   42.28 +#define	_SYS_DSL_DIR_H
   42.29 +
   42.30 +typedef struct dsl_dir_phys {
   42.31 +	uint64_t dd_creation_time; /* not actually used */
   42.32 +	uint64_t dd_head_dataset_obj;
   42.33 +	uint64_t dd_parent_obj;
   42.34 +	uint64_t dd_clone_parent_obj;
   42.35 +	uint64_t dd_child_dir_zapobj;
   42.36 +	/*
   42.37 +	 * how much space our children are accounting for; for leaf
   42.38 +	 * datasets, == physical space used by fs + snaps
   42.39 +	 */
   42.40 +	uint64_t dd_used_bytes;
   42.41 +	uint64_t dd_compressed_bytes;
   42.42 +	uint64_t dd_uncompressed_bytes;
   42.43 +	/* Administrative quota setting */
   42.44 +	uint64_t dd_quota;
   42.45 +	/* Administrative reservation setting */
   42.46 +	uint64_t dd_reserved;
   42.47 +	uint64_t dd_props_zapobj;
   42.48 +	uint64_t dd_deleg_zapobj;	/* dataset permissions */
   42.49 +	uint64_t dd_pad[20]; /* pad out to 256 bytes for good measure */
   42.50 +} dsl_dir_phys_t;
   42.51 +
   42.52 +#endif /* _SYS_DSL_DIR_H */
    43.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    43.2 +++ b/tools/libfsimage/zfs/zfs-include/spa.h	Thu May 08 18:40:07 2008 +0900
    43.3 @@ -0,0 +1,283 @@
    43.4 +/*
    43.5 + *  GRUB  --  GRand Unified Bootloader
    43.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    43.7 + *
    43.8 + *  This program is free software; you can redistribute it and/or modify
    43.9 + *  it under the terms of the GNU General Public License as published by
   43.10 + *  the Free Software Foundation; either version 2 of the License, or
   43.11 + *  (at your option) any later version.
   43.12 + *
   43.13 + *  This program is distributed in the hope that it will be useful,
   43.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   43.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   43.16 + *  GNU General Public License for more details.
   43.17 + *
   43.18 + *  You should have received a copy of the GNU General Public License
   43.19 + *  along with this program; if not, write to the Free Software
   43.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   43.21 + */
   43.22 +/*
   43.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   43.24 + * Use is subject to license terms.
   43.25 + */
   43.26 +
   43.27 +#ifndef _SYS_SPA_H
   43.28 +#define	_SYS_SPA_H
   43.29 +
   43.30 +/*
   43.31 + * General-purpose 32-bit and 64-bit bitfield encodings.
   43.32 + */
   43.33 +#define	BF32_DECODE(x, low, len)	P2PHASE((x) >> (low), 1U << (len))
   43.34 +#define	BF64_DECODE(x, low, len)	P2PHASE((x) >> (low), 1ULL << (len))
   43.35 +#define	BF32_ENCODE(x, low, len)	(P2PHASE((x), 1U << (len)) << (low))
   43.36 +#define	BF64_ENCODE(x, low, len)	(P2PHASE((x), 1ULL << (len)) << (low))
   43.37 +
   43.38 +#define	BF32_GET(x, low, len)		BF32_DECODE(x, low, len)
   43.39 +#define	BF64_GET(x, low, len)		BF64_DECODE(x, low, len)
   43.40 +
   43.41 +#define	BF32_SET(x, low, len, val)	\
   43.42 +	((x) ^= BF32_ENCODE((x >> low) ^ (val), low, len))
   43.43 +#define	BF64_SET(x, low, len, val)	\
   43.44 +	((x) ^= BF64_ENCODE((x >> low) ^ (val), low, len))
   43.45 +
   43.46 +#define	BF32_GET_SB(x, low, len, shift, bias)	\
   43.47 +	((BF32_GET(x, low, len) + (bias)) << (shift))
   43.48 +#define	BF64_GET_SB(x, low, len, shift, bias)	\
   43.49 +	((BF64_GET(x, low, len) + (bias)) << (shift))
   43.50 +
   43.51 +#define	BF32_SET_SB(x, low, len, shift, bias, val)	\
   43.52 +	BF32_SET(x, low, len, ((val) >> (shift)) - (bias))
   43.53 +#define	BF64_SET_SB(x, low, len, shift, bias, val)	\
   43.54 +	BF64_SET(x, low, len, ((val) >> (shift)) - (bias))
   43.55 +
   43.56 +/*
   43.57 + * We currently support nine block sizes, from 512 bytes to 128K.
   43.58 + * We could go higher, but the benefits are near-zero and the cost
   43.59 + * of COWing a giant block to modify one byte would become excessive.
   43.60 + */
   43.61 +#define	SPA_MINBLOCKSHIFT	9
   43.62 +#define	SPA_MAXBLOCKSHIFT	17
   43.63 +#define	SPA_MINBLOCKSIZE	(1ULL << SPA_MINBLOCKSHIFT)
   43.64 +#define	SPA_MAXBLOCKSIZE	(1ULL << SPA_MAXBLOCKSHIFT)
   43.65 +
   43.66 +#define	SPA_BLOCKSIZES		(SPA_MAXBLOCKSHIFT - SPA_MINBLOCKSHIFT + 1)
   43.67 +
   43.68 +/*
   43.69 + * The DVA size encodings for LSIZE and PSIZE support blocks up to 32MB.
   43.70 + * The ASIZE encoding should be at least 64 times larger (6 more bits)
   43.71 + * to support up to 4-way RAID-Z mirror mode with worst-case gang block
   43.72 + * overhead, three DVAs per bp, plus one more bit in case we do anything
   43.73 + * else that expands the ASIZE.
   43.74 + */
   43.75 +#define	SPA_LSIZEBITS		16	/* LSIZE up to 32M (2^16 * 512)	*/
   43.76 +#define	SPA_PSIZEBITS		16	/* PSIZE up to 32M (2^16 * 512)	*/
   43.77 +#define	SPA_ASIZEBITS		24	/* ASIZE up to 64 times larger	*/
   43.78 +
   43.79 +/*
   43.80 + * All SPA data is represented by 128-bit data virtual addresses (DVAs).
   43.81 + * The members of the dva_t should be considered opaque outside the SPA.
   43.82 + */
   43.83 +typedef struct dva {
   43.84 +	uint64_t	dva_word[2];
   43.85 +} dva_t;
   43.86 +
   43.87 +/*
   43.88 + * Each block has a 256-bit checksum -- strong enough for cryptographic hashes.
   43.89 + */
   43.90 +typedef struct zio_cksum {
   43.91 +	uint64_t	zc_word[4];
   43.92 +} zio_cksum_t;
   43.93 +
   43.94 +/*
   43.95 + * Each block is described by its DVAs, time of birth, checksum, etc.
   43.96 + * The word-by-word, bit-by-bit layout of the blkptr is as follows:
   43.97 + *
   43.98 + *	64	56	48	40	32	24	16	8	0
   43.99 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.100 + * 0	|		vdev1		| GRID  |	  ASIZE		|
  43.101 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.102 + * 1	|G|			 offset1				|
  43.103 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.104 + * 2	|		vdev2		| GRID  |	  ASIZE		|
  43.105 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.106 + * 3	|G|			 offset2				|
  43.107 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.108 + * 4	|		vdev3		| GRID  |	  ASIZE		|
  43.109 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.110 + * 5	|G|			 offset3				|
  43.111 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.112 + * 6	|E| lvl | type	| cksum | comp	|     PSIZE	|     LSIZE	|
  43.113 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.114 + * 7	|			padding					|
  43.115 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.116 + * 8	|			padding					|
  43.117 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.118 + * 9	|			padding					|
  43.119 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.120 + * a	|			birth txg				|
  43.121 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.122 + * b	|			fill count				|
  43.123 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.124 + * c	|			checksum[0]				|
  43.125 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.126 + * d	|			checksum[1]				|
  43.127 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.128 + * e	|			checksum[2]				|
  43.129 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.130 + * f	|			checksum[3]				|
  43.131 + *	+-------+-------+-------+-------+-------+-------+-------+-------+
  43.132 + *
  43.133 + * Legend:
  43.134 + *
  43.135 + * vdev		virtual device ID
  43.136 + * offset	offset into virtual device
  43.137 + * LSIZE	logical size
  43.138 + * PSIZE	physical size (after compression)
  43.139 + * ASIZE	allocated size (including RAID-Z parity and gang block headers)
  43.140 + * GRID		RAID-Z layout information (reserved for future use)
  43.141 + * cksum	checksum function
  43.142 + * comp		compression function
  43.143 + * G		gang block indicator
  43.144 + * E		endianness
  43.145 + * type		DMU object type
  43.146 + * lvl		level of indirection
  43.147 + * birth txg	transaction group in which the block was born
  43.148 + * fill count	number of non-zero blocks under this bp
  43.149 + * checksum[4]	256-bit checksum of the data this bp describes
  43.150 + */
  43.151 +typedef struct blkptr {
  43.152 +	dva_t		blk_dva[3];	/* 128-bit Data Virtual Address	*/
  43.153 +	uint64_t	blk_prop;	/* size, compression, type, etc	*/
  43.154 +	uint64_t	blk_pad[3];	/* Extra space for the future	*/
  43.155 +	uint64_t	blk_birth;	/* transaction group at birth	*/
  43.156 +	uint64_t	blk_fill;	/* fill count			*/
  43.157 +	zio_cksum_t	blk_cksum;	/* 256-bit checksum		*/
  43.158 +} blkptr_t;
  43.159 +
  43.160 +#define	SPA_BLKPTRSHIFT	7		/* blkptr_t is 128 bytes	*/
  43.161 +#define	SPA_DVAS_PER_BP	3		/* Number of DVAs in a bp	*/
  43.162 +
  43.163 +/*
  43.164 + * Macros to get and set fields in a bp or DVA.
  43.165 + */
  43.166 +#define	DVA_GET_ASIZE(dva)	\
  43.167 +	BF64_GET_SB((dva)->dva_word[0], 0, 24, SPA_MINBLOCKSHIFT, 0)
  43.168 +#define	DVA_SET_ASIZE(dva, x)	\
  43.169 +	BF64_SET_SB((dva)->dva_word[0], 0, 24, SPA_MINBLOCKSHIFT, 0, x)
  43.170 +
  43.171 +#define	DVA_GET_GRID(dva)	BF64_GET((dva)->dva_word[0], 24, 8)
  43.172 +#define	DVA_SET_GRID(dva, x)	BF64_SET((dva)->dva_word[0], 24, 8, x)
  43.173 +
  43.174 +#define	DVA_GET_VDEV(dva)	BF64_GET((dva)->dva_word[0], 32, 32)
  43.175 +#define	DVA_SET_VDEV(dva, x)	BF64_SET((dva)->dva_word[0], 32, 32, x)
  43.176 +
  43.177 +#define	DVA_GET_OFFSET(dva)	\
  43.178 +	BF64_GET_SB((dva)->dva_word[1], 0, 63, SPA_MINBLOCKSHIFT, 0)
  43.179 +#define	DVA_SET_OFFSET(dva, x)	\
  43.180 +	BF64_SET_SB((dva)->dva_word[1], 0, 63, SPA_MINBLOCKSHIFT, 0, x)
  43.181 +
  43.182 +#define	DVA_GET_GANG(dva)	BF64_GET((dva)->dva_word[1], 63, 1)
  43.183 +#define	DVA_SET_GANG(dva, x)	BF64_SET((dva)->dva_word[1], 63, 1, x)
  43.184 +
  43.185 +#define	BP_GET_LSIZE(bp)	\
  43.186 +	(BP_IS_HOLE(bp) ? 0 : \
  43.187 +	BF64_GET_SB((bp)->blk_prop, 0, 16, SPA_MINBLOCKSHIFT, 1))
  43.188 +#define	BP_SET_LSIZE(bp, x)	\
  43.189 +	BF64_SET_SB((bp)->blk_prop, 0, 16, SPA_MINBLOCKSHIFT, 1, x)
  43.190 +
  43.191 +#define	BP_GET_PSIZE(bp)	\
  43.192 +	BF64_GET_SB((bp)->blk_prop, 16, 16, SPA_MINBLOCKSHIFT, 1)
  43.193 +#define	BP_SET_PSIZE(bp, x)	\
  43.194 +	BF64_SET_SB((bp)->blk_prop, 16, 16, SPA_MINBLOCKSHIFT, 1, x)
  43.195 +
  43.196 +#define	BP_GET_COMPRESS(bp)	BF64_GET((bp)->blk_prop, 32, 8)
  43.197 +#define	BP_SET_COMPRESS(bp, x)	BF64_SET((bp)->blk_prop, 32, 8, x)
  43.198 +
  43.199 +#define	BP_GET_CHECKSUM(bp)	BF64_GET((bp)->blk_prop, 40, 8)
  43.200 +#define	BP_SET_CHECKSUM(bp, x)	BF64_SET((bp)->blk_prop, 40, 8, x)
  43.201 +
  43.202 +#define	BP_GET_TYPE(bp)		BF64_GET((bp)->blk_prop, 48, 8)
  43.203 +#define	BP_SET_TYPE(bp, x)	BF64_SET((bp)->blk_prop, 48, 8, x)
  43.204 +
  43.205 +#define	BP_GET_LEVEL(bp)	BF64_GET((bp)->blk_prop, 56, 5)
  43.206 +#define	BP_SET_LEVEL(bp, x)	BF64_SET((bp)->blk_prop, 56, 5, x)
  43.207 +
  43.208 +#define	BP_GET_BYTEORDER(bp)	(0 - BF64_GET((bp)->blk_prop, 63, 1))
  43.209 +#define	BP_SET_BYTEORDER(bp, x)	BF64_SET((bp)->blk_prop, 63, 1, x)
  43.210 +
  43.211 +#define	BP_GET_ASIZE(bp)	\
  43.212 +	(DVA_GET_ASIZE(&(bp)->blk_dva[0]) + DVA_GET_ASIZE(&(bp)->blk_dva[1]) + \
  43.213 +		DVA_GET_ASIZE(&(bp)->blk_dva[2]))
  43.214 +
  43.215 +#define	BP_GET_UCSIZE(bp) \
  43.216 +	((BP_GET_LEVEL(bp) > 0 || dmu_ot[BP_GET_TYPE(bp)].ot_metadata) ? \
  43.217 +	BP_GET_PSIZE(bp) : BP_GET_LSIZE(bp))
  43.218 +
  43.219 +#define	BP_GET_NDVAS(bp)	\
  43.220 +	(!!DVA_GET_ASIZE(&(bp)->blk_dva[0]) + \
  43.221 +	!!DVA_GET_ASIZE(&(bp)->blk_dva[1]) + \
  43.222 +	!!DVA_GET_ASIZE(&(bp)->blk_dva[2]))
  43.223 +
  43.224 +#define	BP_COUNT_GANG(bp)	\
  43.225 +	(DVA_GET_GANG(&(bp)->blk_dva[0]) + \
  43.226 +	DVA_GET_GANG(&(bp)->blk_dva[1]) + \
  43.227 +	DVA_GET_GANG(&(bp)->blk_dva[2]))
  43.228 +
  43.229 +#define	DVA_EQUAL(dva1, dva2)	\
  43.230 +	((dva1)->dva_word[1] == (dva2)->dva_word[1] && \
  43.231 +	(dva1)->dva_word[0] == (dva2)->dva_word[0])
  43.232 +
  43.233 +#define	ZIO_CHECKSUM_EQUAL(zc1, zc2) \
  43.234 +	(0 == (((zc1).zc_word[0] - (zc2).zc_word[0]) | \
  43.235 +	((zc1).zc_word[1] - (zc2).zc_word[1]) | \
  43.236 +	((zc1).zc_word[2] - (zc2).zc_word[2]) | \
  43.237 +	((zc1).zc_word[3] - (zc2).zc_word[3])))
  43.238 +
  43.239 +
  43.240 +#define	DVA_IS_VALID(dva)	(DVA_GET_ASIZE(dva) != 0)
  43.241 +
  43.242 +#define	ZIO_SET_CHECKSUM(zcp, w0, w1, w2, w3)	\
  43.243 +{						\
  43.244 +	(zcp)->zc_word[0] = w0;			\
  43.245 +	(zcp)->zc_word[1] = w1;			\
  43.246 +	(zcp)->zc_word[2] = w2;			\
  43.247 +	(zcp)->zc_word[3] = w3;			\
  43.248 +}
  43.249 +
  43.250 +#define	BP_IDENTITY(bp)		(&(bp)->blk_dva[0])
  43.251 +#define	BP_IS_GANG(bp)		DVA_GET_GANG(BP_IDENTITY(bp))
  43.252 +#define	BP_IS_HOLE(bp)		((bp)->blk_birth == 0)
  43.253 +#define	BP_IS_OLDER(bp, txg)	(!BP_IS_HOLE(bp) && (bp)->blk_birth < (txg))
  43.254 +
  43.255 +#define	BP_ZERO(bp)				\
  43.256 +{						\
  43.257 +	(bp)->blk_dva[0].dva_word[0] = 0;	\
  43.258 +	(bp)->blk_dva[0].dva_word[1] = 0;	\
  43.259 +	(bp)->blk_dva[1].dva_word[0] = 0;	\
  43.260 +	(bp)->blk_dva[1].dva_word[1] = 0;	\
  43.261 +	(bp)->blk_dva[2].dva_word[0] = 0;	\
  43.262 +	(bp)->blk_dva[2].dva_word[1] = 0;	\
  43.263 +	(bp)->blk_prop = 0;			\
  43.264 +	(bp)->blk_pad[0] = 0;			\
  43.265 +	(bp)->blk_pad[1] = 0;			\
  43.266 +	(bp)->blk_pad[2] = 0;			\
  43.267 +	(bp)->blk_birth = 0;			\
  43.268 +	(bp)->blk_fill = 0;			\
  43.269 +	ZIO_SET_CHECKSUM(&(bp)->blk_cksum, 0, 0, 0, 0);	\
  43.270 +}
  43.271 +
  43.272 +/*
  43.273 + * Note: the byteorder is either 0 or -1, both of which are palindromes.
  43.274 + * This simplifies the endianness handling a bit.
  43.275 + */
  43.276 +#ifdef _BIG_ENDIAN
  43.277 +#define	ZFS_HOST_BYTEORDER	(0ULL)
  43.278 +#else
  43.279 +#define	ZFS_HOST_BYTEORDER	(-1ULL)
  43.280 +#endif
  43.281 +
  43.282 +#define	BP_SHOULD_BYTESWAP(bp)	(BP_GET_BYTEORDER(bp) != ZFS_HOST_BYTEORDER)
  43.283 +
  43.284 +#define	BP_SPRINTF_LEN	320
  43.285 +
  43.286 +#endif	/* _SYS_SPA_H */
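
The size fields above are not stored as raw byte counts: ASIZE, offset, LSIZE and PSIZE are kept in 512-byte sectors (SPA_MINBLOCKSHIFT), and the two logical/physical size fields additionally store size-minus-one so that an encoded 0 means one sector. A minimal sketch of the assumed BF64_GET_SB semantics (the real helpers live in fsys_zfs.h elsewhere in this patch; a SPA_MINBLOCKSHIFT of 9 is an assumption taken from stock ZFS):

#include <stdint.h>

/* Shift-biased bitfield read behind BP_GET_PSIZE and friends:
 * 'low'/'len' select the bit range, 'bias' re-adds the implicit -1,
 * and 'shift' rescales from 512-byte sectors to bytes. */
static uint64_t
bf64_get_sb(uint64_t word, int low, int len, int shift, int bias)
{
	uint64_t mask = (len == 64) ? ~0ULL : ((1ULL << len) - 1);

	return ((((word >> low) & mask) + (uint64_t)bias) << shift);
}

/* e.g. physical size: bits 16..31 of blk_prop, in sectors minus one */
static uint64_t
bp_get_psize_bytes(uint64_t blk_prop)
{
	return (bf64_get_sb(blk_prop, 16, 16, 9 /* SPA_MINBLOCKSHIFT */, 1));
}
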
    44.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    44.2 +++ b/tools/libfsimage/zfs/zfs-include/uberblock_impl.h	Thu May 08 18:40:07 2008 +0900
    44.3 @@ -0,0 +1,49 @@
    44.4 +/*
    44.5 + *  GRUB  --  GRand Unified Bootloader
    44.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    44.7 + *
    44.8 + *  This program is free software; you can redistribute it and/or modify
    44.9 + *  it under the terms of the GNU General Public License as published by
   44.10 + *  the Free Software Foundation; either version 2 of the License, or
   44.11 + *  (at your option) any later version.
   44.12 + *
   44.13 + *  This program is distributed in the hope that it will be useful,
   44.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   44.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   44.16 + *  GNU General Public License for more details.
   44.17 + *
   44.18 + *  You should have received a copy of the GNU General Public License
   44.19 + *  along with this program; if not, write to the Free Software
   44.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   44.21 + */
   44.22 +/*
   44.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   44.24 + * Use is subject to license terms.
   44.25 + */
   44.26 +
   44.27 +#ifndef _SYS_UBERBLOCK_IMPL_H
   44.28 +#define	_SYS_UBERBLOCK_IMPL_H
   44.29 +
   44.30 +/*
   44.31 + * The uberblock version is incremented whenever an incompatible on-disk
   44.32 + * format change is made to the SPA, DMU, or ZAP.
   44.33 + *
   44.34 + * Note: the first two fields should never be moved.  When a storage pool
   44.35 + * is opened, the uberblock must be read off the disk before the version
   44.36 + * can be checked.  If the ub_version field is moved, we may not detect
   44.37 + * version mismatch.  If the ub_magic field is moved, applications that
   44.38 + * expect the magic number in the first word won't work.
   44.39 + */
   44.40 +#define	UBERBLOCK_MAGIC		0x00bab10c		/* oo-ba-bloc!	*/
   44.41 +#define	UBERBLOCK_SHIFT		10			/* up to 1K	*/
   44.42 +
   44.43 +struct uberblock {
   44.44 +	uint64_t	ub_magic;	/* UBERBLOCK_MAGIC		*/
   44.45 +	uint64_t	ub_version;	/* ZFS_VERSION			*/
   44.46 +	uint64_t	ub_txg;		/* txg of last sync		*/
   44.47 +	uint64_t	ub_guid_sum;	/* sum of all vdev guids	*/
   44.48 +	uint64_t	ub_timestamp;	/* UTC time of last sync	*/
   44.49 +	blkptr_t	ub_rootbp;	/* MOS objset_phys_t		*/
   44.50 +};
   44.51 +
   44.52 +#endif	/* _SYS_UBERBLOCK_IMPL_H */
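
A minimal sketch (not part of this patch) of the first check a reader performs on each uberblock slot in the 128K ring: the magic number identifies a valid slot and, via its byte order, whether the remaining fields need swapping. BSWAP_64 is assumed from fsys_zfs.h, where the fletcher byteswap code below also uses it; real code then keeps the valid slot with the highest ub_txg.

/* 0 = native endian, 1 = needs byteswap, -1 = not an uberblock */
static int
uberblock_magic_check(const struct uberblock *ub)
{
	if (ub->ub_magic == UBERBLOCK_MAGIC)
		return (0);
	if (ub->ub_magic == BSWAP_64(UBERBLOCK_MAGIC))
		return (1);
	return (-1);
}
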
    45.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    45.2 +++ b/tools/libfsimage/zfs/zfs-include/vdev_impl.h	Thu May 08 18:40:07 2008 +0900
    45.3 @@ -0,0 +1,70 @@
    45.4 +/*
    45.5 + *  GRUB  --  GRand Unified Bootloader
    45.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    45.7 + *
    45.8 + *  This program is free software; you can redistribute it and/or modify
    45.9 + *  it under the terms of the GNU General Public License as published by
   45.10 + *  the Free Software Foundation; either version 2 of the License, or
   45.11 + *  (at your option) any later version.
   45.12 + *
   45.13 + *  This program is distributed in the hope that it will be useful,
   45.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   45.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   45.16 + *  GNU General Public License for more details.
   45.17 + *
   45.18 + *  You should have received a copy of the GNU General Public License
   45.19 + *  along with this program; if not, write to the Free Software
   45.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   45.21 + */
   45.22 +/*
   45.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   45.24 + * Use is subject to license terms.
   45.25 + */
   45.26 +
   45.27 +#ifndef _SYS_VDEV_IMPL_H
   45.28 +#define	_SYS_VDEV_IMPL_H
   45.29 +
   45.30 +#define	VDEV_SKIP_SIZE		(8 << 10)
   45.31 +#define	VDEV_BOOT_HEADER_SIZE	(8 << 10)
   45.32 +#define	VDEV_PHYS_SIZE		(112 << 10)
   45.33 +#define	VDEV_UBERBLOCK_RING	(128 << 10)
   45.34 +
   45.35 +/* ZFS boot block */
   45.36 +#define	VDEV_BOOT_MAGIC		0x2f5b007b10cULL
   45.37 +#define	VDEV_BOOT_VERSION	1		/* version number	*/
   45.38 +
   45.39 +typedef struct vdev_boot_header {
   45.40 +	uint64_t	vb_magic;		/* VDEV_BOOT_MAGIC	*/
   45.41 +	uint64_t	vb_version;		/* VDEV_BOOT_VERSION	*/
   45.42 +	uint64_t	vb_offset;		/* start offset	(bytes) */
   45.43 +	uint64_t	vb_size;		/* size (bytes)		*/
   45.44 +	char		vb_pad[VDEV_BOOT_HEADER_SIZE - 4 * sizeof (uint64_t)];
   45.45 +} vdev_boot_header_t;
   45.46 +
   45.47 +typedef struct vdev_phys {
   45.48 +	char		vp_nvlist[VDEV_PHYS_SIZE - sizeof (zio_block_tail_t)];
   45.49 +	zio_block_tail_t vp_zbt;
   45.50 +} vdev_phys_t;
   45.51 +
   45.52 +typedef struct vdev_label {
   45.53 +	char		vl_pad[VDEV_SKIP_SIZE];			/*   8K	*/
   45.54 +	vdev_boot_header_t vl_boot_header;			/*   8K	*/
   45.55 +	vdev_phys_t	vl_vdev_phys;				/* 112K	*/
   45.56 +	char		vl_uberblock[VDEV_UBERBLOCK_RING];	/* 128K	*/
   45.57 +} vdev_label_t;							/* 256K total */
   45.58 +
   45.59 +/*
   45.60 + * Size and offset of embedded boot loader region on each label.
   45.61 + * The total size of the first two labels plus the boot area is 4MB.
   45.62 + */
   45.63 +#define	VDEV_BOOT_OFFSET	(2 * sizeof (vdev_label_t))
   45.64 +#define	VDEV_BOOT_SIZE		(7ULL << 19)			/* 3.5M	*/
   45.65 +
   45.66 +/*
   45.67 + * Size of label regions at the start and end of each leaf device.
   45.68 + */
   45.69 +#define	VDEV_LABEL_START_SIZE	(2 * sizeof (vdev_label_t) + VDEV_BOOT_SIZE)
   45.70 +#define	VDEV_LABEL_END_SIZE	(2 * sizeof (vdev_label_t))
   45.71 +#define	VDEV_LABELS		4
   45.72 +
   45.73 +#endif	/* _SYS_VDEV_IMPL_H */
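
Taken together, the constants above place one 256K label at byte 0, a second at 256K, the 3.5M boot region after that pair, and two more labels at the very end of the device. A sketch (assumed helper, not in this patch) of the resulting byte offsets for a leaf device of psize bytes, ignoring the alignment of psize to the label size:

/* labels 0 and 1 sit at the front of the device, 2 and 3 at the back */
static uint64_t
vdev_label_offset(uint64_t psize, int l)
{
	return ((uint64_t)l * sizeof (vdev_label_t) +
	    (l < VDEV_LABELS / 2 ? 0 :
	    psize - VDEV_LABELS * sizeof (vdev_label_t)));
}
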
    46.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    46.2 +++ b/tools/libfsimage/zfs/zfs-include/zap_impl.h	Thu May 08 18:40:07 2008 +0900
    46.3 @@ -0,0 +1,110 @@
    46.4 +/*
    46.5 + *  GRUB  --  GRand Unified Bootloader
    46.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    46.7 + *
    46.8 + *  This program is free software; you can redistribute it and/or modify
    46.9 + *  it under the terms of the GNU General Public License as published by
   46.10 + *  the Free Software Foundation; either version 2 of the License, or
   46.11 + *  (at your option) any later version.
   46.12 + *
   46.13 + *  This program is distributed in the hope that it will be useful,
   46.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   46.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   46.16 + *  GNU General Public License for more details.
   46.17 + *
   46.18 + *  You should have received a copy of the GNU General Public License
   46.19 + *  along with this program; if not, write to the Free Software
   46.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   46.21 + */
   46.22 +/*
   46.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   46.24 + * Use is subject to license terms.
   46.25 + */
   46.26 +
   46.27 +#ifndef	_SYS_ZAP_IMPL_H
   46.28 +#define	_SYS_ZAP_IMPL_H
   46.29 +
   46.30 +#define	ZAP_MAGIC 0x2F52AB2ABULL
   46.31 +
   46.32 +#define	ZAP_HASHBITS		28
   46.33 +#define	MZAP_ENT_LEN		64
   46.34 +#define	MZAP_NAME_LEN		(MZAP_ENT_LEN - 8 - 4 - 2)
   46.35 +#define	MZAP_MAX_BLKSHIFT	SPA_MAXBLOCKSHIFT
   46.36 +#define	MZAP_MAX_BLKSZ		(1 << MZAP_MAX_BLKSHIFT)
   46.37 +
   46.38 +typedef struct mzap_ent_phys {
   46.39 +	uint64_t mze_value;
   46.40 +	uint32_t mze_cd;
   46.41 +	uint16_t mze_pad;	/* in case we want to chain them someday */
   46.42 +	char mze_name[MZAP_NAME_LEN];
   46.43 +} mzap_ent_phys_t;
   46.44 +
   46.45 +typedef struct mzap_phys {
   46.46 +	uint64_t mz_block_type;	/* ZBT_MICRO */
   46.47 +	uint64_t mz_salt;
   46.48 +	uint64_t mz_pad[6];
   46.49 +	mzap_ent_phys_t mz_chunk[1];
   46.50 +	/* actually variable size depending on block size */
   46.51 +} mzap_phys_t;
   46.52 +
   46.53 +/*
   46.54 + * The (fat) zap is stored in one object. It is an array of
   46.55 + * 1<<FZAP_BLOCK_SHIFT byte blocks. The layout looks like one of:
   46.56 + *
   46.57 + * ptrtbl fits in first block:
   46.58 + * 	[zap_phys_t zap_ptrtbl_shift < 6] [zap_leaf_t] ...
   46.59 + *
   46.60 + * ptrtbl too big for first block:
   46.61 + * 	[zap_phys_t zap_ptrtbl_shift >= 6] [zap_leaf_t] [ptrtbl] ...
   46.62 + *
   46.63 + */
   46.64 +
   46.65 +#define	ZBT_LEAF		((1ULL << 63) + 0)
   46.66 +#define	ZBT_HEADER		((1ULL << 63) + 1)
   46.67 +#define	ZBT_MICRO		((1ULL << 63) + 3)
   46.68 +/* any other values are ptrtbl blocks */
   46.69 +
   46.70 +/*
   46.71 + * the embedded pointer table takes up half a block:
   46.72 + * block size / entry size (2^3) / 2
   46.73 + */
   46.74 +#define	ZAP_EMBEDDED_PTRTBL_SHIFT(zap) (FZAP_BLOCK_SHIFT(zap) - 3 - 1)
   46.75 +
   46.76 +/*
   46.77 + * The embedded pointer table starts half-way through the block.  Since
   46.78 + * the pointer table itself is half the block, it starts at (64-bit)
   46.79 + * word number (1<<ZAP_EMBEDDED_PTRTBL_SHIFT(zap)).
   46.80 + */
   46.81 +#define	ZAP_EMBEDDED_PTRTBL_ENT(zap, idx) \
   46.82 +	((uint64_t *)(zap)->zap_f.zap_phys) \
   46.83 +	[(idx) + (1<<ZAP_EMBEDDED_PTRTBL_SHIFT(zap))]
   46.84 +
   46.85 +/*
   46.86 + * TAKE NOTE:
   46.87 + * If zap_phys_t is modified, zap_byteswap() must be modified.
   46.88 + */
   46.89 +typedef struct zap_phys {
   46.90 +	uint64_t zap_block_type;	/* ZBT_HEADER */
   46.91 +	uint64_t zap_magic;		/* ZAP_MAGIC */
   46.92 +
   46.93 +	struct zap_table_phys {
   46.94 +		uint64_t zt_blk;	/* starting block number */
   46.95 +		uint64_t zt_numblks;	/* number of blocks */
   46.96 +		uint64_t zt_shift;	/* bits to index it */
   46.97 +		uint64_t zt_nextblk;	/* next (larger) copy start block */
   46.98 +		uint64_t zt_blks_copied; /* number of source blocks copied */
   46.99 +	} zap_ptrtbl;
  46.100 +
  46.101 +	uint64_t zap_freeblk;		/* the next free block */
  46.102 +	uint64_t zap_num_leafs;		/* number of leafs */
  46.103 +	uint64_t zap_num_entries;	/* number of entries */
  46.104 +	uint64_t zap_salt;		/* salt to stir into hash function */
  46.105 +	/*
  46.106 +	 * This structure is followed by padding, and then the embedded
  46.107 +	 * pointer table.  The embedded pointer table takes up second
  46.108 +	 * half of the block.  It is accessed using the
  46.109 +	 * ZAP_EMBEDDED_PTRTBL_ENT() macro.
  46.110 +	 */
  46.111 +} zap_phys_t;
  46.112 +
  46.113 +#endif /* _SYS_ZAP_IMPL_H */
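
A reader tells the two on-disk ZAP layouts apart by the first 64-bit word of the object: ZBT_MICRO marks a flat array of mzap_ent_phys_t entries, while ZBT_HEADER marks a fat zap whose first block is the zap_phys_t above, followed by leaf and pointer-table blocks. A minimal sketch (assumed helper, not in this patch):

/* nonzero when the buffer holds a microzap rather than a fat zap header */
static int
zap_is_micro(const void *zapbuf)
{
	return (*(const uint64_t *)zapbuf == ZBT_MICRO);
}
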
    47.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    47.2 +++ b/tools/libfsimage/zfs/zfs-include/zap_leaf.h	Thu May 08 18:40:07 2008 +0900
    47.3 @@ -0,0 +1,100 @@
    47.4 +/*
    47.5 + *  GRUB  --  GRand Unified Bootloader
    47.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    47.7 + *
    47.8 + *  This program is free software; you can redistribute it and/or modify
    47.9 + *  it under the terms of the GNU General Public License as published by
   47.10 + *  the Free Software Foundation; either version 2 of the License, or
   47.11 + *  (at your option) any later version.
   47.12 + *
   47.13 + *  This program is distributed in the hope that it will be useful,
   47.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   47.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   47.16 + *  GNU General Public License for more details.
   47.17 + *
   47.18 + *  You should have received a copy of the GNU General Public License
   47.19 + *  along with this program; if not, write to the Free Software
   47.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   47.21 + */
   47.22 +/*
   47.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   47.24 + * Use is subject to license terms.
   47.25 + */
   47.26 +
   47.27 +#ifndef	_SYS_ZAP_LEAF_H
   47.28 +#define	_SYS_ZAP_LEAF_H
   47.29 +
   47.30 +#define	ZAP_LEAF_MAGIC 0x2AB1EAF
   47.31 +
   47.32 +/* chunk size = 24 bytes */
   47.33 +#define	ZAP_LEAF_CHUNKSIZE 24
   47.34 +
   47.35 +/*
   47.36 + * The amount of space within the chunk available for the array is:
   47.37 + * chunk size - space for type (1) - space for next pointer (2)
   47.38 + */
   47.39 +#define	ZAP_LEAF_ARRAY_BYTES (ZAP_LEAF_CHUNKSIZE - 3)
   47.40 +
   47.41 +typedef enum zap_chunk_type {
   47.42 +	ZAP_CHUNK_FREE = 253,
   47.43 +	ZAP_CHUNK_ENTRY = 252,
   47.44 +	ZAP_CHUNK_ARRAY = 251,
   47.45 +	ZAP_CHUNK_TYPE_MAX = 250
   47.46 +} zap_chunk_type_t;
   47.47 +
   47.48 +/*
   47.49 + * TAKE NOTE:
   47.50 + * If zap_leaf_phys_t is modified, zap_leaf_byteswap() must be modified.
   47.51 + */
   47.52 +typedef struct zap_leaf_phys {
   47.53 +	struct zap_leaf_header {
   47.54 +		uint64_t lh_block_type;		/* ZBT_LEAF */
   47.55 +		uint64_t lh_pad1;
   47.56 +		uint64_t lh_prefix;		/* hash prefix of this leaf */
   47.57 +		uint32_t lh_magic;		/* ZAP_LEAF_MAGIC */
   47.58 +		uint16_t lh_nfree;		/* number of free chunks */
   47.59 +		uint16_t lh_nentries;		/* number of entries */
   47.60 +		uint16_t lh_prefix_len;		/* num bits used to id this */
   47.61 +
   47.62 +/* above is accessible to zap, below is zap_leaf private */
   47.63 +
   47.64 +		uint16_t lh_freelist;		/* chunk head of free list */
   47.65 +		uint8_t lh_pad2[12];
   47.66 +	} l_hdr; /* 2 24-byte chunks */
   47.67 +
   47.68 +	/*
   47.69 +	 * The header is followed by a hash table with
   47.70 +	 * ZAP_LEAF_HASH_NUMENTRIES(zap) entries.  The hash table is
   47.71 +	 * followed by an array of ZAP_LEAF_NUMCHUNKS(zap)
   47.72 +	 * zap_leaf_chunk structures.  These structures are accessed
   47.73 +	 * with the ZAP_LEAF_CHUNK() macro.
   47.74 +	 */
   47.75 +
   47.76 +	uint16_t l_hash[1];
   47.77 +} zap_leaf_phys_t;
   47.78 +
   47.79 +typedef union zap_leaf_chunk {
   47.80 +	struct zap_leaf_entry {
   47.81 +		uint8_t le_type; 		/* always ZAP_CHUNK_ENTRY */
   47.82 +		uint8_t le_int_size;		/* size of ints */
   47.83 +		uint16_t le_next;		/* next entry in hash chain */
   47.84 +		uint16_t le_name_chunk;		/* first chunk of the name */
   47.85 +		uint16_t le_name_length;	/* bytes in name, incl null */
   47.86 +		uint16_t le_value_chunk;	/* first chunk of the value */
   47.87 +		uint16_t le_value_length;	/* value length in ints */
   47.88 +		uint32_t le_cd;			/* collision differentiator */
   47.89 +		uint64_t le_hash;		/* hash value of the name */
   47.90 +	} l_entry;
   47.91 +	struct zap_leaf_array {
   47.92 +		uint8_t la_type;		/* always ZAP_CHUNK_ARRAY */
   47.93 +		uint8_t la_array[ZAP_LEAF_ARRAY_BYTES];
   47.94 +		uint16_t la_next;		/* next blk or CHAIN_END */
   47.95 +	} l_array;
   47.96 +	struct zap_leaf_free {
   47.97 +		uint8_t lf_type;		/* always ZAP_CHUNK_FREE */
   47.98 +		uint8_t lf_pad[ZAP_LEAF_ARRAY_BYTES];
   47.99 +		uint16_t lf_next;	/* next in free list, or CHAIN_END */
  47.100 +	} l_free;
  47.101 +} zap_leaf_chunk_t;
  47.102 +
  47.103 +#endif /* _SYS_ZAP_LEAF_H */
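
Entry names and values are spread across chained ZAP_CHUNK_ARRAY chunks, each carrying ZAP_LEAF_ARRAY_BYTES (21) payload bytes. A quick sketch of the chunk count a stored byte string needs under that layout:

/* e.g. a 30-byte name (including its terminating null) needs two
 * chained array chunks: 21 bytes in the first, 9 in the second */
static int
zap_array_chunks_needed(int nbytes)
{
	return ((nbytes + ZAP_LEAF_ARRAY_BYTES - 1) / ZAP_LEAF_ARRAY_BYTES);
}
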
    48.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    48.2 +++ b/tools/libfsimage/zfs/zfs-include/zfs.h	Thu May 08 18:40:07 2008 +0900
    48.3 @@ -0,0 +1,112 @@
    48.4 +/*
    48.5 + *  GRUB  --  GRand Unified Bootloader
    48.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    48.7 + *
    48.8 + *  This program is free software; you can redistribute it and/or modify
    48.9 + *  it under the terms of the GNU General Public License as published by
   48.10 + *  the Free Software Foundation; either version 2 of the License, or
   48.11 + *  (at your option) any later version.
   48.12 + *
   48.13 + *  This program is distributed in the hope that it will be useful,
   48.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   48.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   48.16 + *  GNU General Public License for more details.
   48.17 + *
   48.18 + *  You should have received a copy of the GNU General Public License
   48.19 + *  along with this program; if not, write to the Free Software
   48.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   48.21 + */
   48.22 +/*
   48.23 + * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
   48.24 + * Use is subject to license terms.
   48.25 + */
   48.26 +
   48.27 +#ifndef	_SYS_FS_ZFS_H
   48.28 +#define	_SYS_FS_ZFS_H
   48.29 +
   48.30 +
   48.31 +/*
   48.32 + * On-disk version number.
   48.33 + */
   48.34 +#define	SPA_VERSION_1			1ULL
   48.35 +#define	SPA_VERSION_2			2ULL
   48.36 +#define	SPA_VERSION_3			3ULL
   48.37 +#define	SPA_VERSION_4			4ULL
   48.38 +#define	SPA_VERSION_5			5ULL
   48.39 +#define	SPA_VERSION_6			6ULL
   48.40 +#define	SPA_VERSION_7			7ULL
   48.41 +#define	SPA_VERSION_8			8ULL
   48.42 +#define	SPA_VERSION_9			9ULL
   48.43 +#define	SPA_VERSION_10			10ULL
   48.44 +#define	SPA_VERSION			SPA_VERSION_10
   48.45 +
   48.46 +/*
   48.47 + * The following are configuration names used in the nvlist describing a pool's
   48.48 + * configuration.
   48.49 + */
   48.50 +#define	ZPOOL_CONFIG_VERSION		"version"
   48.51 +#define	ZPOOL_CONFIG_POOL_NAME		"name"
   48.52 +#define	ZPOOL_CONFIG_POOL_STATE		"state"
   48.53 +#define	ZPOOL_CONFIG_POOL_TXG		"txg"
   48.54 +#define	ZPOOL_CONFIG_POOL_GUID		"pool_guid"
   48.55 +#define	ZPOOL_CONFIG_CREATE_TXG		"create_txg"
   48.56 +#define	ZPOOL_CONFIG_TOP_GUID		"top_guid"
   48.57 +#define	ZPOOL_CONFIG_VDEV_TREE		"vdev_tree"
   48.58 +#define	ZPOOL_CONFIG_TYPE		"type"
   48.59 +#define	ZPOOL_CONFIG_CHILDREN		"children"
   48.60 +#define	ZPOOL_CONFIG_ID			"id"
   48.61 +#define	ZPOOL_CONFIG_GUID		"guid"
   48.62 +#define	ZPOOL_CONFIG_PATH		"path"
   48.63 +#define	ZPOOL_CONFIG_DEVID		"devid"
   48.64 +#define	ZPOOL_CONFIG_METASLAB_ARRAY	"metaslab_array"
   48.65 +#define	ZPOOL_CONFIG_METASLAB_SHIFT	"metaslab_shift"
   48.66 +#define	ZPOOL_CONFIG_ASHIFT		"ashift"
   48.67 +#define	ZPOOL_CONFIG_ASIZE		"asize"
   48.68 +#define	ZPOOL_CONFIG_DTL		"DTL"
   48.69 +#define	ZPOOL_CONFIG_STATS		"stats"
   48.70 +#define	ZPOOL_CONFIG_WHOLE_DISK		"whole_disk"
   48.71 +#define	ZPOOL_CONFIG_ERRCOUNT		"error_count"
   48.72 +#define	ZPOOL_CONFIG_NOT_PRESENT	"not_present"
   48.73 +#define	ZPOOL_CONFIG_SPARES		"spares"
   48.74 +#define	ZPOOL_CONFIG_IS_SPARE		"is_spare"
   48.75 +#define	ZPOOL_CONFIG_NPARITY		"nparity"
   48.76 +#define	ZPOOL_CONFIG_PHYS_PATH		"phys_path"
   48.77 +#define	ZPOOL_CONFIG_L2CACHE		"l2cache"
   48.78 +/*
   48.79 + * The persistent vdev state is stored as separate values rather than a single
   48.80 + * 'vdev_state' entry.  This is because a device can be in multiple states, such
   48.81 + * as offline and degraded.
   48.82 + */
   48.83 +#define	ZPOOL_CONFIG_OFFLINE		"offline"
   48.84 +#define	ZPOOL_CONFIG_FAULTED		"faulted"
   48.85 +#define	ZPOOL_CONFIG_DEGRADED		"degraded"
   48.86 +#define	ZPOOL_CONFIG_REMOVED		"removed"
   48.87 +
   48.88 +#define	VDEV_TYPE_ROOT			"root"
   48.89 +#define	VDEV_TYPE_MIRROR		"mirror"
   48.90 +#define	VDEV_TYPE_REPLACING		"replacing"
   48.91 +#define	VDEV_TYPE_RAIDZ			"raidz"
   48.92 +#define	VDEV_TYPE_DISK			"disk"
   48.93 +#define	VDEV_TYPE_FILE			"file"
   48.94 +#define	VDEV_TYPE_MISSING		"missing"
   48.95 +#define	VDEV_TYPE_SPARE			"spare"
   48.96 +#define	VDEV_TYPE_L2CACHE		"l2cache"
   48.97 +
   48.98 +/*
   48.99 + * pool state.  The following states are written to disk as part of the normal
  48.100 + * SPA lifecycle: ACTIVE, EXPORTED, DESTROYED, SPARE, L2CACHE.  The remaining
  48.101 + * states are software abstractions used at various levels to communicate pool
  48.102 + * state.
  48.103 + */
  48.104 +typedef enum pool_state {
  48.105 +	POOL_STATE_ACTIVE = 0,		/* In active use		*/
  48.106 +	POOL_STATE_EXPORTED,		/* Explicitly exported		*/
  48.107 +	POOL_STATE_DESTROYED,		/* Explicitly destroyed		*/
  48.108 +	POOL_STATE_SPARE,		/* Reserved for hot spare use	*/
  48.109 +	POOL_STATE_L2CACHE,		/* Level 2 ARC device		*/
  48.110 +	POOL_STATE_UNINITIALIZED,	/* Internal spa_t state		*/
  48.111 +	POOL_STATE_UNAVAIL,		/* Internal libzfs state	*/
  48.112 +	POOL_STATE_POTENTIALLY_ACTIVE	/* Internal libzfs state	*/
  48.113 +} pool_state_t;
  48.114 +
  48.115 +#endif	/* _SYS_FS_ZFS_H */
    49.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    49.2 +++ b/tools/libfsimage/zfs/zfs-include/zfs_acl.h	Thu May 08 18:40:07 2008 +0900
    49.3 @@ -0,0 +1,55 @@
    49.4 +/*
    49.5 + *  GRUB  --  GRand Unified Bootloader
    49.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    49.7 + *
    49.8 + *  This program is free software; you can redistribute it and/or modify
    49.9 + *  it under the terms of the GNU General Public License as published by
   49.10 + *  the Free Software Foundation; either version 2 of the License, or
   49.11 + *  (at your option) any later version.
   49.12 + *
   49.13 + *  This program is distributed in the hope that it will be useful,
   49.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   49.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   49.16 + *  GNU General Public License for more details.
   49.17 + *
   49.18 + *  You should have received a copy of the GNU General Public License
   49.19 + *  along with this program; if not, write to the Free Software
   49.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   49.21 + */
   49.22 +/*
   49.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   49.24 + * Use is subject to license terms.
   49.25 + */
   49.26 +
   49.27 +#ifndef	_SYS_FS_ZFS_ACL_H
   49.28 +#define	_SYS_FS_ZFS_ACL_H
   49.29 +
   49.30 +typedef struct zfs_oldace {
   49.31 +	uint32_t	z_fuid;		/* "who" */
   49.32 +	uint32_t	z_access_mask;  /* access mask */
   49.33 +	uint16_t	z_flags;	/* flags, i.e. inheritance */
   49.34 +	uint16_t	z_type;		/* type of entry allow/deny */
   49.35 +} zfs_oldace_t;
   49.36 +
   49.37 +#define	ACE_SLOT_CNT	6
   49.38 +
   49.39 +typedef struct zfs_znode_acl_v0 {
   49.40 +	uint64_t	z_acl_extern_obj;	  /* ext acl pieces */
   49.41 +	uint32_t	z_acl_count;		  /* Number of ACEs */
   49.42 +	uint16_t	z_acl_version;		  /* acl version */
   49.43 +	uint16_t	z_acl_pad;		  /* pad */
   49.44 +	zfs_oldace_t	z_ace_data[ACE_SLOT_CNT]; /* 6 standard ACEs */
   49.45 +} zfs_znode_acl_v0_t;
   49.46 +
   49.47 +#define	ZFS_ACE_SPACE	(sizeof (zfs_oldace_t) * ACE_SLOT_CNT)
   49.48 +
   49.49 +typedef struct zfs_znode_acl {
   49.50 +	uint64_t	z_acl_extern_obj;	  /* ext acl pieces */
   49.51 +	uint32_t	z_acl_size;		  /* Number of bytes in ACL */
   49.52 +	uint16_t	z_acl_version;		  /* acl version */
   49.53 +	uint16_t	z_acl_count;		  /* ace count */
   49.54 +	uint8_t		z_ace_data[ZFS_ACE_SPACE]; /* space for embedded ACEs */
   49.55 +} zfs_znode_acl_t;
   49.56 +
   49.57 +
   49.58 +#endif	/* _SYS_FS_ZFS_ACL_H */
    50.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    50.2 +++ b/tools/libfsimage/zfs/zfs-include/zfs_znode.h	Thu May 08 18:40:07 2008 +0900
    50.3 @@ -0,0 +1,68 @@
    50.4 +/*
    50.5 + *  GRUB  --  GRand Unified Bootloader
    50.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    50.7 + *
    50.8 + *  This program is free software; you can redistribute it and/or modify
    50.9 + *  it under the terms of the GNU General Public License as published by
   50.10 + *  the Free Software Foundation; either version 2 of the License, or
   50.11 + *  (at your option) any later version.
   50.12 + *
   50.13 + *  This program is distributed in the hope that it will be useful,
   50.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   50.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   50.16 + *  GNU General Public License for more details.
   50.17 + *
   50.18 + *  You should have received a copy of the GNU General Public License
   50.19 + *  along with this program; if not, write to the Free Software
   50.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   50.21 + */
   50.22 +/*
   50.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   50.24 + * Use is subject to license terms.
   50.25 + */
   50.26 +
   50.27 +#ifndef	_SYS_FS_ZFS_ZNODE_H
   50.28 +#define	_SYS_FS_ZFS_ZNODE_H
   50.29 +
   50.30 +#define	MASTER_NODE_OBJ	1
   50.31 +#define	ZFS_ROOT_OBJ		"ROOT"
   50.32 +#define	ZPL_VERSION_STR		"VERSION"
   50.33 +
   50.34 +#define	ZPL_VERSION		3ULL
   50.35 +
   50.36 +#define	ZFS_DIRENT_OBJ(de) BF64_GET(de, 0, 48)
   50.37 +
   50.38 +/*
   50.39 + * This is the persistent portion of the znode.  It is stored
   50.40 + * in the "bonus buffer" of the file.  Short symbolic links
   50.41 + * are also stored in the bonus buffer.
   50.42 + */
   50.43 +typedef struct znode_phys {
   50.44 +	uint64_t zp_atime[2];		/*  0 - last file access time */
   50.45 +	uint64_t zp_mtime[2];		/* 16 - last file modification time */
   50.46 +	uint64_t zp_ctime[2];		/* 32 - last file change time */
   50.47 +	uint64_t zp_crtime[2];		/* 48 - creation time */
   50.48 +	uint64_t zp_gen;		/* 64 - generation (txg of creation) */
   50.49 +	uint64_t zp_mode;		/* 72 - file mode bits */
   50.50 +	uint64_t zp_size;		/* 80 - size of file */
   50.51 +	uint64_t zp_parent;		/* 88 - directory parent (`..') */
   50.52 +	uint64_t zp_links;		/* 96 - number of links to file */
   50.53 +	uint64_t zp_xattr;		/* 104 - DMU object for xattrs */
   50.54 +	uint64_t zp_rdev;		/* 112 - dev_t for VBLK & VCHR files */
   50.55 +	uint64_t zp_flags;		/* 120 - persistent flags */
   50.56 +	uint64_t zp_uid;		/* 128 - file owner */
   50.57 +	uint64_t zp_gid;		/* 136 - owning group */
   50.58 +	uint64_t zp_pad[4];		/* 144 - future */
   50.59 +	zfs_znode_acl_t zp_acl;		/* 176 - 263 ACL */
   50.60 +	/*
   50.61 +	 * Data may pad out any remaining bytes in the znode buffer, eg:
   50.62 +	 *
   50.63 +	 * |<---------------------- dnode_phys (512) ------------------------>|
   50.64 +	 * |<-- dnode (192) --->|<----------- "bonus" buffer (320) ---------->|
   50.65 +	 *			|<---- znode (264) ---->|<---- data (56) ---->|
   50.66 +	 *
   50.67 +	 * At present, we only use this space to store symbolic links.
   50.68 +	 */
   50.69 +} znode_phys_t;
   50.70 +
   50.71 +#endif	/* _SYS_FS_ZFS_ZNODE_H */
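
As the layout comment above shows, the dnode's 320-byte bonus buffer holds the 264-byte znode followed by up to 56 bytes of inline data, currently used only for short symbolic links. A rough sketch (assumed helper, not in this patch) of how such a link target would be read straight out of the bonus buffer:

/* inline target sits immediately after znode_phys_t; longer targets
 * are stored in the file's own data blocks instead */
static const char *
znode_inline_symlink(const znode_phys_t *zp)
{
	return (zp->zp_size <= 56 ? (const char *)(zp + 1) : NULL);
}
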
    51.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    51.2 +++ b/tools/libfsimage/zfs/zfs-include/zil.h	Thu May 08 18:40:07 2008 +0900
    51.3 @@ -0,0 +1,51 @@
    51.4 +/*
    51.5 + *  GRUB  --  GRand Unified Bootloader
    51.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    51.7 + *
    51.8 + *  This program is free software; you can redistribute it and/or modify
    51.9 + *  it under the terms of the GNU General Public License as published by
   51.10 + *  the Free Software Foundation; either version 2 of the License, or
   51.11 + *  (at your option) any later version.
   51.12 + *
   51.13 + *  This program is distributed in the hope that it will be useful,
   51.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   51.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   51.16 + *  GNU General Public License for more details.
   51.17 + *
   51.18 + *  You should have received a copy of the GNU General Public License
   51.19 + *  along with this program; if not, write to the Free Software
   51.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   51.21 + */
   51.22 +/*
   51.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   51.24 + * Use is subject to license terms.
   51.25 + */
   51.26 +
   51.27 +#ifndef	_SYS_ZIL_H
   51.28 +#define	_SYS_ZIL_H
   51.29 +
   51.30 +/*
   51.31 + * Intent log format:
   51.32 + *
   51.33 + * Each objset has its own intent log.  The log header (zil_header_t)
   51.34 + * for objset N's intent log is kept in the Nth object of the SPA's
   51.35 + * intent_log objset.  The log header points to a chain of log blocks,
   51.36 + * each of which contains log records (i.e., transactions) followed by
   51.37 + * a log block trailer (zil_trailer_t).  The format of a log record
   51.38 + * depends on the record (or transaction) type, but all records begin
   51.39 + * with a common structure that defines the type, length, and txg.
   51.40 + */
   51.41 +
   51.42 +/*
   51.43 + * Intent log header - this on disk structure holds fields to manage
   51.44 + * the log.  All fields are 64 bit to easily handle cross architectures.
   51.45 + */
   51.46 +typedef struct zil_header {
   51.47 +	uint64_t zh_claim_txg;	/* txg in which log blocks were claimed */
   51.48 +	uint64_t zh_replay_seq;	/* highest replayed sequence number */
   51.49 +	blkptr_t zh_log;	/* log chain */
   51.50 +	uint64_t zh_claim_seq;	/* highest claimed sequence number */
   51.51 +	uint64_t zh_pad[5];
   51.52 +} zil_header_t;
   51.53 +
   51.54 +#endif	/* _SYS_ZIL_H */
    52.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    52.2 +++ b/tools/libfsimage/zfs/zfs-include/zio.h	Thu May 08 18:40:07 2008 +0900
    52.3 @@ -0,0 +1,81 @@
    52.4 +/*
    52.5 + *  GRUB  --  GRand Unified Bootloader
    52.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    52.7 + *
    52.8 + *  This program is free software; you can redistribute it and/or modify
    52.9 + *  it under the terms of the GNU General Public License as published by
   52.10 + *  the Free Software Foundation; either version 2 of the License, or
   52.11 + *  (at your option) any later version.
   52.12 + *
   52.13 + *  This program is distributed in the hope that it will be useful,
   52.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   52.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   52.16 + *  GNU General Public License for more details.
   52.17 + *
   52.18 + *  You should have received a copy of the GNU General Public License
   52.19 + *  along with this program; if not, write to the Free Software
   52.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   52.21 + */
   52.22 +/*
   52.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   52.24 + * Use is subject to license terms.
   52.25 + */
   52.26 +
   52.27 +#ifndef _ZIO_H
   52.28 +#define	_ZIO_H
   52.29 +
   52.30 +#define	ZBT_MAGIC	0x210da7ab10c7a11ULL	/* zio data bloc tail */
   52.31 +
   52.32 +typedef struct zio_block_tail {
   52.33 +	uint64_t	zbt_magic;	/* for validation, endianness	*/
   52.34 +	zio_cksum_t	zbt_cksum;	/* 256-bit checksum		*/
   52.35 +} zio_block_tail_t;
   52.36 +
   52.37 +/*
   52.38 + * Gang block headers are self-checksumming and contain an array
   52.39 + * of block pointers.
   52.40 + */
   52.41 +#define	SPA_GANGBLOCKSIZE	SPA_MINBLOCKSIZE
   52.42 +#define	SPA_GBH_NBLKPTRS	((SPA_GANGBLOCKSIZE - \
   52.43 +	sizeof (zio_block_tail_t)) / sizeof (blkptr_t))
   52.44 +#define	SPA_GBH_FILLER		((SPA_GANGBLOCKSIZE - \
   52.45 +	sizeof (zio_block_tail_t) - \
   52.46 +	(SPA_GBH_NBLKPTRS * sizeof (blkptr_t))) /\
   52.47 +	sizeof (uint64_t))
   52.48 +
   52.49 +#define	ZIO_GET_IOSIZE(zio)	\
   52.50 +	(BP_IS_GANG((zio)->io_bp) ? \
   52.51 +	SPA_GANGBLOCKSIZE : BP_GET_PSIZE((zio)->io_bp))
   52.52 +
   52.53 +typedef struct zio_gbh {
   52.54 +	blkptr_t		zg_blkptr[SPA_GBH_NBLKPTRS];
   52.55 +	uint64_t		zg_filler[SPA_GBH_FILLER];
   52.56 +	zio_block_tail_t	zg_tail;
   52.57 +} zio_gbh_phys_t;
   52.58 +
   52.59 +enum zio_checksum {
   52.60 +	ZIO_CHECKSUM_INHERIT = 0,
   52.61 +	ZIO_CHECKSUM_ON,
   52.62 +	ZIO_CHECKSUM_OFF,
   52.63 +	ZIO_CHECKSUM_LABEL,
   52.64 +	ZIO_CHECKSUM_GANG_HEADER,
   52.65 +	ZIO_CHECKSUM_ZILOG,
   52.66 +	ZIO_CHECKSUM_FLETCHER_2,
   52.67 +	ZIO_CHECKSUM_FLETCHER_4,
   52.68 +	ZIO_CHECKSUM_SHA256,
   52.69 +	ZIO_CHECKSUM_FUNCTIONS
   52.70 +};
   52.71 +
   52.72 +#define	ZIO_CHECKSUM_ON_VALUE	ZIO_CHECKSUM_FLETCHER_2
   52.73 +#define	ZIO_CHECKSUM_DEFAULT	ZIO_CHECKSUM_ON
   52.74 +
   52.75 +enum zio_compress {
   52.76 +	ZIO_COMPRESS_INHERIT = 0,
   52.77 +	ZIO_COMPRESS_ON,
   52.78 +	ZIO_COMPRESS_OFF,
   52.79 +	ZIO_COMPRESS_LZJB,
   52.80 +	ZIO_COMPRESS_EMPTY,
   52.81 +	ZIO_COMPRESS_FUNCTIONS
   52.82 +};
   52.83 +
   52.84 +#endif	/* _ZIO_H */
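
As a quick check of the gang-header arithmetic above (assuming SPA_MINBLOCKSIZE is 512 bytes, as in stock ZFS): sizeof (zio_block_tail_t) is 8 + 32 = 40 bytes and a blkptr_t is 128 bytes (SPA_BLKPTRSHIFT), so SPA_GBH_NBLKPTRS = (512 - 40) / 128 = 3 and SPA_GBH_FILLER = (512 - 40 - 3 * 128) / 8 = 11, i.e. a gang header carries three block pointers and 88 bytes of filler ahead of its self-checksumming tail.
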
    53.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    53.2 +++ b/tools/libfsimage/zfs/zfs-include/zio_checksum.h	Thu May 08 18:40:07 2008 +0900
    53.3 @@ -0,0 +1,42 @@
    53.4 +/*
    53.5 + *  GRUB  --  GRand Unified Bootloader
    53.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    53.7 + *
    53.8 + *  This program is free software; you can redistribute it and/or modify
    53.9 + *  it under the terms of the GNU General Public License as published by
   53.10 + *  the Free Software Foundation; either version 2 of the License, or
   53.11 + *  (at your option) any later version.
   53.12 + *
   53.13 + *  This program is distributed in the hope that it will be useful,
   53.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   53.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   53.16 + *  GNU General Public License for more details.
   53.17 + *
   53.18 + *  You should have received a copy of the GNU General Public License
   53.19 + *  along with this program; if not, write to the Free Software
   53.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   53.21 + */
   53.22 +/*
   53.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   53.24 + * Use is subject to license terms.
   53.25 + */
   53.26 +
   53.27 +#ifndef _SYS_ZIO_CHECKSUM_H
   53.28 +#define	_SYS_ZIO_CHECKSUM_H
   53.29 +
   53.30 +/*
   53.31 + * Signature for checksum functions.
   53.32 + */
   53.33 +typedef void zio_checksum_t(const void *data, uint64_t size, zio_cksum_t *zcp);
   53.34 +
   53.35 +/*
   53.36 + * Information about each checksum function.
   53.37 + */
   53.38 +typedef struct zio_checksum_info {
   53.39 +	zio_checksum_t	*ci_func[2]; /* checksum function for each byteorder */
   53.40 +	int		ci_correctable;	/* number of correctable bits	*/
   53.41 +	int		ci_zbt;		/* uses zio block tail?	*/
   53.42 +	char		*ci_name;	/* descriptive name */
   53.43 +} zio_checksum_info_t;
   53.44 +
   53.45 +#endif	/* _SYS_ZIO_CHECKSUM_H */
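
A hypothetical example of how one zio_checksum_info_t entry could pair the native and byteswapped fletcher-2 routines from zfs_fletcher.c below; the actual table lives in fsys_zfs.c and may differ in ordering and naming.

static zio_checksum_info_t fletcher_2_info = {
	{ fletcher_2_native, fletcher_2_byteswap },
	0,		/* no correctable bits */
	0,		/* does not use the zio block tail */
	"fletcher2"
};
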
    54.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    54.2 +++ b/tools/libfsimage/zfs/zfs_fletcher.c	Thu May 08 18:40:07 2008 +0900
    54.3 @@ -0,0 +1,93 @@
    54.4 +/*
    54.5 + *  GRUB  --  GRand Unified Bootloader
    54.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    54.7 + *
    54.8 + *  This program is free software; you can redistribute it and/or modify
    54.9 + *  it under the terms of the GNU General Public License as published by
   54.10 + *  the Free Software Foundation; either version 2 of the License, or
   54.11 + *  (at your option) any later version.
   54.12 + *
   54.13 + *  This program is distributed in the hope that it will be useful,
   54.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   54.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   54.16 + *  GNU General Public License for more details.
   54.17 + *
   54.18 + *  You should have received a copy of the GNU General Public License
   54.19 + *  along with this program; if not, write to the Free Software
   54.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   54.21 + */
   54.22 +/*
   54.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   54.24 + * Use is subject to license terms.
   54.25 + */
   54.26 +
   54.27 +#include "fsys_zfs.h"
   54.28 +
   54.29 +
   54.30 +void
   54.31 +fletcher_2_native(const void *buf, uint64_t size, zio_cksum_t *zcp)
   54.32 +{
   54.33 +	const uint64_t *ip = buf;
   54.34 +	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
   54.35 +	uint64_t a0, b0, a1, b1;
   54.36 +
   54.37 +	for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
   54.38 +		a0 += ip[0];
   54.39 +		a1 += ip[1];
   54.40 +		b0 += a0;
   54.41 +		b1 += a1;
   54.42 +	}
   54.43 +
   54.44 +	ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
   54.45 +}
   54.46 +
   54.47 +void
   54.48 +fletcher_2_byteswap(const void *buf, uint64_t size, zio_cksum_t *zcp)
   54.49 +{
   54.50 +	const uint64_t *ip = buf;
   54.51 +	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
   54.52 +	uint64_t a0, b0, a1, b1;
   54.53 +
   54.54 +	for (a0 = b0 = a1 = b1 = 0; ip < ipend; ip += 2) {
   54.55 +		a0 += BSWAP_64(ip[0]);
   54.56 +		a1 += BSWAP_64(ip[1]);
   54.57 +		b0 += a0;
   54.58 +		b1 += a1;
   54.59 +	}
   54.60 +
   54.61 +	ZIO_SET_CHECKSUM(zcp, a0, a1, b0, b1);
   54.62 +}
   54.63 +
   54.64 +void
   54.65 +fletcher_4_native(const void *buf, uint64_t size, zio_cksum_t *zcp)
   54.66 +{
   54.67 +	const uint32_t *ip = buf;
   54.68 +	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
   54.69 +	uint64_t a, b, c, d;
   54.70 +
   54.71 +	for (a = b = c = d = 0; ip < ipend; ip++) {
   54.72 +		a += ip[0];
   54.73 +		b += a;
   54.74 +		c += b;
   54.75 +		d += c;
   54.76 +	}
   54.77 +
   54.78 +	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
   54.79 +}
   54.80 +
   54.81 +void
   54.82 +fletcher_4_byteswap(const void *buf, uint64_t size, zio_cksum_t *zcp)
   54.83 +{
   54.84 +	const uint32_t *ip = buf;
   54.85 +	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
   54.86 +	uint64_t a, b, c, d;
   54.87 +
   54.88 +	for (a = b = c = d = 0; ip < ipend; ip++) {
   54.89 +		a += BSWAP_32(ip[0]);
   54.90 +		b += a;
   54.91 +		c += b;
   54.92 +		d += c;
   54.93 +	}
   54.94 +
   54.95 +	ZIO_SET_CHECKSUM(zcp, a, b, c, d);
   54.96 +}
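
A minimal usage sketch (not part of the patch): recompute the fletcher-2 checksum of a buffer and compare it with the expected value from a block pointer, using ZIO_CHECKSUM_EQUAL from spa.h above.

static int
fletcher2_matches(const void *buf, uint64_t size, zio_cksum_t expected)
{
	zio_cksum_t actual;

	fletcher_2_native(buf, size, &actual);
	return (ZIO_CHECKSUM_EQUAL(actual, expected));
}
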
    55.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    55.2 +++ b/tools/libfsimage/zfs/zfs_lzjb.c	Thu May 08 18:40:07 2008 +0900
    55.3 @@ -0,0 +1,60 @@
    55.4 +/*
    55.5 + *  GRUB  --  GRand Unified Bootloader
    55.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    55.7 + *
    55.8 + *  This program is free software; you can redistribute it and/or modify
    55.9 + *  it under the terms of the GNU General Public License as published by
   55.10 + *  the Free Software Foundation; either version 2 of the License, or
   55.11 + *  (at your option) any later version.
   55.12 + *
   55.13 + *  This program is distributed in the hope that it will be useful,
   55.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   55.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   55.16 + *  GNU General Public License for more details.
   55.17 + *
   55.18 + *  You should have received a copy of the GNU General Public License
   55.19 + *  along with this program; if not, write to the Free Software
   55.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   55.21 + */
   55.22 +/*
   55.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   55.24 + * Use is subject to license terms.
   55.25 + */
   55.26 +
   55.27 +#include "fsys_zfs.h"
   55.28 +
   55.29 +#define	MATCH_BITS	6
   55.30 +#define	MATCH_MIN	3
   55.31 +#define	OFFSET_MASK	((1 << (16 - MATCH_BITS)) - 1)
   55.32 +
   55.33 +
   55.34 +/*ARGSUSED*/
   55.35 +int
   55.36 +lzjb_decompress(void *s_start, void *d_start, size_t s_len, size_t d_len)
   55.37 +{
   55.38 +	unsigned char *src = s_start;
   55.39 +	unsigned char *dst = d_start;
   55.40 +	unsigned char *d_end = (unsigned char *)d_start + d_len;
   55.41 +	unsigned char *cpy;
   55.42 +	unsigned char copymap = '\0';
   55.43 +	int copymask = 1 << (NBBY - 1);
   55.44 +
   55.45 +	while (dst < d_end) {
   55.46 +		if ((copymask <<= 1) == (1 << NBBY)) {
   55.47 +			copymask = 1;
   55.48 +			copymap = *src++;
   55.49 +		}
   55.50 +		if (copymap & (unsigned char)copymask) {
   55.51 +			int mlen = (src[0] >> (NBBY - MATCH_BITS)) + MATCH_MIN;
   55.52 +			int offset = ((src[0] << NBBY) | src[1]) & OFFSET_MASK;
   55.53 +			src += 2;
   55.54 +			if ((cpy = dst - offset) < (unsigned char *)d_start)
   55.55 +				return (-1);
   55.56 +			while (--mlen >= 0 && dst < d_end)
   55.57 +				*dst++ = *cpy++;
   55.58 +		} else {
   55.59 +			*dst++ = *src++;
   55.60 +		}
   55.61 +	}
   55.62 +	return (0);
   55.63 +}
    56.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    56.2 +++ b/tools/libfsimage/zfs/zfs_sha256.c	Thu May 08 18:40:07 2008 +0900
    56.3 @@ -0,0 +1,124 @@
    56.4 +/*
    56.5 + *  GRUB  --  GRand Unified Bootloader
    56.6 + *  Copyright (C) 1999,2000,2001,2002,2003,2004  Free Software Foundation, Inc.
    56.7 + *
    56.8 + *  This program is free software; you can redistribute it and/or modify
    56.9 + *  it under the terms of the GNU General Public License as published by
   56.10 + *  the Free Software Foundation; either version 2 of the License, or
   56.11 + *  (at your option) any later version.
   56.12 + *
   56.13 + *  This program is distributed in the hope that it will be useful,
   56.14 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
   56.15 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   56.16 + *  GNU General Public License for more details.
   56.17 + *
   56.18 + *  You should have received a copy of the GNU General Public License
   56.19 + *  along with this program; if not, write to the Free Software
   56.20 + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   56.21 + */
   56.22 +/*
   56.23 + * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
   56.24 + * Use is subject to license terms.
   56.25 + */
   56.26 +
   56.27 +#include "fsys_zfs.h"
   56.28 +
   56.29 +/*
   56.30 + * SHA-256 checksum, as specified in FIPS 180-2, available at:
   56.31 + * http://csrc.nist.gov/cryptval
   56.32 + *
   56.33 + * This is a very compact implementation of SHA-256.
   56.34 + * It is designed to be simple and portable, not to be fast.
   56.35 + */
   56.36 +
   56.37 +/*
   56.38 + * The literal definitions according to FIPS180-2 would be:
   56.39 + *
   56.40 + * 	Ch(x, y, z)     (((x) & (y)) ^ ((~(x)) & (z)))
   56.41 + * 	Maj(x, y, z)    (((x) & (y)) | ((x) & (z)) | ((y) & (z)))
   56.42 + *
   56.43 + * We use logical equivalents which require one less op.
   56.44 + */
   56.45 +#define	Ch(x, y, z)	((z) ^ ((x) & ((y) ^ (z))))
   56.46 +#define	Maj(x, y, z)	(((x) & (y)) ^ ((z) & ((x) ^ (y))))
   56.47 +#define	Rot32(x, s)	(((x) >> s) | ((x) << (32 - s)))
   56.48 +#define	SIGMA0(x)	(Rot32(x, 2) ^ Rot32(x, 13) ^ Rot32(x, 22))
   56.49 +#define	SIGMA1(x)	(Rot32(x, 6) ^ Rot32(x, 11) ^ Rot32(x, 25))
   56.50 +#define	sigma0(x)	(Rot32(x, 7) ^ Rot32(x, 18) ^ ((x) >> 3))
   56.51 +#define	sigma1(x)	(Rot32(x, 17) ^ Rot32(x, 19) ^ ((x) >> 10))
   56.52 +
   56.53 +static const uint32_t SHA256_K[64] = {
   56.54 +	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5,
   56.55 +	0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
   56.56 +	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3,
   56.57 +	0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
   56.58 +	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc,
   56.59 +	0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
   56.60 +	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7,
   56.61 +	0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
   56.62 +	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13,
   56.63 +	0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
   56.64 +	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3,
   56.65 +	0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
   56.66 +	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5,
   56.67 +	0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
   56.68 +	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208,
   56.69 +	0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
   56.70 +};
   56.71 +
   56.72 +static void
   56.73 +SHA256Transform(uint32_t *H, const uint8_t *cp)
   56.74 +{
   56.75 +	uint32_t a, b, c, d, e, f, g, h, t, T1, T2, W[64];
   56.76 +
   56.77 +	for (t = 0; t < 16; t++, cp += 4)
   56.78 +		W[t] = (cp[0] << 24) | (cp[1] << 16) | (cp[2] << 8) | cp[3];
   56.79 +
   56.80 +	for (t = 16; t < 64; t++)
   56.81 +		W[t] = sigma1(W[t - 2]) + W[t - 7] +
   56.82 +		    sigma0(W[t - 15]) + W[t - 16];
   56.83 +
   56.84 +	a = H[0]; b = H[1]; c = H[2]; d = H[3];
   56.85 +	e = H[4]; f = H[5]; g = H[6]; h = H[7];
   56.86 +
   56.87 +	for (t = 0; t < 64; t++) {
   56.88 +		T1 = h + SIGMA1(e) + Ch(e, f, g) + SHA256_K[t] + W[t];
   56.89 +		T2 = SIGMA0(a) + Maj(a, b, c);
   56.90 +		h = g; g = f; f = e; e = d + T1;
   56.91 +		d = c; c = b; b = a; a = T1 + T2;
   56.92 +	}
   56.93 +
   56.94 +	H[0] += a; H[1] += b; H[2] += c; H[3] += d;
   56.95 +	H[4] += e; H[5] += f; H[6] += g; H[7] += h;
   56.96 +}
   56.97 +
   56.98 +void
   56.99 +zio_checksum_SHA256(const void *buf, uint64_t size, zio_cksum_t *zcp)
  56.100 +{
  56.101 +	uint32_t H[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
  56.102 +	    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
  56.103 +	uint8_t pad[128];
  56.104 +	int padsize = size & 63;
  56.105 +	int i;
  56.106 +
  56.107 +	for (i = 0; i < size - padsize; i += 64)
  56.108 +		SHA256Transform(H, (uint8_t *)buf + i);
  56.109 +
  56.110 +	for (i = 0; i < padsize; i++)
  56.111 +		pad[i] = ((uint8_t *)buf)[i];
  56.112 +
  56.113 +	for (pad[padsize++] = 0x80; (padsize & 63) != 56; padsize++)
  56.114 +		pad[padsize] = 0;
  56.115 +
  56.116 +	for (i = 0; i < 8; i++)
  56.117 +		pad[padsize++] = (size << 3) >> (56 - 8 * i);
  56.118 +
  56.119 +	for (i = 0; i < padsize; i += 64)
  56.120 +		SHA256Transform(H, pad + i);
  56.121 +
  56.122 +	ZIO_SET_CHECKSUM(zcp,
  56.123 +	    (uint64_t)H[0] << 32 | H[1],
  56.124 +	    (uint64_t)H[2] << 32 | H[3],
  56.125 +	    (uint64_t)H[4] << 32 | H[5],
  56.126 +	    (uint64_t)H[6] << 32 | H[7]);
  56.127 +}
    57.1 --- a/tools/libxc/Makefile	Fri Apr 25 20:13:52 2008 +0900
    57.2 +++ b/tools/libxc/Makefile	Thu May 08 18:40:07 2008 +0900
    57.3 @@ -53,6 +53,7 @@ GUEST_SRCS-y                 += xc_dom_b
    57.4  GUEST_SRCS-y                 += xc_dom_compat_linux.c
    57.5  
    57.6  GUEST_SRCS-$(CONFIG_X86)     += xc_dom_x86.c
    57.7 +GUEST_SRCS-$(CONFIG_X86)     += xc_cpuid_x86.c
    57.8  GUEST_SRCS-$(CONFIG_IA64)    += xc_dom_ia64.c
    57.9  GUEST_SRCS-$(CONFIG_POWERPC) += xc_dom_powerpc.c
   57.10  endif
    58.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    58.2 +++ b/tools/libxc/xc_cpufeature.h	Thu May 08 18:40:07 2008 +0900
    58.3 @@ -0,0 +1,115 @@
    58.4 +#ifndef __LIBXC_CPUFEATURE_H
    58.5 +#define __LIBXC_CPUFEATURE_H
    58.6 +
    58.7 +/* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */
    58.8 +#define X86_FEATURE_FPU		(0*32+ 0) /* Onboard FPU */
    58.9 +#define X86_FEATURE_VME		(0*32+ 1) /* Virtual Mode Extensions */
   58.10 +#define X86_FEATURE_DE		(0*32+ 2) /* Debugging Extensions */
   58.11 +#define X86_FEATURE_PSE 	(0*32+ 3) /* Page Size Extensions */
   58.12 +#define X86_FEATURE_TSC		(0*32+ 4) /* Time Stamp Counter */
   58.13 +#define X86_FEATURE_MSR		(0*32+ 5) /* Model-Specific Registers, RDMSR, WRMSR */
   58.14 +#define X86_FEATURE_PAE		(0*32+ 6) /* Physical Address Extensions */
   58.15 +#define X86_FEATURE_MCE		(0*32+ 7) /* Machine Check Architecture */
   58.16 +#define X86_FEATURE_CX8		(0*32+ 8) /* CMPXCHG8 instruction */
   58.17 +#define X86_FEATURE_APIC	(0*32+ 9) /* Onboard APIC */
   58.18 +#define X86_FEATURE_SEP		(0*32+11) /* SYSENTER/SYSEXIT */
   58.19 +#define X86_FEATURE_MTRR	(0*32+12) /* Memory Type Range Registers */
   58.20 +#define X86_FEATURE_PGE		(0*32+13) /* Page Global Enable */
   58.21 +#define X86_FEATURE_MCA		(0*32+14) /* Machine Check Architecture */
   58.22 +#define X86_FEATURE_CMOV	(0*32+15) /* CMOV instruction (FCMOVCC and FCOMI too if FPU present) */
   58.23 +#define X86_FEATURE_PAT		(0*32+16) /* Page Attribute Table */
   58.24 +#define X86_FEATURE_PSE36	(0*32+17) /* 36-bit PSEs */
   58.25 +#define X86_FEATURE_PN		(0*32+18) /* Processor serial number */
   58.26 +#define X86_FEATURE_CLFLSH	(0*32+19) /* Supports the CLFLUSH instruction */
   58.27 +#define X86_FEATURE_DS		(0*32+21) /* Debug Store */
   58.28 +#define X86_FEATURE_ACPI	(0*32+22) /* ACPI via MSR */
   58.29 +#define X86_FEATURE_MMX		(0*32+23) /* Multimedia Extensions */
   58.30 +#define X86_FEATURE_FXSR	(0*32+24) /* FXSAVE and FXRSTOR instructions (fast save and restore */
   58.31 +				          /* of FPU context), and CR4.OSFXSR available */
   58.32 +#define X86_FEATURE_XMM		(0*32+25) /* Streaming SIMD Extensions */
   58.33 +#define X86_FEATURE_XMM2	(0*32+26) /* Streaming SIMD Extensions-2 */
   58.34 +#define X86_FEATURE_SELFSNOOP	(0*32+27) /* CPU self snoop */
   58.35 +#define X86_FEATURE_HT		(0*32+28) /* Hyper-Threading */
   58.36 +#define X86_FEATURE_ACC		(0*32+29) /* Automatic clock control */
   58.37 +#define X86_FEATURE_IA64	(0*32+30) /* IA-64 processor */
   58.38 +#define X86_FEATURE_PBE		(0*32+31) /* Pending Break Enable */
   58.39 +
   58.40 +/* AMD-defined CPU features, CPUID level 0x80000001, word 1 */
   58.41 +/* Don't duplicate feature flags which are redundant with Intel! */
   58.42 +#define X86_FEATURE_SYSCALL	(1*32+11) /* SYSCALL/SYSRET */
   58.43 +#define X86_FEATURE_MP		(1*32+19) /* MP Capable. */
   58.44 +#define X86_FEATURE_NX		(1*32+20) /* Execute Disable */
   58.45 +#define X86_FEATURE_MMXEXT	(1*32+22) /* AMD MMX extensions */
   58.46 +#define X86_FEATURE_FFXSR       (1*32+25) /* FFXSR instruction optimizations */
   58.47 +#define X86_FEATURE_PAGE1GB	(1*32+26) /* 1Gb large page support */
   58.48 +#define X86_FEATURE_RDTSCP	(1*32+27) /* RDTSCP */
   58.49 +#define X86_FEATURE_LM		(1*32+29) /* Long Mode (x86-64) */
   58.50 +#define X86_FEATURE_3DNOWEXT	(1*32+30) /* AMD 3DNow! extensions */
   58.51 +#define X86_FEATURE_3DNOW	(1*32+31) /* 3DNow! */
   58.52 +
   58.53 +/* Transmeta-defined CPU features, CPUID level 0x80860001, word 2 */
   58.54 +#define X86_FEATURE_RECOVERY	(2*32+ 0) /* CPU in recovery mode */
   58.55 +#define X86_FEATURE_LONGRUN	(2*32+ 1) /* Longrun power control */
   58.56 +#define X86_FEATURE_LRTI	(2*32+ 3) /* LongRun table interface */
   58.57 +
   58.58 +/* Other features, Linux-defined mapping, word 3 */
   58.59 +/* This range is used for feature bits which conflict or are synthesized */
   58.60 +#define X86_FEATURE_CXMMX	(3*32+ 0) /* Cyrix MMX extensions */
   58.61 +#define X86_FEATURE_K6_MTRR	(3*32+ 1) /* AMD K6 nonstandard MTRRs */
   58.62 +#define X86_FEATURE_CYRIX_ARR	(3*32+ 2) /* Cyrix ARRs (= MTRRs) */
   58.63 +#define X86_FEATURE_CENTAUR_MCR	(3*32+ 3) /* Centaur MCRs (= MTRRs) */
   58.64 +/* cpu types for specific tunings: */
   58.65 +#define X86_FEATURE_K8		(3*32+ 4) /* Opteron, Athlon64 */
   58.66 +#define X86_FEATURE_K7		(3*32+ 5) /* Athlon */
   58.67 +#define X86_FEATURE_P3		(3*32+ 6) /* P3 */
   58.68 +#define X86_FEATURE_P4		(3*32+ 7) /* P4 */
   58.69 +#define X86_FEATURE_CONSTANT_TSC (3*32+ 8) /* TSC ticks at a constant rate */
   58.70 +
   58.71 +/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
   58.72 +#define X86_FEATURE_XMM3	(4*32+ 0) /* Streaming SIMD Extensions-3 */
   58.73 +#define X86_FEATURE_DTES64	(4*32+ 2) /* 64-bit Debug Store */
   58.74 +#define X86_FEATURE_MWAIT	(4*32+ 3) /* Monitor/Mwait support */
   58.75 +#define X86_FEATURE_DSCPL	(4*32+ 4) /* CPL Qualified Debug Store */
   58.76 +#define X86_FEATURE_VMXE	(4*32+ 5) /* Virtual Machine Extensions */
   58.77 +#define X86_FEATURE_SMXE	(4*32+ 6) /* Safer Mode Extensions */
   58.78 +#define X86_FEATURE_EST		(4*32+ 7) /* Enhanced SpeedStep */
   58.79 +#define X86_FEATURE_TM2		(4*32+ 8) /* Thermal Monitor 2 */
   58.80 +#define X86_FEATURE_SSSE3	(4*32+ 9) /* Supplemental Streaming SIMD Extensions-3 */
   58.81 +#define X86_FEATURE_CID		(4*32+10) /* Context ID */
   58.82 +#define X86_FEATURE_CX16        (4*32+13) /* CMPXCHG16B */
   58.83 +#define X86_FEATURE_XTPR	(4*32+14) /* Send Task Priority Messages */
   58.84 +#define X86_FEATURE_PDCM	(4*32+15) /* Perf/Debug Capability MSR */
   58.85 +#define X86_FEATURE_DCA		(4*32+18) /* Direct Cache Access */
   58.86 +#define X86_FEATURE_SSE4_1	(4*32+19) /* Streaming SIMD Extensions 4.1 */
   58.87 +#define X86_FEATURE_SSE4_2	(4*32+20) /* Streaming SIMD Extensions 4.2 */
   58.88 +#define X86_FEATURE_POPCNT	(4*32+23) /* POPCNT instruction */
   58.89 +
   58.90 +/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */
   58.91 +#define X86_FEATURE_XSTORE	(5*32+ 2) /* on-CPU RNG present (xstore insn) */
   58.92 +#define X86_FEATURE_XSTORE_EN	(5*32+ 3) /* on-CPU RNG enabled */
   58.93 +#define X86_FEATURE_XCRYPT	(5*32+ 6) /* on-CPU crypto (xcrypt insn) */
   58.94 +#define X86_FEATURE_XCRYPT_EN	(5*32+ 7) /* on-CPU crypto enabled */
   58.95 +#define X86_FEATURE_ACE2	(5*32+ 8) /* Advanced Cryptography Engine v2 */
   58.96 +#define X86_FEATURE_ACE2_EN	(5*32+ 9) /* ACE v2 enabled */
   58.97 +#define X86_FEATURE_PHE		(5*32+ 10) /* PadLock Hash Engine */
   58.98 +#define X86_FEATURE_PHE_EN	(5*32+ 11) /* PHE enabled */
   58.99 +#define X86_FEATURE_PMM		(5*32+ 12) /* PadLock Montgomery Multiplier */
  58.100 +#define X86_FEATURE_PMM_EN	(5*32+ 13) /* PMM enabled */
  58.101 +
  58.102 +/* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */
  58.103 +#define X86_FEATURE_LAHF_LM	(6*32+ 0) /* LAHF/SAHF in long mode */
  58.104 +#define X86_FEATURE_CMP_LEGACY	(6*32+ 1) /* If yes HyperThreading not valid */
  58.105 +#define X86_FEATURE_SVME        (6*32+ 2) /* Secure Virtual Machine */
  58.106 +#define X86_FEATURE_EXTAPICSPACE (6*32+ 3) /* Extended APIC space */
  58.107 +#define X86_FEATURE_ALTMOVCR	(6*32+ 4) /* LOCK MOV CR accesses CR+8 */
  58.108 +#define X86_FEATURE_ABM		(6*32+ 5) /* Advanced Bit Manipulation */
  58.109 +#define X86_FEATURE_SSE4A	(6*32+ 6) /* AMD Streaming SIMD Extensions-4a */
  58.110 +#define X86_FEATURE_MISALIGNSSE	(6*32+ 7) /* Misaligned SSE Access */
  58.111 +#define X86_FEATURE_3DNOWPF	(6*32+ 8) /* 3DNow! Prefetch */
  58.112 +#define X86_FEATURE_OSVW	(6*32+ 9) /* OS Visible Workaround */
  58.113 +#define X86_FEATURE_IBS		(6*32+ 10) /* Instruction Based Sampling */
  58.114 +#define X86_FEATURE_SSE5	(6*32+ 11) /* AMD Streaming SIMD Extensions-5 */
  58.115 +#define X86_FEATURE_SKINIT	(6*32+ 12) /* SKINIT, STGI/CLGI, DEV */
  58.116 +#define X86_FEATURE_WDT		(6*32+ 13) /* Watchdog Timer */
  58.117 +
  58.118 +#endif /* __LIBXC_CPUFEATURE_H */
    59.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
    59.2 +++ b/tools/libxc/xc_cpuid_x86.c	Thu May 08 18:40:07 2008 +0900
    59.3 @@ -0,0 +1,433 @@
    59.4 +/******************************************************************************
    59.5 + * xc_cpuid_x86.c 
    59.6 + *
    59.7 + * Compute cpuid of a domain.
    59.8 + *
    59.9 + * Copyright (c) 2008, Citrix Systems, Inc.
   59.10 + *
   59.11 + * This program is free software; you can redistribute it and/or modify it
   59.12 + * under the terms and conditions of the GNU General Public License,
   59.13 + * version 2, as published by the Free Software Foundation.
   59.14 + *
   59.15 + * This program is distributed in the hope it will be useful, but WITHOUT
   59.16 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
   59.17 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
   59.18 + * more details.
   59.19 + *
   59.20 + * You should have received a copy of the GNU General Public License along with
   59.21 + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
   59.22 + * Place - Suite 330, Boston, MA 02111-1307 USA.
   59.23 + */
   59.24 +
   59.25 +#include <stdlib.h>
   59.26 +#include "xc_private.h"
   59.27 +#include "xc_cpufeature.h"
   59.28 +#include <xen/hvm/params.h>
   59.29 +
   59.30 +#define bitmaskof(idx)      (1u << ((idx) & 31))
   59.31 +#define clear_bit(idx, dst) ((dst) &= ~(1u << (idx)))
   59.32 +#define set_bit(idx, dst)   ((dst) |= (1u << (idx)))
   59.33 +
   59.34 +#define DEF_MAX_BASE 0x00000004u
   59.35 +#define DEF_MAX_EXT  0x80000008u
   59.36 +
   59.37 +static void amd_xc_cpuid_policy(
   59.38 +    int xc, domid_t domid, const unsigned int *input, unsigned int *regs)
   59.39 +{
   59.40 +    unsigned long pae = 0;
   59.41 +
   59.42 +    xc_get_hvm_param(xc, domid, HVM_PARAM_PAE_ENABLED, &pae);
   59.43 +
   59.44 +    switch ( input[0] )
   59.45 +    {
   59.46 +    case 0x00000001:
   59.47 +        /* Mask Intel-only features. */
   59.48 +        regs[2] &= ~(bitmaskof(X86_FEATURE_SSSE3) |
   59.49 +                     bitmaskof(X86_FEATURE_SSE4_1) |
   59.50 +                     bitmaskof(X86_FEATURE_SSE4_2));
   59.51 +        break;
   59.52 +
   59.53 +    case 0x00000002:
   59.54 +    case 0x00000004:
   59.55 +        regs[0] = regs[1] = regs[2] = 0;
   59.56 +        break;
   59.57 +
   59.58 +    case 0x80000001:
   59.59 +        if ( !pae )
   59.60 +            clear_bit(X86_FEATURE_PAE & 31, regs[3]);
   59.61 +        clear_bit(X86_FEATURE_PSE36 & 31, regs[3]);
   59.62 +
   59.63 +        /* Filter all other features according to a whitelist. */
   59.64 +        regs[2] &= (bitmaskof(X86_FEATURE_LAHF_LM) |
   59.65 +                    bitmaskof(X86_FEATURE_ALTMOVCR) |
   59.66 +                    bitmaskof(X86_FEATURE_ABM) |
   59.67 +                    bitmaskof(X86_FEATURE_SSE4A) |
   59.68 +                    bitmaskof(X86_FEATURE_MISALIGNSSE) |
   59.69 +                    bitmaskof(X86_FEATURE_3DNOWPF));
   59.70 +        regs[3] &= (0x0183f3ff | /* features shared with 0x00000001:EDX */
   59.71 +                    bitmaskof(X86_FEATURE_NX) |
   59.72 +                    bitmaskof(X86_FEATURE_LM) |
   59.73 +                    bitmaskof(X86_FEATURE_SYSCALL) |
   59.74 +                    bitmaskof(X86_FEATURE_MP) |
   59.75 +                    bitmaskof(X86_FEATURE_MMXEXT) |
   59.76 +                    bitmaskof(X86_FEATURE_FFXSR) |
   59.77 +                    bitmaskof(X86_FEATURE_3DNOW) |
   59.78 +                    bitmaskof(X86_FEATURE_3DNOWEXT));
   59.79 +        break;
   59.80 +    }
   59.81 +}
   59.82 +
   59.83 +static void intel_xc_cpuid_policy(
   59.84 +    int xc, domid_t domid, const unsigned int *input, unsigned int *regs)
   59.85 +{
   59.86 +    switch ( input[0] )
   59.87 +    {
   59.88 +    case 0x00000001:
   59.89 +        /* Mask AMD-only features. */
   59.90 +        regs[2] &= ~(bitmaskof(X86_FEATURE_POPCNT));
   59.91 +        break;
   59.92 +
   59.93 +    case 0x00000004:
   59.94 +        regs[0] &= 0x3FF;
   59.95 +        regs[3] &= 0x3FF;
   59.96 +        break;
   59.97 +
   59.98 +    case 0x80000001:
   59.99 +        /* Only a few features are advertised in Intel's 0x80000001. */
  59.100 +        regs[2] &= (bitmaskof(X86_FEATURE_LAHF_LM));
  59.101 +        regs[3] &= (bitmaskof(X86_FEATURE_NX) |
  59.102 +                    bitmaskof(X86_FEATURE_LM) |
  59.103 +                    bitmaskof(X86_FEATURE_SYSCALL));
  59.104 +        break;
  59.105 +    }
  59.106 +}
  59.107 +
  59.108 +static void cpuid(const unsigned int *input, unsigned int *regs)
  59.109 +{
  59.110 +    unsigned int count = (input[1] == XEN_CPUID_INPUT_UNUSED) ? 0 : input[1];
  59.111 +    unsigned int bx_temp;
  59.112 +    asm ( "mov %%ebx,%4; cpuid; mov %%ebx,%1; mov %4,%%ebx"
  59.113 +          : "=a" (regs[0]), "=r" (regs[1]),
  59.114 +          "=c" (regs[2]), "=d" (regs[3]), "=m" (bx_temp)
  59.115 +          : "0" (input[0]), "2" (count) );
  59.116 +}
  59.117 +
  59.118 +/* Get the manufacturer brand name of the host processor. */
  59.119 +static void xc_cpuid_brand_get(char *str)
  59.120 +{
  59.121 +    unsigned int input[2] = { 0, 0 };
  59.122 +    unsigned int regs[4];
  59.123 +
  59.124 +    cpuid(input, regs);
  59.125 +
  59.126 +    *(uint32_t *)(str + 0) = regs[1];
  59.127 +    *(uint32_t *)(str + 4) = regs[3];
  59.128 +    *(uint32_t *)(str + 8) = regs[2];
  59.129 +    str[12] = '\0';
  59.130 +}
  59.131 +
  59.132 +static void xc_cpuid_policy(
  59.133 +    int xc, domid_t domid, const unsigned int *input, unsigned int *regs)
  59.134 +{
  59.135 +    char brand[13];
  59.136 +    unsigned long pae;
  59.137 +
  59.138 +    xc_get_hvm_param(xc, domid, HVM_PARAM_PAE_ENABLED, &pae);
  59.139 +
  59.140 +    switch( input[0] )
  59.141 +    {
  59.142 +    case 0x00000000:
  59.143 +        if ( regs[0] > DEF_MAX_BASE )
  59.144 +            regs[0] = DEF_MAX_BASE;
  59.145 +        break;
  59.146 +
  59.147 +    case 0x00000001:
  59.148 +        regs[2] &= (bitmaskof(X86_FEATURE_XMM3) |
  59.149 +                    bitmaskof(X86_FEATURE_SSSE3) |
  59.150 +                    bitmaskof(X86_FEATURE_CX16) |
  59.151 +                    bitmaskof(X86_FEATURE_SSE4_1) |
  59.152 +                    bitmaskof(X86_FEATURE_SSE4_2) |
  59.153 +                    bitmaskof(X86_FEATURE_POPCNT));
  59.154 +
  59.155 +        regs[3] &= (bitmaskof(X86_FEATURE_FPU) |
  59.156 +                    bitmaskof(X86_FEATURE_VME) |
  59.157 +                    bitmaskof(X86_FEATURE_DE) |
  59.158 +                    bitmaskof(X86_FEATURE_PSE) |
  59.159 +                    bitmaskof(X86_FEATURE_TSC) |
  59.160 +                    bitmaskof(X86_FEATURE_MSR) |
  59.161 +                    bitmaskof(X86_FEATURE_PAE) |
  59.162 +                    bitmaskof(X86_FEATURE_MCE) |
  59.163 +                    bitmaskof(X86_FEATURE_CX8) |
  59.164 +                    bitmaskof(X86_FEATURE_APIC) |
  59.165 +                    bitmaskof(X86_FEATURE_SEP) |
  59.166 +                    bitmaskof(X86_FEATURE_MTRR) |
  59.167 +                    bitmaskof(X86_FEATURE_PGE) |
  59.168 +                    bitmaskof(X86_FEATURE_MCA) |
  59.169 +                    bitmaskof(X86_FEATURE_CMOV) |
  59.170 +                    bitmaskof(X86_FEATURE_PAT) |
  59.171 +                    bitmaskof(X86_FEATURE_CLFLSH) |
  59.172 +                    bitmaskof(X86_FEATURE_MMX) |
  59.173 +                    bitmaskof(X86_FEATURE_FXSR) |
  59.174 +                    bitmaskof(X86_FEATURE_XMM) |
  59.175 +                    bitmaskof(X86_FEATURE_XMM2));
  59.176 +            
  59.177 +        /* We always support MTRR MSRs. */
  59.178 +        regs[3] |= bitmaskof(X86_FEATURE_MTRR);
  59.179 +
  59.180 +        if ( !pae )
  59.181 +            clear_bit(X86_FEATURE_PAE & 31, regs[3]);
  59.182 +        break;
  59.183 +
  59.184 +    case 0x80000000:
  59.185 +        if ( regs[0] > DEF_MAX_EXT )
  59.186 +            regs[0] = DEF_MAX_EXT;
  59.187 +        break;
  59.188 +
  59.189 +    case 0x80000001:
  59.190 +        if ( !pae )
  59.191 +            clear_bit(X86_FEATURE_NX & 31, regs[3]);
  59.192 +        break;
  59.193 +
  59.194 +
  59.195 +    case 0x80000008:
  59.196 +        regs[0] &= 0x0000ffffu;
  59.197 +        regs[1] = regs[2] = regs[3] = 0;
  59.198 +        break;
  59.199 +
  59.200 +    case 0x00000002:
  59.201 +    case 0x00000004:
  59.202 +    case 0x80000002:
  59.203 +    case 0x80000003:
  59.204 +    case 0x80000004:
  59.205 +    case 0x80000006:
  59.206 +        break;
  59.207 +
  59.208 +    default:
  59.209 +        regs[0] = regs[1] = regs[2] = regs[3] = 0;
  59.210 +        break;
  59.211 +    }
  59.212 +
  59.213 +    xc_cpuid_brand_get(brand);
  59.214 +    if ( strstr(brand, "AMD") )
  59.215 +        amd_xc_cpuid_policy(xc, domid, input, regs);
  59.216 +    else
  59.217 +        intel_xc_cpuid_policy(xc, domid, input, regs);
  59.218 +}
  59.219 +
  59.220 +static int xc_cpuid_do_domctl(
  59.221 +    int xc, domid_t domid,
  59.222 +    const unsigned int *input, const unsigned int *regs)
  59.223 +{
  59.224 +    DECLARE_DOMCTL;
  59.225 +
  59.226 +    memset(&domctl, 0, sizeof (domctl));
  59.227 +    domctl.domain = domid;
  59.228 +    domctl.cmd = XEN_DOMCTL_set_cpuid;
  59.229 +    domctl.u.cpuid.input[0] = input[0];
  59.230 +    domctl.u.cpuid.input[1] = input[1];
  59.231 +    domctl.u.cpuid.eax = regs[0];
  59.232 +    domctl.u.cpuid.ebx = regs[1];
  59.233 +    domctl.u.cpuid.ecx = regs[2];
  59.234 +    domctl.u.cpuid.edx = regs[3];
  59.235 +
  59.236 +    return do_domctl(xc, &domctl);
  59.237 +}
  59.238 +
  59.239 +static char *alloc_str(void)
  59.240 +{
  59.241 +    char *s = malloc(33);
  59.242 +    memset(s, 0, 33);
  59.243 +    return s;
  59.244 +}
  59.245 +
  59.246 +void xc_cpuid_to_str(const unsigned int *regs, char **strs)
  59.247 +{
  59.248 +    int i, j;
  59.249 +
  59.250 +    for ( i = 0; i < 4; i++ )
  59.251 +    {
  59.252 +        strs[i] = alloc_str();
  59.253 +        for ( j = 0; j < 32; j++ )
  59.254 +            strs[i][j] = !!((regs[i] & (1U << (31 - j)))) ? '1' : '0';
  59.255 +    }
  59.256 +}
  59.257 +
  59.258 +int xc_cpuid_apply_policy(int xc, domid_t domid)
  59.259 +{
  59.260 +    unsigned int input[2] = { 0, 0 }, regs[4];
  59.261 +    unsigned int base_max, ext_max;
  59.262 +    int rc;
  59.263 +
  59.264 +    cpuid(input, regs);
  59.265 +    base_max = (regs[0] <= DEF_MAX_BASE) ? regs[0] : DEF_MAX_BASE;
  59.266 +    input[0] = 0x80000000;
  59.267 +    cpuid(input, regs);
  59.268 +    ext_max = (regs[0] <= DEF_MAX_EXT) ? regs[0] : DEF_MAX_EXT;
  59.269 +
  59.270 +    input[0] = 0;
  59.271 +    input[1] = XEN_CPUID_INPUT_UNUSED;
  59.272 +    for ( ; ; )
  59.273 +    {
  59.274 +        cpuid(input, regs);
  59.275 +        xc_cpuid_policy(xc, domid, input, regs);
  59.276 +
  59.277 +        if ( regs[0] || regs[1] || regs[2] || regs[3] )
  59.278 +        {
  59.279 +            rc = xc_cpuid_do_domctl(xc, domid, input, regs);
  59.280 +            if ( rc )
  59.281 +                return rc;
  59.282 +
  59.283 +            /* Intel cache descriptor leaves. */
  59.284 +            if ( input[0] == 4 )
  59.285 +            {
  59.286 +                input[1]++;
  59.287 +                /* More to do? Then loop keeping %%eax==0x00000004. */
  59.288 +                if ( (regs[0] & 0x1f) != 0 )
  59.289 +                    continue;
  59.290 +            }
  59.291 +        }
  59.292 +
  59.293 +        input[0]++;
  59.294 +        input[1] = (input[0] == 4) ? 0 : XEN_CPUID_INPUT_UNUSED;
  59.295 +        if ( !(input[0] & 0x80000000u) && (input[0] > base_max ) )
  59.296 +            input[0] = 0x80000000u;
  59.297 +
  59.298 +        if ( (input[0] & 0x80000000u) && (input[0] > ext_max) )
  59.299 +            break;
  59.300 +    }
  59.301 +
  59.302 +    return 0;
  59.303 +}
  59.304 +
  59.305 +/*
  59.306 + * Check whether a VM is allowed to launch on this host's processor type.
  59.307 + *
  59.308 + * @config format is similar to that of xc_cpuid_set():
  59.309 + *  '1' -> the bit must be set to 1
  59.310 + *  '0' -> must be 0
  59.311 + *  'x' -> we don't care
  59.312 + *  's' -> (same) must be the same
  59.313 + */
  59.314 +int xc_cpuid_check(
  59.315 +    int xc, const unsigned int *input,
  59.316 +    const char **config,
  59.317 +    char **config_transformed)
  59.318 +{
  59.319 +    int i, j;
  59.320 +    unsigned int regs[4];
  59.321 +
  59.322 +    memset(config_transformed, 0, 4 * sizeof(*config_transformed));
  59.323 +
  59.324 +    cpuid(input, regs);
  59.325 +
  59.326 +    for ( i = 0; i < 4; i++ )
  59.327 +    {
  59.328 +        if ( config[i] == NULL )
  59.329 +            continue;
  59.330 +        config_transformed[i] = alloc_str();
  59.331 +        for ( j = 0; j < 32; j++ )
  59.332 +        {
  59.333 +            unsigned char val = !!((regs[i] & (1U << (31 - j))));
  59.334 +            if ( !strchr("10xs", config[i][j]) ||
  59.335 +                 ((config[i][j] == '1') && !val) ||
  59.336 +                 ((config[i][j] == '0') && val) )
  59.337 +                goto fail;
  59.338 +            config_transformed[i][j] = config[i][j];
  59.339 +            if ( config[i][j] == 's' )
  59.340 +                config_transformed[i][j] = '0' + val;
  59.341 +        }
  59.342 +    }
  59.343 +
  59.344 +    return 0;
  59.345 +
  59.346 + fail:
  59.347 +    for ( i = 0; i < 4; i++ )
  59.348 +    {
  59.349 +        free(config_transformed[i]);
  59.350 +        config_transformed[i] = NULL;
  59.351 +    }
  59.352 +    return -EPERM;
  59.353 +}
  59.354 +
  59.355 +/*
   59.356 + * Configure a single input with the information from config.
  59.357 + *
  59.358 + * Config is an array of strings:
  59.359 + *   config[0] = eax
  59.360 + *   config[1] = ebx
  59.361 + *   config[2] = ecx
  59.362 + *   config[3] = edx
  59.363 + *
  59.364 + * The format of the string is the following:
  59.365 + *   '1' -> force to 1
  59.366 + *   '0' -> force to 0
  59.367 + *   'x' -> we don't care (use default)
  59.368 + *   'k' -> pass through host value
  59.369 + *   's' -> pass through the first time and then keep the same value
  59.370 + *          across save/restore and migration.
  59.371 + * 
  59.372 + * For 's' and 'x' the configuration is overwritten with the value applied.
  59.373 + */
  59.374 +int xc_cpuid_set(
  59.375 +    int xc, domid_t domid, const unsigned int *input,
  59.376 +    const char **config, char **config_transformed)
  59.377 +{
  59.378 +    int rc;
  59.379 +    unsigned int i, j, regs[4], polregs[4];
  59.380 +
  59.381 +    memset(config_transformed, 0, 4 * sizeof(*config_transformed));
  59.382 +
  59.383 +    cpuid(input, regs);
  59.384 +
  59.385 +    memcpy(polregs, regs, sizeof(regs));
  59.386 +    xc_cpuid_policy(xc, domid, input, polregs);
  59.387 +
  59.388 +    for ( i = 0; i < 4; i++ )
  59.389 +    {
  59.390 +        if ( config[i] == NULL )
  59.391 +        {
  59.392 +            regs[i] = polregs[i];
  59.393 +            continue;
  59.394 +        }
  59.395 +        
  59.396 +        config_transformed[i] = alloc_str();
  59.397 +
  59.398 +        for ( j = 0; j < 32; j++ )
  59.399 +        {
  59.400 +            unsigned char val = !!((regs[i] & (1U << (31 - j))));
  59.401 +            unsigned char polval = !!((polregs[i] & (1U << (31 - j))));
  59.402 +
  59.403 +            rc = -EINVAL;
  59.404 +            if ( !strchr("10xks", config[i][j]) )
  59.405 +                goto fail;
  59.406 +
  59.407 +            if ( config[i][j] == '1' )
  59.408 +                val = 1;
  59.409 +            else if ( config[i][j] == '0' )
  59.410 +                val = 0;
  59.411 +            else if ( config[i][j] == 'x' )
  59.412 +                val = polval;
  59.413 +
  59.414 +            if ( val )
  59.415 +                set_bit(31 - j, regs[i]);
  59.416 +            else
  59.417 +                clear_bit(31 - j, regs[i]);
  59.418 +
  59.419 +            config_transformed[i][j] = config[i][j];
  59.420 +            if ( config[i][j] == 's' )
  59.421 +                config_transformed[i][j] = '0' + val;
  59.422 +        }
  59.423 +    }
  59.424 +
  59.425 +    rc = xc_cpuid_do_domctl(xc, domid, input, regs);
  59.426 +    if ( rc == 0 )
  59.427 +        return 0;
  59.428 +
  59.429 + fail:
  59.430 +    for ( i = 0; i < 4; i++ )
  59.431 +    {
  59.432 +        free(config_transformed[i]);
  59.433 +        config_transformed[i] = NULL;
  59.434 +    }
  59.435 +    return rc;
  59.436 +}
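
For illustration only (none of the following is part of the changeset): a minimal sketch of how a caller might use the new xc_cpuid_set() config-string interface described in the comment above. It assumes a libxc handle obtained from xc_interface_open(); the helper name hide_vmx and the choice of bit are purely hypothetical, and a real tool would build the 32-character strings programmatically rather than by hand.

    /* Hedged sketch, not from the changeset: hide CPUID leaf 1 ECX bit 5
     * (VMX) from a guest while leaving every other bit to the default
     * policy.  Each config string is 32 characters, bit 31 first. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "xenctrl.h"

    int hide_vmx(int xc, domid_t domid)
    {
        unsigned int input[2] = { 0x00000001, XEN_CPUID_INPUT_UNUSED };
        const char *config[4] = {
            NULL,                                /* eax: policy default */
            NULL,                                /* ebx: policy default */
            "xxxxxxxxxxxxxxxxxxxxxxxxxx0xxxxx",  /* ecx: char 26 = bit 5 = 0 */
            NULL                                 /* edx: policy default */
        };
        char *out[4];
        int i, rc = xc_cpuid_set(xc, domid, input, config, out);

        for ( i = 0; i < 4; i++ )
            free(out[i]);   /* transformed strings are malloc'd by libxc */
        return rc;
    }
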
    60.1 --- a/tools/libxc/xc_domain.c	Fri Apr 25 20:13:52 2008 +0900
    60.2 +++ b/tools/libxc/xc_domain.c	Thu May 08 18:40:07 2008 +0900
    60.3 @@ -795,6 +795,32 @@ int xc_deassign_device(
    60.4      return do_domctl(xc_handle, &domctl);
    60.5  }
    60.6  
    60.7 +int xc_domain_update_msi_irq(
    60.8 +    int xc_handle,
    60.9 +    uint32_t domid,
   60.10 +    uint32_t gvec,
   60.11 +    uint32_t pirq,
   60.12 +    uint32_t gflags)
   60.13 +{
   60.14 +    int rc;
   60.15 +    xen_domctl_bind_pt_irq_t *bind;
   60.16 +
   60.17 +    DECLARE_DOMCTL;
   60.18 +
   60.19 +    domctl.cmd = XEN_DOMCTL_bind_pt_irq;
   60.20 +    domctl.domain = (domid_t)domid;
   60.21 +
   60.22 +    bind = &(domctl.u.bind_pt_irq);
   60.23 +    bind->hvm_domid = domid;
   60.24 +    bind->irq_type = PT_IRQ_TYPE_MSI;
   60.25 +    bind->machine_irq = pirq;
   60.26 +    bind->u.msi.gvec = gvec;
   60.27 +    bind->u.msi.gflags = gflags;
   60.28 +
   60.29 +    rc = do_domctl(xc_handle, &domctl);
   60.30 +    return rc;
   60.31 +}
   60.32 +
   60.33  /* Pass-through: binds machine irq to guests irq */
   60.34  int xc_domain_bind_pt_irq(
   60.35      int xc_handle,
    61.1 --- a/tools/libxc/xc_minios.c	Fri Apr 25 20:13:52 2008 +0900
    61.2 +++ b/tools/libxc/xc_minios.c	Thu May 08 18:40:07 2008 +0900
    61.3 @@ -178,7 +178,7 @@ static void evtchn_handler(evtchn_port_t
    61.4  	printk("Unknown port for handle %d\n", xce_handle);
    61.5  	return;
    61.6      }
    61.7 -    files[xce_handle].evtchn.ports[i].pending++;
    61.8 +    files[xce_handle].evtchn.ports[i].pending = 1;
    61.9      files[xce_handle].read = 1;
   61.10      wake_up(&event_queue);
   61.11  }
   61.12 @@ -278,7 +278,7 @@ evtchn_port_or_error_t xc_evtchn_pending
   61.13      for (i = 0; i < MAX_EVTCHN_PORTS; i++) {
   61.14  	evtchn_port_t port = files[xce_handle].evtchn.ports[i].port;
   61.15  	if (port != -1 && files[xce_handle].evtchn.ports[i].pending) {
   61.16 -	    files[xce_handle].evtchn.ports[i].pending--;
   61.17 +	    files[xce_handle].evtchn.ports[i].pending = 0;
   61.18  	    local_irq_restore(flags);
   61.19  	    return port;
   61.20  	}
    62.1 --- a/tools/libxc/xc_misc.c	Fri Apr 25 20:13:52 2008 +0900
    62.2 +++ b/tools/libxc/xc_misc.c	Thu May 08 18:40:07 2008 +0900
    62.3 @@ -236,6 +236,37 @@ int xc_hvm_set_pci_link_route(
    62.4      return rc;
    62.5  }
    62.6  
    62.7 +int xc_hvm_track_dirty_vram(
    62.8 +    int xc_handle, domid_t dom,
    62.9 +    uint64_t first_pfn, uint64_t nr,
   62.10 +    unsigned long *dirty_bitmap)
   62.11 +{
   62.12 +    DECLARE_HYPERCALL;
   62.13 +    struct xen_hvm_track_dirty_vram arg;
   62.14 +    int rc;
   62.15 +
   62.16 +    hypercall.op     = __HYPERVISOR_hvm_op;
   62.17 +    hypercall.arg[0] = HVMOP_track_dirty_vram;
   62.18 +    hypercall.arg[1] = (unsigned long)&arg;
   62.19 +
   62.20 +    arg.domid     = dom;
   62.21 +    arg.first_pfn = first_pfn;
   62.22 +    arg.nr        = nr;
   62.23 +    set_xen_guest_handle(arg.dirty_bitmap, (uint8_t *)dirty_bitmap);
   62.24 +
   62.25 +    if ( (rc = lock_pages(&arg, sizeof(arg))) != 0 )
   62.26 +    {
   62.27 +        PERROR("Could not lock memory");
   62.28 +        return rc;
   62.29 +    }
   62.30 +
   62.31 +    rc = do_xen_hypercall(xc_handle, &hypercall);
   62.32 +
   62.33 +    unlock_pages(&arg, sizeof(arg));
   62.34 +
   62.35 +    return rc;
   62.36 +}
   62.37 +
   62.38  void *xc_map_foreign_pages(int xc_handle, uint32_t dom, int prot,
   62.39                             const xen_pfn_t *arr, int num)
   62.40  {
    63.1 --- a/tools/libxc/xc_pagetab.c	Fri Apr 25 20:13:52 2008 +0900
    63.2 +++ b/tools/libxc/xc_pagetab.c	Thu May 08 18:40:07 2008 +0900
    63.3 @@ -141,7 +141,7 @@ unsigned long xc_translate_foreign_addre
    63.4  
    63.5      /* Page Table */
    63.6  
    63.7 -    if (pde & 0x00000008) { /* 4M page (or 2M in PAE mode) */
    63.8 +    if (pde & 0x00000080) { /* 4M page (or 2M in PAE mode) */
    63.9          DPRINTF("Cannot currently cope with 2/4M pages\n");
   63.10          exit(-1);
   63.11      } else { /* 4k page */
    64.1 --- a/tools/libxc/xc_physdev.c	Fri Apr 25 20:13:52 2008 +0900
    64.2 +++ b/tools/libxc/xc_physdev.c	Thu May 08 18:40:07 2008 +0900
    64.3 @@ -19,3 +19,75 @@ int xc_physdev_pci_access_modify(int xc_
    64.4      errno = ENOSYS;
    64.5      return -1;
    64.6  }
    64.7 +
    64.8 +int xc_physdev_map_pirq(int xc_handle,
    64.9 +                        int domid,
   64.10 +                        int type,
   64.11 +                        int index,
   64.12 +                        int *pirq)
   64.13 +{
   64.14 +    int rc;
   64.15 +    struct physdev_map_pirq map;
   64.16 +
   64.17 +    if ( !pirq )
   64.18 +        return -EINVAL;
   64.19 +
   64.20 +    map.domid = domid;
   64.21 +    map.type = type;
   64.22 +    map.index = index;
   64.23 +    map.pirq = *pirq;
   64.24 +
   64.25 +    rc = do_physdev_op(xc_handle, PHYSDEVOP_map_pirq, &map);
   64.26 +
   64.27 +    if ( !rc )
   64.28 +        *pirq = map.pirq;
   64.29 +
   64.30 +    return rc;
   64.31 +}
   64.32 +
   64.33 +int xc_physdev_map_pirq_msi(int xc_handle,
   64.34 +                            int domid,
   64.35 +                            int type,
   64.36 +                            int index,
   64.37 +                            int *pirq,
   64.38 +                            int devfn,
   64.39 +                            int bus,
   64.40 +                            int msi_type)
   64.41 +{
   64.42 +    int rc;
   64.43 +    struct physdev_map_pirq map;
   64.44 +
   64.45 +    if ( !pirq )
   64.46 +        return -EINVAL;
   64.47 +
   64.48 +    map.domid = domid;
   64.49 +    map.type = type;
   64.50 +    map.index = index;
   64.51 +    map.pirq = *pirq;
   64.52 +    map.msi_info.devfn = devfn;
   64.53 +    map.msi_info.bus = bus;
   64.54 +    map.msi_info.msi = msi_type;
   64.55 +
   64.56 +    rc = do_physdev_op(xc_handle, PHYSDEVOP_map_pirq, &map);
   64.57 +
   64.58 +    if ( !rc )
   64.59 +        *pirq = map.pirq;
   64.60 +
   64.61 +    return rc;
   64.62 +}
   64.63 +
   64.64 +int xc_physdev_unmap_pirq(int xc_handle,
   64.65 +                          int domid,
   64.66 +                          int pirq)
   64.67 +{
   64.68 +    int rc;
   64.69 +    struct physdev_unmap_pirq unmap;
   64.70 +
   64.71 +    unmap.domid = domid;
   64.72 +    unmap.pirq = pirq;
   64.73 +
   64.74 +    rc = do_physdev_op(xc_handle, PHYSDEVOP_unmap_pirq, &unmap);
   64.75 +
   64.76 +    return rc;
   64.77 +}
   64.78 +
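
Also for illustration (not part of the changeset): the new pirq calls in this file treat the pirq argument as in/out. A hedged sketch follows, assuming a handle from xc_interface_open() and a hypothetical GSI number; priming pirq with the GSI itself is only an example convention, not a documented requirement.

    /* Hedged sketch, not from the changeset: map a legacy GSI to a pirq
     * for a guest, report the result, then unmap it again. */
    #include <stdio.h>
    #include "xenctrl.h"

    int demo_map_gsi(int xc, int domid, int gsi)
    {
        int pirq = gsi;   /* in/out: holds the assigned pirq on success */
        int rc = xc_physdev_map_pirq(xc, domid, MAP_PIRQ_TYPE_GSI, gsi, &pirq);

        if ( rc )
        {
            fprintf(stderr, "map_pirq failed: %d\n", rc);
            return rc;
        }
        printf("GSI %d is now pirq %d\n", gsi, pirq);

        return xc_physdev_unmap_pirq(xc, domid, pirq);
    }
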
    65.1 --- a/tools/libxc/xc_private.h	Fri Apr 25 20:13:52 2008 +0900
    65.2 +++ b/tools/libxc/xc_private.h	Thu May 08 18:40:07 2008 +0900
    65.3 @@ -24,10 +24,12 @@
    65.4  #define DECLARE_HYPERCALL privcmd_hypercall_t hypercall = { 0 }
    65.5  #define DECLARE_DOMCTL struct xen_domctl domctl = { 0 }
    65.6  #define DECLARE_SYSCTL struct xen_sysctl sysctl = { 0 }
    65.7 +#define DECLARE_PHYSDEV_OP struct physdev_op physdev_op = { 0 }
    65.8  #else
    65.9  #define DECLARE_HYPERCALL privcmd_hypercall_t hypercall
   65.10  #define DECLARE_DOMCTL struct xen_domctl domctl
   65.11  #define DECLARE_SYSCTL struct xen_sysctl sysctl
   65.12 +#define DECLARE_PHYSDEV_OP struct physdev_op physdev_op
   65.13  #endif
   65.14  
   65.15  #undef PAGE_SHIFT
   65.16 @@ -96,6 +98,34 @@ static inline int do_xen_version(int xc_
   65.17      return do_xen_hypercall(xc_handle, &hypercall);
   65.18  }
   65.19  
   65.20 +static inline int do_physdev_op(int xc_handle, int cmd, void *op)
   65.21 +{
   65.22 +    int ret = -1;
   65.23 +
   65.24 +    DECLARE_HYPERCALL;
   65.25 +    hypercall.op = __HYPERVISOR_physdev_op;
   65.26 +    hypercall.arg[0] = (unsigned long) cmd;
   65.27 +    hypercall.arg[1] = (unsigned long) op;
   65.28 +
   65.29 +    if ( lock_pages(op, sizeof(*op)) != 0 )
   65.30 +    {
   65.31 +        PERROR("Could not lock memory for Xen hypercall");
   65.32 +        goto out1;
   65.33 +    }
   65.34 +
   65.35 +    if ( (ret = do_xen_hypercall(xc_handle, &hypercall)) < 0 )
   65.36 +    {
   65.37 +        if ( errno == EACCES )
   65.38 +            DPRINTF("physdev operation failed -- need to"
   65.39 +                    " rebuild the user-space tool set?\n");
   65.40 +    }
   65.41 +
   65.42 +    unlock_pages(op, sizeof(*op));
   65.43 +
   65.44 +out1:
   65.45 +    return ret;
   65.46 +}
   65.47 +
   65.48  static inline int do_domctl(int xc_handle, struct xen_domctl *domctl)
   65.49  {
   65.50      int ret = -1;
    66.1 --- a/tools/libxc/xenctrl.h	Fri Apr 25 20:13:52 2008 +0900
    66.2 +++ b/tools/libxc/xenctrl.h	Thu May 08 18:40:07 2008 +0900
    66.3 @@ -21,6 +21,7 @@
    66.4  #include <stdint.h>
    66.5  #include <xen/xen.h>
    66.6  #include <xen/domctl.h>
    66.7 +#include <xen/physdev.h>
    66.8  #include <xen/sysctl.h>
    66.9  #include <xen/version.h>
   66.10  #include <xen/event_channel.h>
   66.11 @@ -849,6 +850,25 @@ int xc_gnttab_munmap(int xcg_handle,
   66.12  int xc_gnttab_set_max_grants(int xcg_handle,
   66.13  			     uint32_t count);
   66.14  
   66.15 +int xc_physdev_map_pirq(int xc_handle,
   66.16 +                        int domid,
   66.17 +                        int type,
   66.18 +                        int index,
   66.19 +                        int *pirq);
   66.20 +
   66.21 +int xc_physdev_map_pirq_msi(int xc_handle,
   66.22 +                            int domid,
   66.23 +                            int type,
   66.24 +                            int index,
   66.25 +                            int *pirq,
   66.26 +                            int devfn,
   66.27 +                            int bus,
   66.28 +                            int msi_type);
   66.29 +
   66.30 +int xc_physdev_unmap_pirq(int xc_handle,
   66.31 +                          int domid,
   66.32 +                          int pirq);
   66.33 +
   66.34  int xc_hvm_set_pci_intx_level(
   66.35      int xc_handle, domid_t dom,
   66.36      uint8_t domain, uint8_t bus, uint8_t device, uint8_t intx,
   66.37 @@ -862,6 +882,22 @@ int xc_hvm_set_pci_link_route(
   66.38      int xc_handle, domid_t dom, uint8_t link, uint8_t isa_irq);
   66.39  
   66.40  
   66.41 +/*
   66.42 + * Track dirty bit changes in the VRAM area
   66.43 + *
   66.44 + * All of this is done atomically:
   66.45 + * - get the dirty bitmap since the last call
   66.46 + * - set up dirty tracking area for period up to the next call
   66.47 + * - clear the dirty tracking area.
   66.48 + *
   66.49 + * Returns -ENODATA and does not fill bitmap if the area has changed since the
   66.50 + * last call.
   66.51 + */
   66.52 +int xc_hvm_track_dirty_vram(
   66.53 +    int xc_handle, domid_t dom,
   66.54 +    uint64_t first_pfn, uint64_t nr,
   66.55 +    unsigned long *bitmap);
   66.56 +
   66.57  typedef enum {
   66.58    XC_ERROR_NONE = 0,
   66.59    XC_INTERNAL_ERROR = 1,
   66.60 @@ -949,6 +985,13 @@ int xc_domain_ioport_mapping(int xc_hand
   66.61                               uint32_t nr_ports,
   66.62                               uint32_t add_mapping);
   66.63  
   66.64 +int xc_domain_update_msi_irq(
   66.65 +    int xc_handle,
   66.66 +    uint32_t domid,
   66.67 +    uint32_t gvec,
   66.68 +    uint32_t pirq,
   66.69 +    uint32_t gflags);
   66.70 +
   66.71  int xc_domain_bind_pt_irq(int xc_handle,
   66.72                            uint32_t domid,
   66.73                            uint8_t machine_irq,
   66.74 @@ -983,4 +1026,20 @@ int xc_domain_set_target(int xc_handle,
   66.75                           uint32_t domid,
   66.76                           uint32_t target);
   66.77  
   66.78 +#if defined(__i386__) || defined(__x86_64__)
   66.79 +int xc_cpuid_check(int xc,
   66.80 +                   const unsigned int *input,
   66.81 +                   const char **config,
   66.82 +                   char **config_transformed);
   66.83 +int xc_cpuid_set(int xc,
   66.84 +                 domid_t domid,
   66.85 +                 const unsigned int *input,
   66.86 +                 const char **config,
   66.87 +                 char **config_transformed);
   66.88 +int xc_cpuid_apply_policy(int xc,
   66.89 +                          domid_t domid);
   66.90 +void xc_cpuid_to_str(const unsigned int *regs,
   66.91 +                     char **strs);
   66.92 +#endif
   66.93 +
   66.94  #endif /* XENCTRL_H */
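
For illustration (not part of the changeset): a hedged sketch of a consumer of the dirty-VRAM interface declared above, e.g. a display frontend polling for framebuffer updates. The 8MB size, the starting pfn, the 4kB page size and the LSB-first bit layout of the bitmap are all assumptions made for the example.

    /* Hedged sketch, not from the changeset: count the VRAM pages dirtied
     * since the previous call for an assumed 8MB framebuffer. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "xenctrl.h"

    int demo_poll_vram(int xc, domid_t dom, uint64_t first_pfn)
    {
        const uint64_t nr = (8u << 20) / 4096;       /* pages, assuming 4kB */
        unsigned long *bitmap = calloc(1, (nr + 7) / 8);
        const unsigned char *b = (const unsigned char *)bitmap;
        uint64_t i, dirty = 0;
        int rc;

        if ( bitmap == NULL )
            return -1;

        rc = xc_hvm_track_dirty_vram(xc, dom, first_pfn, nr, bitmap);
        if ( rc == 0 )
        {
            /* Assumed layout: bit i of the bitmap <-> pfn first_pfn + i. */
            for ( i = 0; i < nr; i++ )
                if ( b[i / 8] & (1u << (i % 8)) )
                    dirty++;
            printf("%llu of %llu vram pages dirty\n",
                   (unsigned long long)dirty, (unsigned long long)nr);
        }

        free(bitmap);
        return rc;
    }
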
    67.1 --- a/tools/pygrub/src/fsimage/fsimage.c	Fri Apr 25 20:13:52 2008 +0900
    67.2 +++ b/tools/pygrub/src/fsimage/fsimage.c	Thu May 08 18:40:07 2008 +0900
    67.3 @@ -17,7 +17,7 @@
    67.4   * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    67.5   * DEALINGS IN THE SOFTWARE.
    67.6   *
    67.7 - * Copyright 2006 Sun Microsystems, Inc.  All rights reserved.
    67.8 + * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
    67.9   * Use is subject to license terms.
   67.10   */
   67.11  
   67.12 @@ -281,6 +281,22 @@ fsimage_open(PyObject *o, PyObject *args
   67.13  	return (PyObject *)fs;
   67.14  }
   67.15  
   67.16 +static PyObject *
   67.17 +fsimage_getbootstring(PyObject *o, PyObject *args)
   67.18 +{
   67.19 +	PyObject *fs;
   67.20 +	char	*bootstring;
   67.21 +	fsi_t	*fsi;
   67.22 +
   67.23 +	if (!PyArg_ParseTuple(args, "O", &fs))
   67.24 +		return (NULL);
   67.25 +
   67.26 +	fsi = ((fsimage_fs_t *)fs)->fs;
   67.27 +	bootstring = fsi_fs_bootstring(fsi);
   67.28 +
   67.29 +	return Py_BuildValue("s", bootstring);
   67.30 +}
   67.31 +
   67.32  PyDoc_STRVAR(fsimage_open__doc__,
   67.33      "open(name, [offset=off]) - Open the given file as a filesystem image.\n"
   67.34      "\n"
   67.35 @@ -288,9 +304,15 @@ PyDoc_STRVAR(fsimage_open__doc__,
   67.36      "offset - offset of file system within file image.\n"
   67.37      "options - mount options string.\n");
   67.38  
   67.39 +PyDoc_STRVAR(fsimage_getbootstring__doc__,
   67.40 +    "getbootstring(fs) - Return the boot string needed for this file system "
   67.41 +    "or NULL if none is needed.\n");
   67.42 +
   67.43  static struct PyMethodDef fsimage_module_methods[] = {
   67.44  	{ "open", (PyCFunction)fsimage_open,
   67.45  	    METH_VARARGS|METH_KEYWORDS, fsimage_open__doc__ },
   67.46 +	{ "getbootstring", (PyCFunction)fsimage_getbootstring,
   67.47 +	    METH_VARARGS, fsimage_getbootstring__doc__ },
   67.48  	{ NULL, NULL, 0, NULL }
   67.49  };
   67.50  
    68.1 --- a/tools/pygrub/src/pygrub	Fri Apr 25 20:13:52 2008 +0900
    68.2 +++ b/tools/pygrub/src/pygrub	Thu May 08 18:40:07 2008 +0900
    68.3 @@ -646,7 +646,13 @@ if __name__ == "__main__":
    68.4          print "  args: %s" % chosencfg["args"]
    68.5          sys.exit(0)
    68.6  
    68.7 -    fs = fsimage.open(file, get_fs_offset(file))
    68.8 +    # if boot filesystem is set then pass to fsimage.open
    68.9 +    bootfsargs = '"%s"' % incfg["args"]
   68.10 +    bootfsgroup = re.findall('zfs-bootfs=(.*?)[\s\,\"]', bootfsargs)
   68.11 +    if bootfsgroup:
   68.12 +        fs = fsimage.open(file, get_fs_offset(file), bootfsgroup[0])
   68.13 +    else:
   68.14 +        fs = fsimage.open(file, get_fs_offset(file))
   68.15  
   68.16      chosencfg = sniff_solaris(fs, incfg)
   68.17  
   68.18 @@ -672,7 +678,15 @@ if __name__ == "__main__":
   68.19      if bootcfg["ramdisk"]:
   68.20          sxp += "(ramdisk %s)" % bootcfg["ramdisk"]
   68.21      if chosencfg["args"]:
   68.22 -        sxp += "(args \"%s\")" % chosencfg["args"]
   68.23 +        zfsinfo = fsimage.getbootstring(fs)
   68.24 +        if zfsinfo is None:
   68.25 +            sxp += "(args \"%s\")" % chosencfg["args"]
   68.26 +        else:
   68.27 +            e = re.compile("zfs-bootfs=[\w\-\.\:@/]+" )
   68.28 +            (chosencfg["args"],count) = e.subn(zfsinfo, chosencfg["args"])
   68.29 +            if count == 0:
   68.30 +               chosencfg["args"] += " -B %s" % zfsinfo
   68.31 +            sxp += "(args \"%s\")" % (chosencfg["args"])
   68.32  
   68.33      sys.stdout.flush()
   68.34      os.write(fd, sxp)
    69.1 --- a/tools/python/xen/lowlevel/xc/xc.c	Fri Apr 25 20:13:52 2008 +0900
    69.2 +++ b/tools/python/xen/lowlevel/xc/xc.c	Thu May 08 18:40:07 2008 +0900
    69.3 @@ -611,6 +611,110 @@ static PyObject *pyxc_set_os_type(XcObje
    69.4  }
    69.5  #endif /* __ia64__ */
    69.6  
    69.7 +
    69.8 +#if defined(__i386__) || defined(__x86_64__)
    69.9 +static void pyxc_dom_extract_cpuid(PyObject *config,
   69.10 +                                  char **regs)
   69.11 +{
   69.12 +    const char *regs_extract[4] = { "eax", "ebx", "ecx", "edx" };
   69.13 +    PyObject *obj;
   69.14 +    int i;
   69.15 +
   69.16 +    memset(regs, 0, 4*sizeof(*regs));
   69.17 +
   69.18 +    if ( !PyDict_Check(config) )
   69.19 +        return;
   69.20 +
   69.21 +    for ( i = 0; i < 4; i++ )
   69.22 +        if ( (obj = PyDict_GetItemString(config, regs_extract[i])) != NULL )
   69.23 +            regs[i] = PyString_AS_STRING(obj);
   69.24 +}
   69.25 +
   69.26 +static PyObject *pyxc_create_cpuid_dict(char **regs)
   69.27 +{
   69.28 +   const char *regs_extract[4] = { "eax", "ebx", "ecx", "edx" };
   69.29 +   PyObject *dict;
   69.30 +   int i;
   69.31 +
   69.32 +   dict = PyDict_New();
   69.33 +   for ( i = 0; i < 4; i++ )
   69.34 +   {
   69.35 +       if ( regs[i] == NULL )
   69.36 +           continue;
   69.37 +       PyDict_SetItemString(dict, regs_extract[i],
   69.38 +                            PyString_FromString(regs[i]));
   69.39 +       free(regs[i]);
   69.40 +       regs[i] = NULL;
   69.41 +   }
   69.42 +   return dict;
   69.43 +}
   69.44 +
   69.45 +static PyObject *pyxc_dom_check_cpuid(XcObject *self,
   69.46 +                                      PyObject *args)
   69.47 +{
   69.48 +    PyObject *sub_input, *config;
   69.49 +    unsigned int input[2];
   69.50 +    char *regs[4], *regs_transform[4];
   69.51 +
   69.52 +    if ( !PyArg_ParseTuple(args, "iOO", &input[0], &sub_input, &config) )
   69.53 +        return NULL;
   69.54 +
   69.55 +    pyxc_dom_extract_cpuid(config, regs);
   69.56 +
   69.57 +    input[1] = XEN_CPUID_INPUT_UNUSED;
   69.58 +    if ( PyLong_Check(sub_input) )
   69.59 +        input[1] = PyLong_AsUnsignedLong(sub_input);
   69.60 +
   69.61 +    if ( xc_cpuid_check(self->xc_handle, input,
   69.62 +                        (const char **)regs, regs_transform) )
   69.63 +        return pyxc_error_to_exception();
   69.64 +
   69.65 +    return pyxc_create_cpuid_dict(regs_transform);
   69.66 +}
   69.67 +
   69.68 +static PyObject *pyxc_dom_set_policy_cpuid(XcObject *self,
   69.69 +                                           PyObject *args)
   69.70 +{
   69.71 +    domid_t domid;
   69.72 +
   69.73 +    if ( !PyArg_ParseTuple(args, "i", &domid) )
   69.74 +        return NULL;
   69.75 +
   69.76 +    if ( xc_cpuid_apply_policy(self->xc_handle, domid) )
   69.77 +        return pyxc_error_to_exception();
   69.78 +
   69.79 +    Py_INCREF(zero);
   69.80 +    return zero;
   69.81 +}
   69.82 +
   69.83 +
   69.84 +static PyObject *pyxc_dom_set_cpuid(XcObject *self,
   69.85 +                                    PyObject *args)
   69.86 +{
   69.87 +    domid_t domid;
   69.88 +    PyObject *sub_input, *config;
   69.89 +    unsigned int input[2];
   69.90 +    char *regs[4], *regs_transform[4];
   69.91 +
   69.92 +    if ( !PyArg_ParseTuple(args, "IIOO", &domid,
   69.93 +                           &input[0], &sub_input, &config) )
   69.94 +        return NULL;
   69.95 +
   69.96 +    pyxc_dom_extract_cpuid(config, regs);
   69.97 +
   69.98 +    input[1] = XEN_CPUID_INPUT_UNUSED;
   69.99 +    if ( PyLong_Check(sub_input) )
  69.100 +        input[1] = PyLong_AsUnsignedLong(sub_input);
  69.101 +
  69.102 +    if ( xc_cpuid_set(self->xc_handle, domid, input, (const char **)regs,
  69.103 +                      regs_transform) )
  69.104 +        return pyxc_error_to_exception();
  69.105 +
  69.106 +    return pyxc_create_cpuid_dict(regs_transform);
  69.107 +}
  69.108 +
  69.109 +#endif /* __i386__ || __x86_64__ */
  69.110 +
  69.111  static PyObject *pyxc_hvm_build(XcObject *self,
  69.112                                  PyObject *args,
  69.113                                  PyObject *kwds)
  69.114 @@ -695,6 +799,26 @@ static PyObject *pyxc_evtchn_reset(XcObj
  69.115      return zero;
  69.116  }
  69.117  
  69.118 +static PyObject *pyxc_physdev_map_pirq(PyObject *self,
  69.119 +                                       PyObject *args,
  69.120 +                                       PyObject *kwds)
  69.121 +{
  69.122 +    XcObject *xc = (XcObject *)self;
  69.123 +    uint32_t dom;
  69.124 +    int index, pirq, ret;
  69.125 +
  69.126 +    static char *kwd_list[] = {"domid", "index", "pirq", NULL};
  69.127 +
  69.128 +    if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iii", kwd_list,
  69.129 +                                      &dom, &index, &pirq) )
  69.130 +        return NULL;
  69.131 +    ret = xc_physdev_map_pirq(xc->xc_handle, dom, MAP_PIRQ_TYPE_GSI,
  69.132 +                             index, &pirq);
  69.133 +    if ( ret != 0 )
  69.134 +          return pyxc_error_to_exception();
  69.135 +    return PyLong_FromUnsignedLong(pirq);
  69.136 +}
  69.137 +
  69.138  static PyObject *pyxc_physdev_pci_access_modify(XcObject *self,
  69.139                                                  PyObject *args,
  69.140                                                  PyObject *kwds)
  69.141 @@ -1485,6 +1609,15 @@ static PyMethodDef pyxc_methods[] = {
  69.142        "Reset all connections.\n"
  69.143        " dom [int]: Domain to reset.\n" },
  69.144  
  69.145 +    { "physdev_map_pirq",
  69.146 +      (PyCFunction)pyxc_physdev_map_pirq,
  69.147 +      METH_VARARGS | METH_KEYWORDS, "\n"
  69.148 +      "map physical irq to guest pirq.\n"
  69.149 +      " dom     [int]:      Identifier of domain to map for.\n"
  69.150 +      " index   [int]:      physical irq.\n"
  69.151 +      " pirq    [int]:      guest pirq.\n"
  69.152 +      "Returns: [long] value of the param.\n" },
  69.153 +
  69.154      { "physdev_pci_access_modify",
  69.155        (PyCFunction)pyxc_physdev_pci_access_modify,
  69.156        METH_VARARGS | METH_KEYWORDS, "\n"
  69.157 @@ -1635,6 +1768,37 @@ static PyMethodDef pyxc_methods[] = {
  69.158        " log [int]: Specifies the area's size.\n"
  69.159        "Returns: [int] 0 on success; -1 on error.\n" },
  69.160  #endif /* __powerpc */
  69.161 +  
  69.162 +#if defined(__i386__) || defined(__x86_64__)
  69.163 +    { "domain_check_cpuid", 
  69.164 +      (PyCFunction)pyxc_dom_check_cpuid, 
  69.165 +      METH_VARARGS, "\n"
  69.166 +      "Apply checks to host CPUID.\n"
  69.167 +      " input [long]: Input for cpuid instruction (eax)\n"
  69.168 +      " sub_input [long]: Second input (optional, may be None) for cpuid "
  69.169 +      "                     instruction (ecx)\n"
  69.170 +      " config [dict]: Dictionary of register\n"
  69.171 +      " config [dict]: Dictionary of register, use for checking\n\n"
  69.172 +      "Returns: [int] 0 on success; exception on error.\n" },
  69.173 +    
  69.174 +    { "domain_set_cpuid", 
  69.175 +      (PyCFunction)pyxc_dom_set_cpuid, 
  69.176 +      METH_VARARGS, "\n"
  69.177 +      "Set cpuid response for an input and a domain.\n"
  69.178 +      " dom [int]: Identifier of domain.\n"
  69.179 +      " input [long]: Input for cpuid instruction (eax)\n"
  69.180 +      " sub_input [long]: Second input (optional, may be None) for cpuid "
  69.181 +      "                     instruction (ecx)\n"
  69.182 +      " config [dict]: Dictionary of register\n\n"
  69.183 +      "Returns: [int] 0 on success; exception on error.\n" },
  69.184 +
  69.185 +    { "domain_set_policy_cpuid", 
  69.186 +      (PyCFunction)pyxc_dom_set_policy_cpuid, 
  69.187 +      METH_VARARGS, "\n"
  69.188 +      "Set the default cpuid policy for a domain.\n"
  69.189 +      " dom [int]: Identifier of domain.\n\n"
  69.190 +      "Returns: [int] 0 on success; exception on error.\n" },
  69.191 +#endif
  69.192  
  69.193      { NULL, NULL, 0, NULL }
  69.194  };
    70.1 --- a/tools/python/xen/util/acmpolicy.py	Fri Apr 25 20:13:52 2008 +0900
    70.2 +++ b/tools/python/xen/util/acmpolicy.py	Thu May 08 18:40:07 2008 +0900
    70.3 @@ -49,8 +49,6 @@ ACM_SIMPLE_TYPE_ENFORCEMENT_POLICY = 2
    70.4  ACM_POLICY_UNDEFINED = 15
    70.5  
    70.6  
    70.7 -ACM_SCHEMA_FILE = ACM_POLICIES_DIR + "security_policy.xsd"
    70.8 -
    70.9  ACM_LABEL_UNLABELED = "__UNLABELED__"
   70.10  ACM_LABEL_UNLABELED_DISPLAY = "unlabeled"
   70.11  
   70.12 @@ -118,6 +116,153 @@ DEFAULT_policy = \
   70.13  "  </SecurityLabelTemplate>\n" +\
   70.14  "</SecurityPolicyDefinition>\n"
   70.15  
   70.16 +ACM_SCHEMA="""<?xml version="1.0" encoding="UTF-8"?>
   70.17 +<!-- Author: Ray Valdez, Reiner Sailer {rvaldez,sailer}@us.ibm.com -->
   70.18 +<!--         This file defines the schema, which is used to define -->
   70.19 +<!--         the security policy and the security labels in Xen.    -->
   70.20 +
   70.21 +<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.ibm.com" xmlns="http://www.ibm.com" elementFormDefault="qualified">
   70.22 +	<xsd:element name="SecurityPolicyDefinition">
   70.23 +		<xsd:complexType>
   70.24 +			<xsd:sequence>
   70.25 +				<xsd:element ref="PolicyHeader" minOccurs="1" maxOccurs="1"></xsd:element>
   70.26 +				<xsd:element ref="SimpleTypeEnforcement" minOccurs="0" maxOccurs="1"></xsd:element>
   70.27 +				<xsd:element ref="ChineseWall" minOccurs="0" maxOccurs="1"></xsd:element>
   70.28 +				<xsd:element ref="SecurityLabelTemplate" minOccurs="1" maxOccurs="1"></xsd:element>
   70.29 +			</xsd:sequence>
   70.30 +		</xsd:complexType>
   70.31 +	</xsd:element>
   70.32 +	<xsd:element name="PolicyHeader">
   70.33 +		<xsd:complexType>
   70.34 +			<xsd:sequence>
   70.35 +				<xsd:element name="PolicyName" minOccurs="1" maxOccurs="1" type="xsd:string"></xsd:element>
   70.36 +				<xsd:element name="PolicyUrl" minOccurs="0" maxOccurs="1" type="xsd:string"></xsd:element>
   70.37 +				<xsd:element name="Reference" type="xsd:string" minOccurs="0" maxOccurs="1" />
   70.38 +				<xsd:element name="Date" minOccurs="0" maxOccurs="1" type="xsd:string"></xsd:element>
   70.39 +				<xsd:element name="NameSpaceUrl" minOccurs="0" maxOccurs="1" type="xsd:string"></xsd:element>
   70.40 +				<xsd:element name="Version" minOccurs="1" maxOccurs="1" type="VersionFormat"/>
   70.41 +				<xsd:element ref="FromPolicy" minOccurs="0" maxOccurs="1"/>
   70.42 +			</xsd:sequence>
   70.43 +		</xsd:complexType>
   70.44 +	</xsd:element>
   70.45 +	<xsd:element name="ChineseWall">
   70.46 +		<xsd:complexType>
   70.47 +			<xsd:sequence>
   70.48 +				<xsd:element ref="ChineseWallTypes" minOccurs="1" maxOccurs="1" />
   70.49 +				<xsd:element ref="ConflictSets" minOccurs="0" maxOccurs="1" />
   70.50 +			</xsd:sequence>
   70.51 +			<xsd:attribute name="priority" type="PolicyOrder" use="optional"></xsd:attribute>
   70.52 +		</xsd:complexType>
   70.53 +	</xsd:element>
   70.54 +	<xsd:element name="SimpleTypeEnforcement">
   70.55 +		<xsd:complexType>
   70.56 +			<xsd:sequence>
   70.57 +				<xsd:element ref="SimpleTypeEnforcementTypes" />
   70.58 +			</xsd:sequence>
   70.59 +			<xsd:attribute name="priority" type="PolicyOrder" use="optional"></xsd:attribute>
   70.60 +		</xsd:complexType>
   70.61 +	</xsd:element>
   70.62 +	<xsd:element name="SecurityLabelTemplate">
   70.63 +		<xsd:complexType>
   70.64 +			<xsd:sequence>
   70.65 +				<xsd:element name="SubjectLabels" minOccurs="0" maxOccurs="1">
   70.66 +					<xsd:complexType>
   70.67 +						<xsd:sequence>
   70.68 +							<xsd:element ref="VirtualMachineLabel" minOccurs="1" maxOccurs="unbounded"></xsd:element>
   70.69 +						</xsd:sequence>
   70.70 +						<xsd:attribute name="bootstrap" type="xsd:string" use="required"></xsd:attribute>
   70.71 +					</xsd:complexType>
   70.72 +				</xsd:element>
   70.73 +				<xsd:element name="ObjectLabels" minOccurs="0" maxOccurs="1">
   70.74 +					<xsd:complexType>
   70.75 +						<xsd:sequence>
   70.76 +							<xsd:element ref="ResourceLabel" minOccurs="1" maxOccurs="unbounded"></xsd:element>
   70.77 +						</xsd:sequence>
   70.78 +					</xsd:complexType>
   70.79 +				</xsd:element>
   70.80 +			</xsd:sequence>
   70.81 +		</xsd:complexType>
   70.82 +	</xsd:element>
   70.83 +	<xsd:element name="ChineseWallTypes">
   70.84 +		<xsd:complexType>
   70.85 +			<xsd:sequence>
   70.86 +				<xsd:element maxOccurs="unbounded" minOccurs="1" ref="Type" />
   70.87 +			</xsd:sequence>
   70.88 +		</xsd:complexType>
   70.89 +	</xsd:element>
   70.90 +	<xsd:element name="ConflictSets">
   70.91 +		<xsd:complexType>
   70.92 +			<xsd:sequence>
   70.93 +				<xsd:element maxOccurs="unbounded" minOccurs="1" ref="Conflict" />
   70.94 +			</xsd:sequence>
   70.95 +		</xsd:complexType>
   70.96 +	</xsd:element>
   70.97 +	<xsd:element name="SimpleTypeEnforcementTypes">
   70.98 +		<xsd:complexType>
   70.99 +			<xsd:sequence>
  70.100 +				<xsd:element maxOccurs="unbounded" minOccurs="1" ref="Type" />
  70.101 +			</xsd:sequence>
  70.102 +		</xsd:complexType>
  70.103 +	</xsd:element>
  70.104 +	<xsd:element name="Conflict">
  70.105 +		<xsd:complexType>
  70.106 +			<xsd:sequence>
  70.107 +				<xsd:element maxOccurs="unbounded" minOccurs="1" ref="Type" />
  70.108 +			</xsd:sequence>
  70.109 +			<xsd:attribute name="name" type="xsd:string" use="required"></xsd:attribute>
  70.110 +		</xsd:complexType>
  70.111 +	</xsd:element>
  70.112 +	<xsd:element name="VirtualMachineLabel">
  70.113 +		<xsd:complexType>
  70.114 +			<xsd:sequence>
  70.115 +				<xsd:element name="Name" type="NameWithFrom"></xsd:element>
  70.116 +				<xsd:element ref="SimpleTypeEnforcementTypes" minOccurs="0" maxOccurs="unbounded" />
  70.117 +				<xsd:element ref="ChineseWallTypes" minOccurs="0" maxOccurs="unbounded" />
  70.118 +			</xsd:sequence>
  70.119 +		</xsd:complexType>
  70.120 +	</xsd:element>
  70.121 +	<xsd:element name="ResourceLabel">
  70.122 +		<xsd:complexType>
  70.123 +			<xsd:sequence>
  70.124 +				<xsd:element name="Name" type="NameWithFrom"></xsd:element>
  70.125 +				<xsd:element name="SimpleTypeEnforcementTypes" type="SingleSimpleTypeEnforcementType" />
  70.126 +			</xsd:sequence>
  70.127 +		</xsd:complexType>
  70.128 +	</xsd:element>
  70.129 +	<xsd:element name="Name" type="xsd:string" />
  70.130 +	<xsd:element name="Type" type="xsd:string" />
  70.131 +	<xsd:simpleType name="PolicyOrder">
  70.132 +		<xsd:restriction base="xsd:string">
  70.133 +			<xsd:enumeration value="PrimaryPolicyComponent"></xsd:enumeration>
  70.134 +		</xsd:restriction>
  70.135 +	</xsd:simpleType>
  70.136 +	<xsd:element name="FromPolicy">
  70.137 +		<xsd:complexType>
  70.138 +			<xsd:sequence>
  70.139 +				<xsd:element name="PolicyName" minOccurs="1" maxOccurs="1" type="xsd:string"/>
  70.140 +				<xsd:element name="Version" minOccurs="1" maxOccurs="1" type="VersionFormat"/>
  70.141 +			</xsd:sequence>
  70.142 +		</xsd:complexType>
  70.143 +	</xsd:element>
  70.144 +	<xsd:simpleType name="VersionFormat">
  70.145 +		<xsd:restriction base="xsd:string">
  70.146 +			<xsd:pattern value="[0-9]{1,8}.[0-9]{1,8}"></xsd:pattern>
  70.147 +		</xsd:restriction>
  70.148 +	</xsd:simpleType>
  70.149 +	<xsd:complexType name="NameWithFrom">
  70.150 +		<xsd:simpleContent>
  70.151 +			<xsd:extension base="xsd:string">
  70.152 +				<xsd:attribute name="from" type="xsd:string" use="optional"></xsd:attribute>
  70.153 +			</xsd:extension>
  70.154 +		</xsd:simpleContent>
  70.155 +	</xsd:complexType>
  70.156 +	<xsd:complexType name="SingleSimpleTypeEnforcementType">
  70.157 +		<xsd:sequence>
  70.158 +			<xsd:element maxOccurs="1" minOccurs="1" ref="Type" />
  70.159 +		</xsd:sequence>
  70.160 +	</xsd:complexType>
  70.161 +</xsd:schema>"""
  70.162 +
  70.163  
  70.164  def get_DEFAULT_policy(dom0label=""):
  70.165      fromnode = ""
  70.166 @@ -133,18 +278,7 @@ def initialize():
  70.167  
  70.168      instdir = security.install_policy_dir_prefix
  70.169      DEF_policy_file = "DEFAULT-security_policy.xml"
  70.170 -    xsd_file = "security_policy.xsd"
  70.171  
  70.172 -    files = [ xsd_file ]
  70.173 -
  70.174 -    for file in files:
  70.175 -        if not os.path.isfile(policiesdir + "/" + file ):
  70.176 -            try:
  70.177 -                shutil.copyfile(instdir + "/" + file,
  70.178 -                                policiesdir + "/" + file)
  70.179 -            except Exception, e:
  70.180 -                log.info("could not copy '%s': %s" %
  70.181 -                         (file, str(e)))
  70.182      #Install default policy.
  70.183      f = open(policiesdir + "/" + DEF_policy_file, 'w')
  70.184      if f:
  70.185 @@ -219,7 +353,8 @@ class ACMPolicy(XSPolicy):
  70.186              log.warn("Libxml2 python-wrapper is not installed on the system.")
  70.187              return xsconstants.XSERR_SUCCESS
  70.188          try:
  70.189 -            parserctxt = libxml2.schemaNewParserCtxt(ACM_SCHEMA_FILE)
  70.190 +            parserctxt = libxml2.schemaNewMemParserCtxt(ACM_SCHEMA,
  70.191 +                                                        len(ACM_SCHEMA))
  70.192              schemaparser = parserctxt.schemaParse()
  70.193              valid = schemaparser.schemaNewValidCtxt()
  70.194              doc = libxml2.parseDoc(self.toxml())
    71.1 --- a/tools/python/xen/util/blkif.py	Fri Apr 25 20:13:52 2008 +0900
    71.2 +++ b/tools/python/xen/util/blkif.py	Thu May 08 18:40:07 2008 +0900
    71.3 @@ -42,10 +42,12 @@ def blkdev_name_to_number(name):
    71.4      if re.match( '/dev/xvd[a-p]([1-9]|1[0-5])?', n):
    71.5          return 202 * 256 + 16 * (ord(n[8:9]) - ord('a')) + int(n[9:] or 0)
    71.6  
    71.7 -    # see if this is a hex device number
    71.8 -    if re.match( '^(0x)?[0-9a-fA-F]+$', name ):
    71.9 +    if re.match( '^(0x)[0-9a-fA-F]+$', name ):
   71.10          return string.atoi(name,16)
   71.11 -        
   71.12 +
   71.13 +    if re.match('^[0-9]+$', name):
   71.14 +        return string.atoi(name, 10)
   71.15 +
   71.16      return None
   71.17  
   71.18  def blkdev_segment(name):
    72.1 --- a/tools/python/xen/util/bootloader.py	Fri Apr 25 20:13:52 2008 +0900
    72.2 +++ b/tools/python/xen/util/bootloader.py	Thu May 08 18:40:07 2008 +0900
    72.3 @@ -26,8 +26,6 @@ from xen.xend.XendLogging import log
    72.4  from xen.util import mkdir
    72.5  import xen.util.xsm.xsm as security
    72.6  
    72.7 -__bootloader = None
    72.8 -
    72.9  #
   72.10  # Functions for modifying entries in the bootloader, i.e. adding
   72.11  # a module to boot the system with a policy.
   72.12 @@ -513,8 +511,11 @@ class LatePolicyLoader(Bootloader):
   72.13          Bootloader.__init__(self)
   72.14  
   72.15      def probe(self):
   72.16 -        _dir=os.path.dirname(self.FILENAME)
   72.17 -        mkdir.parents(_dir, stat.S_IRWXU)
   72.18 +        try:
   72.19 +            _dir=os.path.dirname(self.FILENAME)
   72.20 +            mkdir.parents(_dir, stat.S_IRWXU)
   72.21 +        except:
   72.22 +            return False
   72.23          return True
   72.24  
   72.25      def get_default_title(self):
   72.26 @@ -614,10 +615,12 @@ class LatePolicyLoader(Bootloader):
   72.27  
   72.28  __bootloader = Bootloader()
   72.29  
   72.30 -grub = Grub()
   72.31 -if grub.probe() == True:
   72.32 -    __bootloader = grub
   72.33 -else:
   72.34 -    late = LatePolicyLoader()
   72.35 -    if late.probe() == True:
   72.36 -        __bootloader = late
   72.37 +def init():
   72.38 +    global __bootloader
   72.39 +    grub = Grub()
   72.40 +    if grub.probe() == True:
   72.41 +        __bootloader = grub
   72.42 +    else:
   72.43 +        late = LatePolicyLoader()
   72.44 +        if late.probe() == True:
   72.45 +            __bootloader = late
    73.1 --- a/tools/python/xen/util/pci.py	Fri Apr 25 20:13:52 2008 +0900
    73.2 +++ b/tools/python/xen/util/pci.py	Thu May 08 18:40:07 2008 +0900
    73.3 @@ -7,6 +7,7 @@
    73.4  
    73.5  import sys
    73.6  import os, os.path
    73.7 +import resource
    73.8  
    73.9  PROC_MNT_PATH = '/proc/mounts'
   73.10  PROC_PCI_PATH = '/proc/bus/pci/devices'
   73.11 @@ -14,6 +15,7 @@ PROC_PCI_NUM_RESOURCES = 7
   73.12  
   73.13  SYSFS_PCI_DEVS_PATH = '/bus/pci/devices'
   73.14  SYSFS_PCI_DEV_RESOURCE_PATH = '/resource'
   73.15 +SYSFS_PCI_DEV_CONFIG_PATH = '/config'
   73.16  SYSFS_PCI_DEV_IRQ_PATH = '/irq'
   73.17  SYSFS_PCI_DEV_DRIVER_DIR_PATH = '/driver'
   73.18  SYSFS_PCI_DEV_VENDOR_PATH = '/vendor'
   73.19 @@ -24,7 +26,21 @@ SYSFS_PCI_DEV_SUBDEVICE_PATH = '/subsyst
   73.20  PCI_BAR_IO = 0x01
   73.21  PCI_BAR_IO_MASK = ~0x03
   73.22  PCI_BAR_MEM_MASK = ~0x0f
   73.23 +PCI_STATUS_CAP_MASK = 0x10
   73.24 +PCI_STATUS_OFFSET = 0x6
   73.25 +PCI_CAP_OFFSET = 0x34
   73.26 +MSIX_BIR_MASK = 0x7
   73.27 +MSIX_SIZE_MASK = 0x7ff
   73.28  
   73.29 +#Calculate PAGE_SHIFT: number of bits to shift an address to get the page number
   73.30 +PAGE_SIZE = resource.getpagesize()
   73.31 +PAGE_SHIFT = 0
   73.32 +t = PAGE_SIZE
   73.33 +while not (t&1):
   73.34 +    t>>=1
   73.35 +    PAGE_SHIFT+=1
   73.36 +
   73.37 +PAGE_MASK=~(PAGE_SIZE - 1)
   73.38  # Definitions from Linux: include/linux/pci.h
   73.39  def PCI_DEVFN(slot, func):
   73.40      return ((((slot) & 0x1f) << 3) | ((func) & 0x07))
   73.41 @@ -74,10 +90,73 @@ class PciDevice:
   73.42          self.device = None
   73.43          self.subvendor = None
   73.44          self.subdevice = None
   73.45 -
   73.46 +        self.msix = 0
   73.47 +        self.msix_iomem = []
   73.48          self.get_info_from_sysfs()
   73.49  
   73.50 +    def find_capability(self, type):
   73.51 +        try:
   73.52 +            sysfs_mnt = find_sysfs_mnt()
   73.53 +        except IOError, (errno, strerr):
    73.54 +            raise PciDeviceParseError(('Failed to locate sysfs mount: %s: %s (%d)' %
    73.55 +                (PROC_PCI_PATH, strerr, errno)))
   73.56 +
   73.57 +        if sysfs_mnt == None:
   73.58 +            return False
   73.59 +        path = sysfs_mnt+SYSFS_PCI_DEVS_PATH+'/'+ \
   73.60 +               self.name+SYSFS_PCI_DEV_CONFIG_PATH
   73.61 +        try:
   73.62 +            conf_file = open(path, 'rb')
   73.63 +            conf_file.seek(PCI_STATUS_OFFSET)
   73.64 +            status = ord(conf_file.read(1))
   73.65 +            if status&PCI_STATUS_CAP_MASK:
   73.66 +                conf_file.seek(PCI_CAP_OFFSET)
   73.67 +                capa_pointer = ord(conf_file.read(1))
   73.68 +                while capa_pointer:
   73.69 +                    conf_file.seek(capa_pointer)
   73.70 +                    capa_id = ord(conf_file.read(1))
   73.71 +                    capa_pointer = ord(conf_file.read(1))
   73.72 +                    if capa_id == type:
    73.73 +                        # read the MSI-X Message Control register (table size)
   73.74 +                        message_cont_lo = ord(conf_file.read(1))
   73.75 +                        message_cont_hi = ord(conf_file.read(1))
   73.76 +                        self.msix=1
   73.77 +                        self.msix_entries = (message_cont_lo + \
   73.78 +                                             (message_cont_hi << 8)) \
   73.79 +                                             & MSIX_SIZE_MASK
   73.80 +                        t_off=conf_file.read(4)
   73.81 +                        p_off=conf_file.read(4)
   73.82 +                        self.table_offset=ord(t_off[0]) | (ord(t_off[1])<<8) | \
   73.83 +                                          (ord(t_off[2])<<16)|  \
   73.84 +                                          (ord(t_off[3])<<24)
   73.85 +                        self.pba_offset=ord(p_off[0]) | (ord(p_off[1]) << 8)| \
   73.86 +                                        (ord(p_off[2])<<16) | \
   73.87 +                                        (ord(p_off[3])<<24)
   73.88 +                        self.table_index = self.table_offset & MSIX_BIR_MASK
   73.89 +                        self.table_offset = self.table_offset & ~MSIX_BIR_MASK
   73.90 +                        self.pba_index = self.pba_offset & MSIX_BIR_MASK
   73.91 +                        self.pba_offset = self.pba_offset & ~MSIX_BIR_MASK
   73.92 +                        break
   73.93 +        except IOError, (errno, strerr):
    73.94 +            raise PciDeviceParseError(('Failed to open & read %s: %s (%d)' %
    73.95 +                (path, strerr, errno)))
   73.96 +
   73.97 +    def remove_msix_iomem(self, index, start, size):
   73.98 +        if (index == self.table_index):
   73.99 +            table_start = start+self.table_offset
  73.100 +            table_end = table_start + self.msix_entries * 16
  73.101 +            table_start = table_start & PAGE_MASK
  73.102 +            table_end = (table_end + PAGE_SIZE) & PAGE_MASK
  73.103 +            self.msix_iomem.append((table_start, table_end-table_start))
  73.104 +        if (index==self.pba_index):
  73.105 +            pba_start = start + self.pba_offset
  73.106 +            pba_end = pba_start + self.msix_entries/8
  73.107 +            pba_start = pba_start & PAGE_MASK
  73.108 +            pba_end = (pba_end + PAGE_SIZE) & PAGE_MASK
  73.109 +            self.msix_iomem.append((pba_start, pba_end-pba_start))
  73.110 +
  73.111      def get_info_from_sysfs(self):
  73.112 +        self.find_capability(0x11)
  73.113          try:
  73.114              sysfs_mnt = find_sysfs_mnt()
  73.115          except IOError, (errno, strerr):
  73.116 @@ -108,6 +187,10 @@ class PciDevice:
  73.117                          self.ioports.append( (start,size) )
  73.118                      else:
  73.119                          self.iomem.append( (start,size) )
  73.120 +                    if (self.msix):
  73.121 +                        self.remove_msix_iomem(i, start, size)
  73.122 +
  73.123 +
  73.124  
  73.125          except IOError, (errno, strerr):
  73.126              raise PciDeviceParseError(('Failed to open & read %s: %s (%d)' %
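
The MSI-X handling added above rounds the table and PBA regions out to whole pages so their I/O memory can later be withheld from the guest. A rough sketch of the table-range arithmetic, assuming 4 KiB pages (the helper name and the example BAR address are invented for illustration):

    PAGE_SIZE = 4096
    PAGE_MASK = ~(PAGE_SIZE - 1)

    def msix_table_region(bar_start, table_offset, nr_entries):
        # each MSI-X table entry is 16 bytes; round the span out to page boundaries
        start = bar_start + table_offset
        end = start + nr_entries * 16
        start = start & PAGE_MASK
        end = (end + PAGE_SIZE) & PAGE_MASK
        return (start, end - start)

    # e.g. a 32-entry table at offset 0x800 of a BAR at 0xf8000000
    # yields the single page (0xf8000000, 0x1000)
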
    74.1 --- a/tools/python/xen/util/xsm/acm/acm.py	Fri Apr 25 20:13:52 2008 +0900
    74.2 +++ b/tools/python/xen/util/xsm/acm/acm.py	Thu May 08 18:40:07 2008 +0900
    74.3 @@ -156,7 +156,9 @@ def on():
    74.4      returns none if security policy is off (not compiled),
    74.5      any string otherwise, use it: if not security.on() ...
    74.6      """
    74.7 -    return (get_active_policy_name() not in ['INACTIVE', 'NULL'])
    74.8 +    if get_active_policy_name() not in ['INACTIVE', 'NULL', '']:
    74.9 +        return xsconstants.XS_POLICY_ACM
   74.10 +    return 0
   74.11  
   74.12  
   74.13  def calc_dom_ssidref_from_info(info):
    75.1 --- a/tools/python/xen/util/xsm/flask/flask.py	Fri Apr 25 20:13:52 2008 +0900
    75.2 +++ b/tools/python/xen/util/xsm/flask/flask.py	Thu May 08 18:40:07 2008 +0900
    75.3 @@ -12,7 +12,7 @@ def err(msg):
    75.4      raise XSMError(msg)
    75.5  
    75.6  def on():
    75.7 -    return 1
    75.8 +    return 0 #xsconstants.XS_POLICY_FLASK
    75.9  
   75.10  def ssidref2label(ssidref):
   75.11      try:
    76.1 --- a/tools/python/xen/web/tcp.py	Fri Apr 25 20:13:52 2008 +0900
    76.2 +++ b/tools/python/xen/web/tcp.py	Thu May 08 18:40:07 2008 +0900
    76.3 @@ -64,3 +64,43 @@ class TCPListener(connection.SocketListe
    76.4                  sock.close()
    76.5              except:
    76.6                  pass
    76.7 +
    76.8 +class SSLTCPListener(TCPListener):
    76.9 +
   76.10 +    def __init__(self, protocol_class, port, interface, hosts_allow,
   76.11 +                 ssl_key_file = None, ssl_cert_file = None):
   76.12 +        if not ssl_key_file or not ssl_cert_file:
   76.13 +            raise ValueError("SSLXMLRPCServer requires ssl_key_file "
   76.14 +                             "and ssl_cert_file to be set.")
   76.15 +
   76.16 +        self.ssl_key_file = ssl_key_file
   76.17 +        self.ssl_cert_file = ssl_cert_file
   76.18 +
   76.19 +        TCPListener.__init__(self, protocol_class, port, interface, hosts_allow)
   76.20 +
   76.21 +
   76.22 +    def createSocket(self):
   76.23 +        from OpenSSL import SSL
   76.24 +        # make a SSL socket
   76.25 +        ctx = SSL.Context(SSL.SSLv23_METHOD)
   76.26 +        ctx.set_options(SSL.OP_NO_SSLv2)
   76.27 +        ctx.use_privatekey_file (self.ssl_key_file)
   76.28 +        ctx.use_certificate_file(self.ssl_cert_file)
   76.29 +        sock = SSL.Connection(ctx,
   76.30 +                              socket.socket(socket.AF_INET, socket.SOCK_STREAM))
   76.31 +        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
   76.32 +
   76.33 +        # SO_REUSEADDR does not always ensure that we do not get an address
   76.34 +        # in use error when restarted quickly
   76.35 +        # we implement a timeout to try and avoid failing unnecessarily
   76.36 +        timeout = time.time() + 30
   76.37 +        while True:
   76.38 +            try:
   76.39 +                sock.bind((self.interface, self.port))
   76.40 +                return sock
   76.41 +            except socket.error, (_errno, strerrno):
   76.42 +                if _errno == errno.EADDRINUSE and time.time() < timeout:
   76.43 +                    time.sleep(0.5)
   76.44 +                else:
   76.45 +                    raise
   76.46 +
    77.1 --- a/tools/python/xen/xend/XendCheckpoint.py	Fri Apr 25 20:13:52 2008 +0900
    77.2 +++ b/tools/python/xen/xend/XendCheckpoint.py	Thu May 08 18:40:07 2008 +0900
    77.3 @@ -309,6 +309,7 @@ def restore(xd, fd, dominfo = None, paus
    77.4                  else:
    77.5                      break
    77.6              os.close(qemu_fd)
    77.7 +            restore_image.setCpuid()
    77.8  
    77.9  
   77.10          os.read(fd, 1)           # Wait for source to close connection
    78.1 --- a/tools/python/xen/xend/XendConfig.py	Fri Apr 25 20:13:52 2008 +0900
    78.2 +++ b/tools/python/xen/xend/XendConfig.py	Thu May 08 18:40:07 2008 +0900
    78.3 @@ -203,6 +203,8 @@ XENAPI_CFG_TYPES = {
    78.4      'target': int,
    78.5      'security_label': str,
    78.6      'pci': str,
    78.7 +    'cpuid' : dict,
    78.8 +    'cpuid_check' : dict,
    78.9  }
   78.10  
   78.11  # List of legacy configuration keys that have no equivalent in the
   78.12 @@ -497,6 +499,32 @@ class XendConfig(dict):
   78.13          if 'handle' in dominfo:
   78.14              self['uuid'] = uuid.toString(dominfo['handle'])
   78.15              
   78.16 +    def parse_cpuid(self, cfg, field):
   78.17 +       def int2bin(n, count=32):
   78.18 +           return "".join([str((n >> y) & 1) for y in range(count-1, -1, -1)])
   78.19 +
   78.20 +       for input, regs in cfg[field].iteritems():
    78.21 +           if not isinstance(regs, dict):
   78.22 +               cfg[field][input] = dict(regs)
   78.23 +
   78.24 +       cpuid = {}
   78.25 +       for input in cfg[field]:
   78.26 +           inputs = input.split(',')
   78.27 +           if inputs[0][0:2] == '0x':
   78.28 +               inputs[0] = str(int(inputs[0], 16))
   78.29 +           if len(inputs) == 2:
   78.30 +               if inputs[1][0:2] == '0x':
   78.31 +                   inputs[1] = str(int(inputs[1], 16))
   78.32 +           new_input = ','.join(inputs)
   78.33 +           cpuid[new_input] = {} # new input
   78.34 +           for reg in cfg[field][input]:
   78.35 +               val = cfg[field][input][reg]
   78.36 +               if val[0:2] == '0x':
   78.37 +                   cpuid[new_input][reg] = int2bin(int(val, 16))
   78.38 +               else:
   78.39 +                   cpuid[new_input][reg] = val
   78.40 +       cfg[field] = cpuid
   78.41 +
   78.42      def _parse_sxp(self, sxp_cfg):
   78.43          """ Populate this XendConfig using the parsed SXP.
   78.44  
   78.45 @@ -653,8 +681,14 @@ class XendConfig(dict):
   78.46                  except ValueError, e:
   78.47                      raise XendConfigError('cpus = %s: %s' % (cfg['cpus'], e))
   78.48  
   78.49 +        # Parse cpuid
   78.50 +        if 'cpuid' in cfg:
   78.51 +            self.parse_cpuid(cfg, 'cpuid')
   78.52 +        if 'cpuid_check' in cfg:
   78.53 +            self.parse_cpuid(cfg, 'cpuid_check')
   78.54 +
   78.55          import xen.util.xsm.xsm as security
   78.56 -        if security.on():
   78.57 +        if security.on() == xsconstants.XS_POLICY_ACM:
   78.58              from xen.util.acmpolicy import ACM_LABEL_UNLABELED
   78.59              if not 'security' in cfg and sxp.child_value(sxp_cfg, 'security'):
   78.60                  cfg['security'] = sxp.child_value(sxp_cfg, 'security')
   78.61 @@ -901,6 +935,16 @@ class XendConfig(dict):
   78.62              int(self['vcpus_params'].get('weight', 256))
   78.63          self['vcpus_params']['cap'] = int(self['vcpus_params'].get('cap', 0))
   78.64  
   78.65 +    def cpuid_to_sxp(self, sxpr, field):
   78.66 +        regs_list = []
   78.67 +        for input, regs in self[field].iteritems():
   78.68 +            reg_list = []
   78.69 +            for reg, val in regs.iteritems():
   78.70 +                reg_list.append([reg, val])
   78.71 +            regs_list.append([input, reg_list])
   78.72 +        sxpr.append([field, regs_list])
   78.73 +
   78.74 +
   78.75      def to_sxp(self, domain = None, ignore_devices = False, ignore = [],
   78.76                 legacy_only = True):
   78.77          """ Get SXP representation of this config object.
   78.78 @@ -1012,6 +1056,13 @@ class XendConfig(dict):
   78.79                  txn.abort()
   78.80                  raise
   78.81  
   78.82 +        if 'cpuid' in self:
   78.83 +            self.cpuid_to_sxp(sxpr, 'cpuid')
   78.84 +        if 'cpuid_check' in self:
   78.85 +            self.cpuid_to_sxp(sxpr, 'cpuid_check')
   78.86 +
   78.87 +        log.debug(sxpr)
   78.88 +
   78.89          return sxpr    
   78.90      
   78.91      def _blkdev_name_to_number(self, dev):
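
The parse_cpuid() helper above normalises hexadecimal register values into 32-character bit strings via int2bin(), so that per-bit overrides can be applied later. A small worked example (the register value is arbitrary):

    def int2bin(n, count=32):
        return "".join([str((n >> y) & 1) for y in range(count - 1, -1, -1)])

    # int2bin(int('0x3', 16)) == '00000000000000000000000000000011'
    # so {'1': {'ecx': '0x3'}} is stored as {'1': {'ecx': '000...011'}}
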
    79.1 --- a/tools/python/xen/xend/XendDomain.py	Fri Apr 25 20:13:52 2008 +0900
    79.2 +++ b/tools/python/xen/xend/XendDomain.py	Thu May 08 18:40:07 2008 +0900
    79.3 @@ -1258,7 +1258,7 @@ class XendDomain:
    79.4  
    79.5          return val       
    79.6  
    79.7 -    def domain_migrate(self, domid, dst, live=False, resource=0, port=0, node=-1):
    79.8 +    def domain_migrate(self, domid, dst, live=False, port=0, node=-1):
    79.9          """Start domain migration.
   79.10          
   79.11          @param domid: Domain ID or Name
   79.12 @@ -1269,7 +1269,6 @@ class XendDomain:
   79.13          @type port: int        
   79.14          @keyword live: Live migration
   79.15          @type live: bool
   79.16 -        @keyword resource: not used??
   79.17          @rtype: None
   79.18          @keyword node: use node number for target
   79.19          @rtype: int 
   79.20 @@ -1293,8 +1292,16 @@ class XendDomain:
   79.21  
   79.22          if port == 0:
   79.23              port = xoptions.get_xend_relocation_port()
   79.24 +
   79.25          try:
   79.26 -            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   79.27 +            tls = xoptions.get_xend_relocation_tls()
   79.28 +            if tls:
   79.29 +                from OpenSSL import SSL
   79.30 +                ctx = SSL.Context(SSL.SSLv23_METHOD)
   79.31 +                sock = SSL.Connection(ctx, socket.socket(socket.AF_INET, socket.SOCK_STREAM))
   79.32 +                sock.set_connect_state()
   79.33 +            else:
   79.34 +                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   79.35              sock.connect((dst, port))
   79.36          except socket.error, err:
   79.37              raise XendError("can't connect: %s" % err[1])
    80.1 --- a/tools/python/xen/xend/XendDomainInfo.py	Fri Apr 25 20:13:52 2008 +0900
    80.2 +++ b/tools/python/xen/xend/XendDomainInfo.py	Thu May 08 18:40:07 2008 +0900
    80.3 @@ -37,6 +37,7 @@ import xen.lowlevel.xc
    80.4  from xen.util import asserts
    80.5  from xen.util.blkif import blkdev_uname_to_file, blkdev_uname_to_taptype
    80.6  import xen.util.xsm.xsm as security
    80.7 +from xen.util import xsconstants
    80.8  
    80.9  from xen.xend import balloon, sxp, uuid, image, arch, osdep
   80.10  from xen.xend import XendOptions, XendNode, XendConfig
   80.11 @@ -1973,7 +1974,7 @@ class XendDomainInfo:
   80.12          balloon.free(2*1024) # 2MB should be plenty
   80.13  
   80.14          ssidref = 0
   80.15 -        if security.on():
   80.16 +        if security.on() == xsconstants.XS_POLICY_ACM:
   80.17              ssidref = security.calc_dom_ssidref_from_info(self.info)
   80.18              if security.has_authorization(ssidref) == False:
   80.19                  raise VmError("VM is not authorized to run.")
   80.20 @@ -1987,7 +1988,7 @@ class XendDomainInfo:
   80.21                  target = self.info.target())
   80.22          except Exception, e:
   80.23              # may get here if due to ACM the operation is not permitted
   80.24 -            if security.on():
   80.25 +            if security.on() == xsconstants.XS_POLICY_ACM:
   80.26                  raise VmError('Domain in conflict set with running domain?')
   80.27  
   80.28          if self.domid < 0:
   80.29 @@ -2135,8 +2136,13 @@ class XendDomainInfo:
   80.30              # set memory limit
   80.31              xc.domain_setmaxmem(self.domid, maxmem)
   80.32  
   80.33 +            # Reserve 1 page per MiB of RAM for separate VT-d page table.
   80.34 +            vtd_mem = 4 * (self.info['memory_static_max'] / 1024 / 1024)
   80.35 +            # Round vtd_mem up to a multiple of a MiB.
   80.36 +            vtd_mem = ((vtd_mem + 1023) / 1024) * 1024
   80.37 +
   80.38              # Make sure there's enough RAM available for the domain
   80.39 -            balloon.free(memory + shadow)
   80.40 +            balloon.free(memory + shadow + vtd_mem)
   80.41  
   80.42              # Set up the shadow memory
   80.43              shadow_cur = xc.shadow_mem_control(self.domid, shadow / 1024)
   80.44 @@ -2848,7 +2854,6 @@ class XendDomainInfo:
   80.45          is_policy_update = (xspol_old != None)
   80.46  
   80.47          from xen.xend.XendXSPolicyAdmin import XSPolicyAdminInstance
   80.48 -        from xen.util import xsconstants
   80.49  
   80.50          state = self._stateGet()
   80.51          # Relabel only HALTED or RUNNING or PAUSED domains
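
The VT-d reservation above amounts to 4 KiB of extra memory per MiB of guest RAM, rounded up to a whole MiB, with all quantities handed to balloon.free() in KiB. A worked example (the 2 GiB guest size is arbitrary):

    memory_static_max = 2 * 1024 * 1024 * 1024        # 2 GiB guest, in bytes
    vtd_mem = 4 * (memory_static_max / 1024 / 1024)   # 4 KiB per MiB -> 8192 KiB
    vtd_mem = ((vtd_mem + 1023) / 1024) * 1024        # round up to a MiB multiple -> 8192 KiB
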
    81.1 --- a/tools/python/xen/xend/XendOptions.py	Fri Apr 25 20:13:52 2008 +0900
    81.2 +++ b/tools/python/xen/xend/XendOptions.py	Thu May 08 18:40:07 2008 +0900
    81.3 @@ -192,6 +192,12 @@ class XendOptions:
    81.4          return self.get_config_bool("xend-relocation-server",
    81.5                                      self.xend_relocation_server_default)
    81.6  
    81.7 +    def get_xend_relocation_server_ssl_key_file(self):
    81.8 +        return self.get_config_string("xend-relocation-server-ssl-key-file")
    81.9 +
   81.10 +    def get_xend_relocation_server_ssl_cert_file(self):
   81.11 +        return self.get_config_string("xend-relocation-server-ssl-cert-file")
   81.12 +
   81.13      def get_xend_port(self):
   81.14          """Get the port xend listens at for its HTTP interface.
   81.15          """
   81.16 @@ -203,6 +209,11 @@ class XendOptions:
   81.17          return self.get_config_int('xend-relocation-port',
   81.18                                     self.xend_relocation_port_default)
   81.19  
   81.20 +    def get_xend_relocation_tls(self):
   81.21 +        """Whether to use tls when relocating.
   81.22 +        """
   81.23 +        return self.get_config_bool('xend-relocation-tls', 'no')
   81.24 +
   81.25      def get_xend_relocation_hosts_allow(self):
   81.26          return self.get_config_string("xend-relocation-hosts-allow",
   81.27                                       self.xend_relocation_hosts_allow_default)
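
These accessors pick up new keys from xend-config.sxp. A plausible configuration enabling SSL on the relocation path could look like the following (the key names come from the accessors above; the file paths are placeholders, not shipped defaults):

    (xend-relocation-server yes)
    (xend-relocation-server-ssl-key-file  /etc/xen/xend-relocation.key)
    (xend-relocation-server-ssl-cert-file /etc/xen/xend-relocation.crt)
    (xend-relocation-tls yes)
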
    82.1 --- a/tools/python/xen/xend/XendXSPolicyAdmin.py	Fri Apr 25 20:13:52 2008 +0900
    82.2 +++ b/tools/python/xen/xend/XendXSPolicyAdmin.py	Thu May 08 18:40:07 2008 +0900
    82.3 @@ -46,7 +46,12 @@ class XSPolicyAdmin:
    82.4          self.maxpolicies = maxpolicies
    82.5          self.policies = {}
    82.6          self.xsobjs = {}
    82.7 +        bootloader.init()
    82.8  
    82.9 +        if security.on() == xsconstants.XS_POLICY_ACM:
   82.10 +            self.__acm_init()
   82.11 +
   82.12 +    def __acm_init(self):
   82.13          act_pol_name = self.get_hv_loaded_policy_name()
   82.14          initialize()
   82.15  
   82.16 @@ -73,7 +78,7 @@ class XSPolicyAdmin:
   82.17              This currently only checks for ACM-enablement.
   82.18          """
   82.19          rc = 0
   82.20 -        if security.on():
   82.21 +        if security.on() == xsconstants.XS_POLICY_ACM:
   82.22              rc |= xsconstants.XS_POLICY_ACM
   82.23          return rc
   82.24  
   82.25 @@ -103,6 +108,8 @@ class XSPolicyAdmin:
   82.26  
   82.27      def __add_acmpolicy_to_system(self, xmltext, flags, overwrite):
   82.28          errors = ""
   82.29 +        if security.on() != xsconstants.XS_POLICY_ACM:
   82.30 +            raise SecurityError(-xsconstants.XSERR_POLICY_TYPE_UNSUPPORTED)
   82.31          loadedpol = self.get_loaded_policy()
   82.32          if loadedpol:
   82.33              # This is meant as an update to a currently loaded policy
    83.1 --- a/tools/python/xen/xend/image.py	Fri Apr 25 20:13:52 2008 +0900
    83.2 +++ b/tools/python/xen/xend/image.py	Thu May 08 18:40:07 2008 +0900
    83.3 @@ -551,6 +551,38 @@ class HVMImageHandler(ImageHandler):
    83.4          self.acpi = int(vmConfig['platform'].get('acpi', 0))
    83.5          self.guest_os_type = vmConfig['platform'].get('guest_os_type')
    83.6  
    83.7 +        self.vmConfig = vmConfig
    83.8 +           
    83.9 +    def setCpuid(self):
   83.10 +        xc.domain_set_policy_cpuid(self.vm.getDomid())
   83.11 +
   83.12 +        if 'cpuid' in self.vmConfig:
   83.13 +            cpuid = self.vmConfig['cpuid']
   83.14 +            transformed = {}
   83.15 +            for sinput, regs in cpuid.iteritems():
   83.16 +                inputs = sinput.split(',')
   83.17 +                input = long(inputs[0])
   83.18 +                sub_input = None
   83.19 +                if len(inputs) == 2:
   83.20 +                    sub_input = long(inputs[1])
   83.21 +                t = xc.domain_set_cpuid(self.vm.getDomid(),
   83.22 +                                        input, sub_input, regs)
   83.23 +                transformed[sinput] = t
   83.24 +            self.vmConfig['cpuid'] = transformed
   83.25 +
   83.26 +        if 'cpuid_check' in self.vmConfig:
   83.27 +            cpuid_check = self.vmConfig['cpuid_check']
   83.28 +            transformed = {}
   83.29 +            for sinput, regs_check in cpuid_check.iteritems():
   83.30 +                inputs = sinput.split(',')
   83.31 +                input = long(inputs[0])
   83.32 +                sub_input = None
   83.33 +                if len(inputs) == 2:
   83.34 +                    sub_input = long(inputs[1])
   83.35 +                t = xc.domain_check_cpuid(input, sub_input, regs_check)
   83.36 +                transformed[sinput] = t
   83.37 +            self.vmConfig['cpuid_check'] = transformed
   83.38 +
   83.39      # Return a list of cmd line args to the device models based on the
   83.40      # xm config file
   83.41      def parseDeviceModelArgs(self, vmConfig):
   83.42 @@ -718,6 +750,7 @@ class X86_HVM_ImageHandler(HVMImageHandl
   83.43  
   83.44      def buildDomain(self):
   83.45          xc.hvm_set_param(self.vm.getDomid(), HVM_PARAM_PAE_ENABLED, self.pae)
   83.46 +        self.setCpuid()
   83.47          return HVMImageHandler.buildDomain(self)
   83.48  
   83.49      def getRequiredAvailableMemory(self, mem_kb):
    84.1 --- a/tools/python/xen/xend/server/SrvDomain.py	Fri Apr 25 20:13:52 2008 +0900
    84.2 +++ b/tools/python/xen/xend/server/SrvDomain.py	Thu May 08 18:40:07 2008 +0900
    84.3 @@ -115,7 +115,6 @@ class SrvDomain(SrvDir):
    84.4                      [['dom',         'int'],
    84.5                       ['destination', 'str'],
    84.6                       ['live',        'int'],
    84.7 -                     ['resource',    'int'],
    84.8                       ['port',        'int']])
    84.9          return fn(req.args, {'dom': self.dom.domid})
   84.10  
    85.1 --- a/tools/python/xen/xend/server/blkif.py	Fri Apr 25 20:13:52 2008 +0900
    85.2 +++ b/tools/python/xen/xend/server/blkif.py	Thu May 08 18:40:07 2008 +0900
    85.3 @@ -23,6 +23,7 @@ from xen.util import blkif
    85.4  import xen.util.xsm.xsm as security
    85.5  from xen.xend.XendError import VmError
    85.6  from xen.xend.server.DevController import DevController
    85.7 +from xen.util import xsconstants
    85.8  
    85.9  class BlkifController(DevController):
   85.10      """Block device interface controller. Handles all block devices
   85.11 @@ -72,7 +73,7 @@ class BlkifController(DevController):
   85.12          if uuid:
   85.13              back['uuid'] = uuid
   85.14  
   85.15 -        if security.on():
   85.16 +        if security.on() == xsconstants.XS_POLICY_ACM:
   85.17              self.do_access_control(config, uname)
   85.18  
   85.19          devid = blkif.blkdev_name_to_number(dev)
    86.1 --- a/tools/python/xen/xend/server/irqif.py	Fri Apr 25 20:13:52 2008 +0900
    86.2 +++ b/tools/python/xen/xend/server/irqif.py	Thu May 08 18:40:07 2008 +0900
    86.3 @@ -69,5 +69,10 @@ class IRQController(DevController):
    86.4              #todo non-fatal
    86.5              raise VmError(
    86.6                  'irq: Failed to configure irq: %d' % (pirq))
    86.7 -
    86.8 +        rc = xc.physdev_map_pirq(domid = self.getDomid(),
    86.9 +                                index = pirq,
   86.10 +                                pirq  = pirq)
   86.11 +        if rc < 0:
   86.12 +            raise VmError(
   86.13 +                'irq: Failed to map irq %x' % (pirq))
   86.14          return (None, {}, {})
    87.1 --- a/tools/python/xen/xend/server/netif.py	Fri Apr 25 20:13:52 2008 +0900
    87.2 +++ b/tools/python/xen/xend/server/netif.py	Thu May 08 18:40:07 2008 +0900
    87.3 @@ -29,6 +29,7 @@ from xen.xend.server.DevController impor
    87.4  from xen.xend.XendError import VmError
    87.5  from xen.xend.XendXSPolicyAdmin import XSPolicyAdminInstance
    87.6  import xen.util.xsm.xsm as security
    87.7 +from xen.util import xsconstants
    87.8  
    87.9  from xen.xend.XendLogging import log
   87.10  
   87.11 @@ -155,7 +156,7 @@ class NetifController(DevController):
   87.12              front = { 'handle' : "%i" % devid,
   87.13                        'mac'    : mac }
   87.14  
   87.15 -        if security.on():
   87.16 +        if security.on() == xsconstants.XS_POLICY_ACM:
   87.17              self.do_access_control(config)
   87.18  
   87.19          return (devid, back, front)
    88.1 --- a/tools/python/xen/xend/server/pciif.py	Fri Apr 25 20:13:52 2008 +0900
    88.2 +++ b/tools/python/xen/xend/server/pciif.py	Thu May 08 18:40:07 2008 +0900
    88.3 @@ -271,6 +271,25 @@ class PciController(DevController):
    88.4              if rc<0:
    88.5                  raise VmError(('pci: failed to configure I/O memory on device '+
    88.6                              '%s - errno=%d')%(dev.name,rc))
    88.7 +            rc = xc.physdev_map_pirq(domid = fe_domid,
    88.8 +                                   index = dev.irq,
    88.9 +                                   pirq  = dev.irq)
   88.10 +            if rc < 0:
   88.11 +                raise VmError(('pci: failed to map irq on device '+
   88.12 +                            '%s - errno=%d')%(dev.name,rc))
   88.13 +
   88.14 +        if dev.msix:
   88.15 +            for (start, size) in dev.msix_iomem:
   88.16 +                start_pfn = start>>PAGE_SHIFT
   88.17 +                nr_pfns = (size+(PAGE_SIZE-1))>>PAGE_SHIFT
   88.18 +                log.debug('pci-msix: remove permission for 0x%x/0x%x 0x%x/0x%x' % \
   88.19 +                         (start,size, start_pfn, nr_pfns))
   88.20 +                rc = xc.domain_iomem_permission(domid = fe_domid,
   88.21 +                                                first_pfn = start_pfn,
   88.22 +                                                nr_pfns = nr_pfns,
   88.23 +                                                allow_access = False)
   88.24 +                if rc<0:
   88.25 +                    raise VmError(('pci: failed to remove msi-x iomem'))
   88.26  
   88.27          if dev.irq>0:
   88.28              log.debug('pci: enabling irq %d'%dev.irq)
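
The permission removal above converts each byte range collected by remove_msix_iomem() into a page-frame range for xc.domain_iomem_permission(). Roughly, assuming 4 KiB pages (the numbers are an arbitrary example):

    PAGE_SHIFT, PAGE_SIZE = 12, 4096

    start, size = 0xf8000000, 0x1000                  # one page of MSI-X table
    start_pfn = start >> PAGE_SHIFT                   # 0xf8000
    nr_pfns = (size + (PAGE_SIZE - 1)) >> PAGE_SHIFT  # 1
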
    89.1 --- a/tools/python/xen/xend/server/relocate.py	Fri Apr 25 20:13:52 2008 +0900
    89.2 +++ b/tools/python/xen/xend/server/relocate.py	Thu May 08 18:40:07 2008 +0900
    89.3 @@ -132,5 +132,14 @@ def listenRelocation():
    89.4          else:
    89.5              hosts_allow = map(re.compile, hosts_allow.split(" "))
    89.6  
    89.7 -        tcp.TCPListener(RelocationProtocol, port, interface = interface,
    89.8 -                        hosts_allow = hosts_allow)
    89.9 +        ssl_key_file = xoptions.get_xend_relocation_server_ssl_key_file()
   89.10 +        ssl_cert_file = xoptions.get_xend_relocation_server_ssl_cert_file()
   89.11 +
   89.12 +        if ssl_key_file and ssl_cert_file:
   89.13 +            tcp.SSLTCPListener(RelocationProtocol, port, interface = interface,
   89.14 +                               hosts_allow = hosts_allow,
   89.15 +                               ssl_key_file = ssl_key_file,
   89.16 +                               ssl_cert_file = ssl_cert_file)
   89.17 +        else:
   89.18 +            tcp.TCPListener(RelocationProtocol, port, interface = interface,
   89.19 +                            hosts_allow = hosts_allow)
    90.1 --- a/tools/python/xen/xm/addlabel.py	Fri Apr 25 20:13:52 2008 +0900
    90.2 +++ b/tools/python/xen/xm/addlabel.py	Thu May 08 18:40:07 2008 +0900
    90.3 @@ -205,17 +205,17 @@ def main(argv):
    90.4      policy_type = ""
    90.5      if len(argv) not in (4, 5):
    90.6          raise OptionError('Needs either 2 or 3 arguments')
    90.7 -    
    90.8 +
    90.9      label = argv[1]
   90.10 -    
   90.11 +
   90.12      if len(argv) == 5:
   90.13          policyref = argv[4]
   90.14 -    elif security.on():
   90.15 +    elif security.on() == xsconstants.XS_POLICY_ACM:
   90.16          policyref = security.active_policy
   90.17          policy_type = xsconstants.ACM_POLICY_ID
   90.18      else:
   90.19 -        raise OptionError("No active policy. Must specify policy on the "
   90.20 -                          "command line.")
   90.21 +        raise OptionError("ACM security is not enabled. You must specify "\
   90.22 +                          "the policy on the command line.")
   90.23  
   90.24      if argv[2].lower() == "dom":
   90.25          configfile = argv[3]
    91.1 --- a/tools/python/xen/xm/create.py	Fri Apr 25 20:13:52 2008 +0900
    91.2 +++ b/tools/python/xen/xm/create.py	Thu May 08 18:40:07 2008 +0900
    91.3 @@ -549,6 +549,14 @@ gopts.var('hap', val='HAP',
    91.4            use="""Hap status (0=hap is disabled;
    91.5            1=hap is enabled.""")
    91.6  
     91.7 +gopts.var('cpuid', val="IN[,SIN]:eax=EAX,ebx=EBX,ecx=ECX,edx=EDX",
    91.8 +          fn=append_value, default=[],
    91.9 +          use="""Cpuid description.""")
   91.10 +
    91.11 +gopts.var('cpuid_check', val="IN[,SIN]:eax=EAX,ebx=EBX,ecx=ECX,edx=EDX",
   91.12 +          fn=append_value, default=[],
   91.13 +          use="""Cpuid check description.""")
   91.14 +
   91.15  def err(msg):
   91.16      """Print an error to stderr and exit.
   91.17      """
   91.18 @@ -755,7 +763,7 @@ def configure_hvm(config_image, vals):
   91.19               'vnc', 'vncdisplay', 'vncunused', 'vncconsole', 'vnclisten',
   91.20               'sdl', 'display', 'xauthority', 'rtc_timeoffset', 'monitor',
   91.21               'acpi', 'apic', 'usb', 'usbdevice', 'keymap', 'pci', 'hpet',
   91.22 -             'guest_os_type', 'hap', 'opengl']
   91.23 +             'guest_os_type', 'hap', 'opengl', 'cpuid', 'cpuid_check']
   91.24  
   91.25      for a in args:
   91.26          if a in vals.__dict__ and vals.__dict__[a] is not None:
   91.27 @@ -779,7 +787,8 @@ def make_config(vals):
   91.28      map(add_conf, ['name', 'memory', 'maxmem', 'shadow_memory',
   91.29                     'restart', 'on_poweroff',
   91.30                     'on_reboot', 'on_crash', 'vcpus', 'vcpu_avail', 'features',
   91.31 -                   'on_xend_start', 'on_xend_stop', 'target'])
   91.32 +                   'on_xend_start', 'on_xend_stop', 'target', 'cpuid',
   91.33 +                   'cpuid_check'])
   91.34  
   91.35      if vals.uuid is not None:
   91.36          config.append(['uuid', vals.uuid])
   91.37 @@ -843,6 +852,26 @@ def preprocess_disk(vals):
   91.38          disk.append(d)
   91.39      vals.disk = disk
   91.40  
   91.41 +def preprocess_cpuid(vals, attr_name):
    91.42 +    if not getattr(vals, attr_name, None): return
    91.43 +    cpuid = {}
   91.44 +    for cpuid_input in getattr(vals, attr_name):
   91.45 +        input_re = "(0x)?[0-9A-Fa-f]+(,(0x)?[0-9A-Fa-f]+)?"
   91.46 +        cpuid_match = re.match(r'(?P<input>%s):(?P<regs>.*)' % \
   91.47 +                               input_re, cpuid_input)
   91.48 +        if cpuid_match != None:
   91.49 +            res_cpuid = cpuid_match.groupdict()
   91.50 +            input = res_cpuid['input']
   91.51 +            regs = res_cpuid['regs'].split(',')
   91.52 +            cpuid[input]= {} # New input
   91.53 +            for reg in regs:
   91.54 +                reg_match = re.match(r"(?P<reg>eax|ebx|ecx|edx)=(?P<val>.*)", reg)
   91.55 +                if reg_match == None:
   91.56 +                    err("cpuid's syntax is (eax|ebx|ecx|edx)=value")
   91.57 +                res = reg_match.groupdict()
   91.58 +                cpuid[input][res['reg']] = res['val'] # new register
   91.59 +    setattr(vals, attr_name, cpuid)
   91.60 +
   91.61  def preprocess_pci(vals):
   91.62      if not vals.pci: return
   91.63      pci = []
   91.64 @@ -1047,6 +1076,8 @@ def preprocess(vals):
   91.65      preprocess_vnc(vals)
   91.66      preprocess_vtpm(vals)
   91.67      preprocess_access_control(vals)
   91.68 +    preprocess_cpuid(vals, 'cpuid')
   91.69 +    preprocess_cpuid(vals, 'cpuid_check')
   91.70  
   91.71  
   91.72  def comma_sep_kv_to_dict(c):
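
With the options above, a cpuid override is written as "input[,sub-input]:reg=value,..." and preprocess_cpuid() turns it into the nested dictionary later consumed by XendConfig. For example (the leaf and register values are hypothetical):

    # in the guest config file (or via repeated cpuid= options)
    cpuid = ['1:ecx=0x0,edx=0x183fbff']

    # after preprocess_cpuid(vals, 'cpuid'):
    # vals.cpuid == {'1': {'ecx': '0x0', 'edx': '0x183fbff'}}
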
    92.1 --- a/tools/python/xen/xm/dry-run.py	Fri Apr 25 20:13:52 2008 +0900
    92.2 +++ b/tools/python/xen/xm/dry-run.py	Thu May 08 18:40:07 2008 +0900
    92.3 @@ -22,6 +22,7 @@ import sys
    92.4  import xen.util.xsm.xsm as security
    92.5  from xen.xm import create
    92.6  from xen.xend import sxp
    92.7 +from xen.util import xsconstants
    92.8  from xen.xm.opts import OptionError
    92.9  
   92.10  def help():
   92.11 @@ -40,7 +41,7 @@ def check_domain_label(config, verbose):
   92.12      answer = 0
   92.13      default_label = None
   92.14      secon = 0
   92.15 -    if security.on():
   92.16 +    if security.on() == xsconstants.XS_POLICY_ACM:
   92.17          default_label = security.ssidref2label(security.NULL_SSIDREF)
   92.18          secon = 1
   92.19  
   92.20 @@ -90,7 +91,7 @@ def config_security_check(config, verbos
   92.21              domain_policy = sxp.child_value(sxp.name(sxp.child0(x)), 'policy')
   92.22  
   92.23      # if no domain label, use default
   92.24 -    if not domain_label and security.on():
   92.25 +    if not domain_label and security.on() == xsconstants.XS_POLICY_ACM:
   92.26          try:
   92.27              domain_label = security.ssidref2label(security.NULL_SSIDREF)
   92.28          except:
    93.1 --- a/tools/python/xen/xm/main.py	Fri Apr 25 20:13:52 2008 +0900
    93.2 +++ b/tools/python/xen/xm/main.py	Thu May 08 18:40:07 2008 +0900
    93.3 @@ -133,7 +133,7 @@ SUBCOMMAND_HELP = {
    93.4                       'Read and/or clear Xend\'s message buffer.'),
    93.5      'domid'       : ('<DomainName>', 'Convert a domain name to domain id.'),
    93.6      'domname'     : ('<DomId>', 'Convert a domain id to domain name.'),
    93.7 -    'dump-core'   : ('[-L|--live] [-C|--crash] <Domain> [Filename]',
    93.8 +    'dump-core'   : ('[-L|--live] [-C|--crash] [-R|--reset] <Domain> [Filename]',
    93.9                       'Dump core for a specific domain.'),
   93.10      'info'        : ('[-c|--config]', 'Get information about Xen host.'),
   93.11      'log'         : ('', 'Print Xend log'),
   93.12 @@ -243,6 +243,7 @@ SUBCOMMAND_OPTIONS = {
   93.13      'dump-core': (
   93.14         ('-L', '--live', 'Dump core without pausing the domain'),
   93.15         ('-C', '--crash', 'Crash domain after dumping core'),
   93.16 +       ('-R', '--reset', 'Reset domain after dumping core'),
   93.17      ),
   93.18      'start': (
   93.19         ('-p', '--paused', 'Do not unpause domain after starting it'),
   93.20 @@ -417,10 +418,11 @@ server = None
   93.21  def cmdHelp(cmd):
   93.22      """Print help for a specific subcommand."""
   93.23      
   93.24 -    for fc in SUBCOMMAND_HELP.keys():
   93.25 -        if fc[:len(cmd)] == cmd:
   93.26 -            cmd = fc
   93.27 -            break
   93.28 +    if not SUBCOMMAND_HELP.has_key(cmd):
   93.29 +        for fc in SUBCOMMAND_HELP.keys():
   93.30 +            if fc[:len(cmd)] == cmd:
   93.31 +                cmd = fc
   93.32 +                break
   93.33      
   93.34      try:
   93.35          args, desc = SUBCOMMAND_HELP[cmd]
   93.36 @@ -1279,14 +1281,19 @@ def xm_unpause(args):
   93.37  def xm_dump_core(args):
   93.38      live = False
   93.39      crash = False
   93.40 +    reset = False
   93.41      try:
   93.42 -        (options, params) = getopt.gnu_getopt(args, 'LC', ['live','crash'])
   93.43 +        (options, params) = getopt.gnu_getopt(args, 'LCR', ['live', 'crash', 'reset'])
   93.44          for (k, v) in options:
   93.45              if k in ('-L', '--live'):
   93.46                  live = True
   93.47 -            if k in ('-C', '--crash'):
   93.48 +            elif k in ('-C', '--crash'):
   93.49                  crash = True
   93.50 +            elif k in ('-R', '--reset'):
   93.51 +                reset = True
   93.52  
   93.53 +        if crash and reset:
    93.54 +            raise OptionError("Options --crash (-C) and --reset (-R) are mutually exclusive")
   93.55          if len(params) not in (1, 2):
   93.56              raise OptionError("Expects 1 or 2 argument(s)")
   93.57      except getopt.GetoptError, e:
   93.58 @@ -1308,8 +1315,11 @@ def xm_dump_core(args):
   93.59          if crash:
   93.60              print "Destroying domain: %s ..." % str(dom)
   93.61              server.xend.domain.destroy(dom)
   93.62 +        elif reset:
   93.63 +            print "Resetting domain: %s ..." % str(dom)
   93.64 +            server.xend.domain.reset(dom)
   93.65      finally:
   93.66 -        if not live and not crash and ds == DOM_STATE_RUNNING:
   93.67 +        if not live and not crash and not reset and ds == DOM_STATE_RUNNING:
   93.68              server.xend.domain.unpause(dom)
   93.69  
   93.70  def xm_rename(args):
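
The new -R/--reset flag gives "xm dump-core" a third post-dump action alongside --live and --crash, and may not be combined with --crash. A typical invocation (domain name and dump path are examples only):

    xm dump-core -R mydomain /var/xen/dump/mydomain.core
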
    94.1 --- a/tools/python/xen/xm/migrate.py	Fri Apr 25 20:13:52 2008 +0900
    94.2 +++ b/tools/python/xen/xm/migrate.py	Thu May 08 18:40:07 2008 +0900
    94.3 @@ -47,10 +47,6 @@ gopts.opt('node', short='n', val='nodenu
    94.4            fn=set_int, default=-1,
    94.5            use="Use specified NUMA node on target.")
    94.6  
    94.7 -gopts.opt('resource', short='r', val='MBIT',
    94.8 -          fn=set_int, default=0,
    94.9 -          use="Set level of resource usage for migration.")
   94.10 -
   94.11  def help():
   94.12      return str(gopts)
   94.13      
   94.14 @@ -69,13 +65,11 @@ def main(argv):
   94.15          vm_ref = get_single_vm(dom)
   94.16          other_config = {
   94.17              "port":     opts.vals.port,
   94.18 -            "resource": opts.vals.resource,
   94.19              "node":     opts.vals.node
   94.20              }
   94.21          server.xenapi.VM.migrate(vm_ref, dst, bool(opts.vals.live),
   94.22                                   other_config)
   94.23      else:
   94.24          server.xend.domain.migrate(dom, dst, opts.vals.live,
   94.25 -                                   opts.vals.resource,
   94.26                                     opts.vals.port,
   94.27                                     opts.vals.node)
    95.1 --- a/tools/python/xen/xm/xenapi_create.py	Fri Apr 25 20:13:52 2008 +0900
    95.2 +++ b/tools/python/xen/xm/xenapi_create.py	Thu May 08 18:40:07 2008 +0900
    95.3 @@ -485,9 +485,9 @@ class xenapi_create:
    95.4                  vm_ref,
    95.5              "protocol":
    95.6                  console.attributes["protocol"].value,
    95.7 -            "other_params":
    95.8 +            "other_config":
    95.9                  get_child_nodes_as_dict(console,
   95.10 -                  "other_param", "key", "value")
   95.11 +                  "other_config", "key", "value")
   95.12              }
   95.13  
   95.14          return server.xenapi.console.create(console_record)
    96.1 --- a/tools/xenstore/xenstore_client.c	Fri Apr 25 20:13:52 2008 +0900
    96.2 +++ b/tools/xenstore/xenstore_client.c	Thu May 08 18:40:07 2008 +0900
    96.3 @@ -450,8 +450,9 @@ static enum mode lookup_mode(const char 
    96.4  	return MODE_write;
    96.5      else if (strcmp(m, "read") == 0)
    96.6  	return MODE_read;
    96.7 -    else
    96.8 -	errx(1, "unknown mode %s\n", m);
    96.9 +
   96.10 +    errx(1, "unknown mode %s\n", m);
   96.11 +    return 0;
   96.12  }
   96.13  
   96.14  int
    97.1 --- a/tools/xenstore/xenstored_core.c	Fri Apr 25 20:13:52 2008 +0900
    97.2 +++ b/tools/xenstore/xenstored_core.c	Thu May 08 18:40:07 2008 +0900
    97.3 @@ -1929,7 +1929,7 @@ int main(int argc, char *argv[])
    97.4  
    97.5  	/* Main loop. */
    97.6  	for (;;) {
    97.7 -		struct connection *conn, *old_conn;
    97.8 +		struct connection *conn, *next;
    97.9  
   97.10  		if (select(max+1, &inset, &outset, NULL, timeout) < 0) {
   97.11  			if (errno == EINTR)
   97.12 @@ -1953,27 +1953,39 @@ int main(int argc, char *argv[])
   97.13  		if (evtchn_fd != -1 && FD_ISSET(evtchn_fd, &inset))
   97.14  			handle_event();
   97.15  
   97.16 -		conn = list_entry(connections.next, typeof(*conn), list);
   97.17 -		while (&conn->list != &connections) {
   97.18 -			talloc_increase_ref_count(conn);
   97.19 +		next = list_entry(connections.next, typeof(*conn), list);
   97.20 +		while (&next->list != &connections) {
   97.21 +			conn = next;
   97.22 +
   97.23 +			next = list_entry(conn->list.next,
   97.24 +					  typeof(*conn), list);
   97.25  
   97.26  			if (conn->domain) {
   97.27 +				talloc_increase_ref_count(conn);
   97.28  				if (domain_can_read(conn))
   97.29  					handle_input(conn);
   97.30 +				if (talloc_free(conn) == 0)
   97.31 +					continue;
   97.32 +
   97.33 +				talloc_increase_ref_count(conn);
   97.34  				if (domain_can_write(conn) &&
   97.35  				    !list_empty(&conn->out_list))
   97.36  					handle_output(conn);
   97.37 +				if (talloc_free(conn) == 0)
   97.38 +					continue;
   97.39  			} else {
   97.40 +				talloc_increase_ref_count(conn);
   97.41  				if (FD_ISSET(conn->fd, &inset))
   97.42  					handle_input(conn);
   97.43 +				if (talloc_free(conn) == 0)
   97.44 +					continue;
   97.45 +
   97.46 +				talloc_increase_ref_count(conn);
   97.47  				if (FD_ISSET(conn->fd, &outset))
   97.48  					handle_output(conn);
   97.49 +				if (talloc_free(conn) == 0)
   97.50 +					continue;
   97.51  			}
   97.52 -
   97.53 -			old_conn = conn;
   97.54 -			conn = list_entry(old_conn->list.next,
   97.55 -					  typeof(*conn), list);
   97.56 -			talloc_free(old_conn);
   97.57  		}
   97.58  
   97.59  		max = initialize_set(&inset, &outset, *sock, *ro_sock,
    98.1 --- a/xen/Makefile	Fri Apr 25 20:13:52 2008 +0900
    98.2 +++ b/xen/Makefile	Thu May 08 18:40:07 2008 +0900
    98.3 @@ -6,6 +6,9 @@ export XEN_EXTRAVERSION ?= -unstable$(XE
    98.4  export XEN_FULLVERSION   = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
    98.5  -include xen-version
    98.6  
    98.7 +export XEN_WHOAMI	?= $(USER)
    98.8 +export XEN_DOMAIN	?= $(shell ([ -x /bin/dnsdomainname ] && /bin/dnsdomainname) || ([ -x /bin/domainname ] && /bin/domainname || echo [unknown]))
    98.9 +
   98.10  export BASEDIR := $(CURDIR)
   98.11  
   98.12  .PHONY: default
   98.13 @@ -81,8 +84,8 @@ delete-unfresh-files:
   98.14  include/xen/compile.h: include/xen/compile.h.in .banner
   98.15  	@sed -e 's/@@date@@/$(shell LC_ALL=C date)/g' \
   98.16  	    -e 's/@@time@@/$(shell LC_ALL=C date +%T)/g' \
   98.17 -	    -e 's/@@whoami@@/$(USER)/g' \
   98.18 -	    -e 's/@@domain@@/$(shell ([ -x /bin/dnsdomainname ] && /bin/dnsdomainname) || ([ -x /bin/domainname ] && /bin/domainname || echo [unknown]))/g' \
   98.19 +	    -e 's/@@whoami@@/$(XEN_WHOAMI)/g' \
   98.20 +	    -e 's/@@domain@@/$(XEN_DOMAIN)/g' \
   98.21  	    -e 's/@@hostname@@/$(shell hostname)/g' \
   98.22  	    -e 's!@@compiler@@!$(shell $(CC) $(CFLAGS) -v 2>&1 | grep -i "gcc.*version")!g' \
   98.23  	    -e 's/@@version@@/$(XEN_VERSION)/g' \
    99.1 --- a/xen/arch/ia64/vmx/vmx_hypercall.c	Fri Apr 25 20:13:52 2008 +0900
    99.2 +++ b/xen/arch/ia64/vmx/vmx_hypercall.c	Thu May 08 18:40:07 2008 +0900
    99.3 @@ -200,6 +200,10 @@ do_hvm_op(unsigned long op, XEN_GUEST_HA
    99.4          rc = 0;
    99.5          break;
    99.6  
    99.7 +    case HVMOP_track_dirty_vram:
    99.8 +        rc = -ENOSYS;
    99.9 +        break;
   99.10 +
   99.11      default:
   99.12          gdprintk(XENLOG_INFO, "Bad HVM op %ld.\n", op);
   99.13          rc = -ENOSYS;
   100.1 --- a/xen/arch/x86/Makefile	Fri Apr 25 20:13:52 2008 +0900
   100.2 +++ b/xen/arch/x86/Makefile	Thu May 08 18:40:07 2008 +0900
   100.3 @@ -24,6 +24,7 @@ obj-y += platform_hypercall.o
   100.4  obj-y += i387.o
   100.5  obj-y += i8259.o
   100.6  obj-y += io_apic.o
   100.7 +obj-y += msi.o
   100.8  obj-y += ioport_emulate.o
   100.9  obj-y += irq.o
  100.10  obj-y += microcode.o
   101.1 --- a/xen/arch/x86/acpi/Makefile	Fri Apr 25 20:13:52 2008 +0900
   101.2 +++ b/xen/arch/x86/acpi/Makefile	Thu May 08 18:40:07 2008 +0900
   101.3 @@ -1,2 +1,2 @@
   101.4  obj-y += boot.o
   101.5 -obj-y += power.o suspend.o wakeup_prot.o
   101.6 +obj-y += power.o suspend.o wakeup_prot.o cpu_idle.o
   102.1 --- a/xen/arch/x86/acpi/boot.c	Fri Apr 25 20:13:52 2008 +0900
   102.2 +++ b/xen/arch/x86/acpi/boot.c	Thu May 08 18:40:07 2008 +0900
   102.3 @@ -462,6 +462,28 @@ bad:
   102.4  }
   102.5  #endif
   102.6  
   102.7 +static void __init
   102.8 +acpi_fadt_parse_reg(struct acpi_table_fadt *fadt)
   102.9 +{
  102.10 +	unsigned int len;
  102.11 +
  102.12 +	len = min_t(unsigned int, fadt->header.length, sizeof(*fadt));
  102.13 +	memcpy(&acpi_gbl_FADT, fadt, len);
  102.14 +
  102.15 +	if (len > offsetof(struct acpi_table_fadt, xpm1b_event_block)) {
  102.16 +		memcpy(&acpi_gbl_xpm1a_enable, &fadt->xpm1a_event_block,
  102.17 +		       sizeof(acpi_gbl_xpm1a_enable));
  102.18 +		memcpy(&acpi_gbl_xpm1b_enable, &fadt->xpm1b_event_block,
  102.19 +		       sizeof(acpi_gbl_xpm1b_enable));
  102.20 +
  102.21 +		acpi_gbl_xpm1a_enable.address +=
  102.22 +			acpi_gbl_FADT.pm1_event_length / 2;
  102.23 +		if ( acpi_gbl_xpm1b_enable.address )
  102.24 +			acpi_gbl_xpm1b_enable.address +=
  102.25 +				acpi_gbl_FADT.pm1_event_length / 2;
  102.26 +	}
  102.27 +}
  102.28 +
  102.29  static int __init acpi_parse_fadt(unsigned long phys, unsigned long size)
  102.30  {
  102.31  	struct acpi_table_fadt *fadt = NULL;
  102.32 @@ -509,6 +531,8 @@ static int __init acpi_parse_fadt(unsign
  102.33  	acpi_enable_value  = fadt->acpi_enable;
  102.34  	acpi_disable_value = fadt->acpi_disable;
  102.35  
  102.36 +	acpi_fadt_parse_reg(fadt);
  102.37 +
  102.38  #ifdef CONFIG_ACPI_SLEEP
  102.39  	acpi_fadt_parse_sleep_info(fadt);
  102.40  #endif
   103.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
   103.2 +++ b/xen/arch/x86/acpi/cpu_idle.c	Thu May 08 18:40:07 2008 +0900
   103.3 @@ -0,0 +1,957 @@
   103.4 +/*
   103.5 + * cpu_idle - xen idle state module derived from Linux 
   103.6 + *            drivers/acpi/processor_idle.c & 
   103.7 + *            arch/x86/kernel/acpi/cstate.c
   103.8 + *
   103.9 + *  Copyright (C) 2001, 2002 Andy Grover <andrew.grover@intel.com>
  103.10 + *  Copyright (C) 2001, 2002 Paul Diefenbaugh <paul.s.diefenbaugh@intel.com>
  103.11 + *  Copyright (C) 2004, 2005 Dominik Brodowski <linux@brodo.de>
  103.12 + *  Copyright (C) 2004  Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
  103.13 + *                      - Added processor hotplug support
  103.14 + *  Copyright (C) 2005  Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
  103.15 + *                      - Added support for C3 on SMP
  103.16 + *  Copyright (C) 2007, 2008 Intel Corporation
  103.17 + *
  103.18 + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  103.19 + *
  103.20 + *  This program is free software; you can redistribute it and/or modify
  103.21 + *  it under the terms of the GNU General Public License as published by
  103.22 + *  the Free Software Foundation; either version 2 of the License, or (at
  103.23 + *  your option) any later version.
  103.24 + *
  103.25 + *  This program is distributed in the hope that it will be useful, but
  103.26 + *  WITHOUT ANY WARRANTY; without even the implied warranty of
  103.27 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  103.28 + *  General Public License for more details.
  103.29 + *
  103.30 + *  You should have received a copy of the GNU General Public License along
  103.31 + *  with this program; if not, write to the Free Software Foundation, Inc.,
  103.32 + *  59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.
  103.33 + *
  103.34 + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  103.35 + */
  103.36 +
  103.37 +#include <xen/config.h>
  103.38 +#include <xen/errno.h>
  103.39 +#include <xen/lib.h>
  103.40 +#include <xen/types.h>
  103.41 +#include <xen/acpi.h>
  103.42 +#include <xen/smp.h>
  103.43 +#include <asm/cache.h>
  103.44 +#include <asm/io.h>
  103.45 +#include <xen/guest_access.h>
  103.46 +#include <public/platform.h>
  103.47 +#include <asm/processor.h>
  103.48 +#include <xen/keyhandler.h>
  103.49 +
  103.50 +#define DEBUG_PM_CX
  103.51 +
  103.52 +#define US_TO_PM_TIMER_TICKS(t)     ((t * (PM_TIMER_FREQUENCY/1000)) / 1000)
  103.53 +#define C2_OVERHEAD         4   /* 1us (3.579 ticks per us) */
  103.54 +#define C3_OVERHEAD         4   /* 1us (3.579 ticks per us) */
  103.55 +
  103.56 +#define ACPI_PROCESSOR_MAX_POWER        8
  103.57 +#define ACPI_PROCESSOR_MAX_C2_LATENCY   100
  103.58 +#define ACPI_PROCESSOR_MAX_C3_LATENCY   1000
  103.59 +
  103.60 +extern u32 pmtmr_ioport;
  103.61 +extern void (*pm_idle) (void);
  103.62 +
  103.63 +static void (*pm_idle_save) (void) __read_mostly;
  103.64 +unsigned int max_cstate __read_mostly = 2;
  103.65 +integer_param("max_cstate", max_cstate);
  103.66 +/*
  103.67 + * bm_history -- bit-mask with a bit per jiffy of bus-master activity
  103.68 + * 1000 HZ: 0xFFFFFFFF: 32 jiffies = 32ms
  103.69 + * 800 HZ: 0xFFFFFFFF: 32 jiffies = 40ms
  103.70 + * 100 HZ: 0x0000000F: 4 jiffies = 40ms
  103.71 + * reduce history for more aggressive entry into C3
  103.72 + */
  103.73 +unsigned int bm_history __read_mostly =
  103.74 +    (HZ >= 800 ? 0xFFFFFFFF : ((1U << (HZ / 25)) - 1));
  103.75 +integer_param("bm_history", bm_history);
  103.76 +
  103.77 +struct acpi_processor_cx;
  103.78 +
  103.79 +struct acpi_processor_cx_policy
  103.80 +{
  103.81 +    u32 count;
  103.82 +    struct acpi_processor_cx *state;
  103.83 +    struct
  103.84 +    {
  103.85 +        u32 time;
  103.86 +        u32 ticks;
  103.87 +        u32 count;
  103.88 +        u32 bm;
  103.89 +    } threshold;
  103.90 +};
  103.91 +
  103.92 +struct acpi_processor_cx
  103.93 +{
  103.94 +    u8 valid;
  103.95 +    u8 type;
  103.96 +    u32 address;
  103.97 +    u8 space_id;
  103.98 +    u32 latency;
  103.99 +    u32 latency_ticks;
 103.100 +    u32 power;
 103.101 +    u32 usage;
 103.102 +    u64 time;
 103.103 +    struct acpi_processor_cx_policy promotion;
 103.104 +    struct acpi_processor_cx_policy demotion;
 103.105 +};
 103.106 +
 103.107 +struct acpi_processor_flags
 103.108 +{
 103.109 +    u8 bm_control:1;
 103.110 +    u8 bm_check:1;
 103.111 +    u8 has_cst:1;
 103.112 +    u8 power_setup_done:1;
 103.113 +    u8 bm_rld_set:1;
 103.114 +};
 103.115 +
 103.116 +struct acpi_processor_power
 103.117 +{
 103.118 +    struct acpi_processor_flags flags;
 103.119 +    struct acpi_processor_cx *state;
 103.120 +    s_time_t bm_check_timestamp;
 103.121 +    u32 default_state;
 103.122 +    u32 bm_activity;
 103.123 +    u32 count;
 103.124 +    struct acpi_processor_cx states[ACPI_PROCESSOR_MAX_POWER];
 103.125 +};
 103.126 +
 103.127 +static struct acpi_processor_power processor_powers[NR_CPUS];
 103.128 +
 103.129 +static void print_acpi_power(uint32_t cpu, struct acpi_processor_power *power)
 103.130 +{
 103.131 +    uint32_t i;
 103.132 +
 103.133 +    printk("saved cpu%d cx acpi info:\n", cpu);
 103.134 +    printk("\tcurrent state is C%d\n", (power->state)?power->state->type:-1);
 103.135 +    printk("\tbm_check_timestamp = %"PRId64"\n", power->bm_check_timestamp);
 103.136 +    printk("\tdefault_state = %d\n", power->default_state);
 103.137 +    printk("\tbm_activity = 0x%08x\n", power->bm_activity);
 103.138 +    printk("\tcount = %d\n", power->count);
 103.139 +    
 103.140 +    for ( i = 0; i < power->count; i++ )
 103.141 +    {
 103.142 +        printk("\tstates[%d]:\n", i);
 103.143 +        printk("\t\tvalid   = %d\n", power->states[i].valid);
 103.144 +        printk("\t\ttype    = %d\n", power->states[i].type);
 103.145 +        printk("\t\taddress = 0x%x\n", power->states[i].address);
 103.146 +        printk("\t\tspace_id = 0x%x\n", power->states[i].space_id);
 103.147 +        printk("\t\tlatency = %d\n", power->states[i].latency);
 103.148 +        printk("\t\tpower   = %d\n", power->states[i].power);
 103.149 +        printk("\t\tlatency_ticks = %d\n", power->states[i].latency_ticks);
 103.150 +        printk("\t\tusage   = %d\n", power->states[i].usage);
 103.151 +        printk("\t\ttime    = %"PRId64"\n", power->states[i].time);
 103.152 +
 103.153 +        printk("\t\tpromotion policy:\n");
 103.154 +        printk("\t\t\tcount    = %d\n", power->states[i].promotion.count);
 103.155 +        printk("\t\t\tstate    = C%d\n",
 103.156 +               (power->states[i].promotion.state) ? 
 103.157 +               power->states[i].promotion.state->type : -1);
 103.158 +        printk("\t\t\tthreshold.time = %d\n", power->states[i].promotion.threshold.time);
 103.159 +        printk("\t\t\tthreshold.ticks = %d\n", power->states[i].promotion.threshold.ticks);
 103.160 +        printk("\t\t\tthreshold.count = %d\n", power->states[i].promotion.threshold.count);
 103.161 +        printk("\t\t\tthreshold.bm = %d\n", power->states[i].promotion.threshold.bm);
 103.162 +
 103.163 +        printk("\t\tdemotion policy:\n");
 103.164 +        printk("\t\t\tcount    = %d\n", power->states[i].demotion.count);
 103.165 +        printk("\t\t\tstate    = C%d\n",
 103.166 +               (power->states[i].demotion.state) ? 
 103.167 +               power->states[i].demotion.state->type : -1);
 103.168 +        printk("\t\t\tthreshold.time = %d\n", power->states[i].demotion.threshold.time);
 103.169 +        printk("\t\t\tthreshold.ticks = %d\n", power->states[i].demotion.threshold.ticks);
 103.170 +        printk("\t\t\tthreshold.count = %d\n", power->states[i].demotion.threshold.count);
 103.171 +        printk("\t\t\tthreshold.bm = %d\n", power->states[i].demotion.threshold.bm);
 103.172 +    }
 103.173 +}
 103.174 +
 103.175 +static void dump_cx(unsigned char key)
 103.176 +{
 103.177 +    for( int i = 0; i < num_online_cpus(); i++ )
 103.178 +        print_acpi_power(i, &processor_powers[i]);
 103.179 +}
 103.180 +
 103.181 +static int __init cpu_idle_key_init(void)
 103.182 +{
 103.183 +    register_keyhandler(
 103.184 +        'c', dump_cx,        "dump cx structures");
 103.185 +    return 0;
 103.186 +}
 103.187 +__initcall(cpu_idle_key_init);
 103.188 +
 103.189 +static inline u32 ticks_elapsed(u32 t1, u32 t2)
 103.190 +{
 103.191 +    if ( t2 >= t1 )
 103.192 +        return (t2 - t1);
 103.193 +    else
 103.194 +        return ((0xFFFFFFFF - t1) + t2);
 103.195 +}
 103.196 +
 103.197 +static void acpi_processor_power_activate(struct acpi_processor_power *power,
 103.198 +                                          struct acpi_processor_cx *new)
 103.199 +{
 103.200 +    struct acpi_processor_cx *old;
 103.201 +
 103.202 +    if ( !power || !new )
 103.203 +        return;
 103.204 +
 103.205 +    old = power->state;
 103.206 +
 103.207 +    if ( old )
 103.208 +        old->promotion.count = 0;
 103.209 +    new->demotion.count = 0;
 103.210 +
 103.211 +    /* Cleanup from old state. */
 103.212 +    if ( old )
 103.213 +    {
 103.214 +        switch ( old->type )
 103.215 +        {
 103.216 +        case ACPI_STATE_C3:
 103.217 +            /* Disable bus master reload */
 103.218 +            if ( new->type != ACPI_STATE_C3 && power->flags.bm_check )
 103.219 +                acpi_set_register(ACPI_BITREG_BUS_MASTER_RLD, 0);
 103.220 +            break;
 103.221 +        }
 103.222 +    }
 103.223 +
 103.224 +    /* Prepare to use new state. */
 103.225 +    switch ( new->type )
 103.226 +    {
 103.227 +    case ACPI_STATE_C3:
 103.228 +        /* Enable bus master reload */
 103.229 +        if ( (!old || old->type != ACPI_STATE_C3) && power->flags.bm_check )
 103.230 +            acpi_set_register(ACPI_BITREG_BUS_MASTER_RLD, 1);
 103.231 +        break;
 103.232 +    }
 103.233 +
 103.234 +    power->state = new;
 103.235 +
 103.236 +    return;
 103.237 +}
 103.238 +
 103.239 +static void acpi_safe_halt(void)
 103.240 +{
 103.241 +    smp_mb__after_clear_bit();
 103.242 +    safe_halt();
 103.243 +}
 103.244 +
 103.245 +#define MWAIT_ECX_INTERRUPT_BREAK   (0x1)
 103.246 +
 103.247 +static void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 103.248 +{
 103.249 +    __monitor((void *)current, 0, 0);
 103.250 +    smp_mb();
 103.251 +    __mwait(eax, ecx);
 103.252 +}
 103.253 +
 103.254 +static void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
 103.255 +{
 103.256 +    mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
 103.257 +}
 103.258 +
 103.259 +static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
 103.260 +{
 103.261 +    if ( cx->space_id == ACPI_ADR_SPACE_FIXED_HARDWARE )
 103.262 +    {
 103.263 +        /* Call into architectural FFH based C-state */
 103.264 +        acpi_processor_ffh_cstate_enter(cx);
 103.265 +    }
 103.266 +    else
 103.267 +    {
 103.268 +        int unused;
 103.269 +        /* IO port based C-state */
 103.270 +        inb(cx->address);
 103.271 +        /* Dummy wait op - must do something useless after P_LVL2 read
 103.272 +           because chipsets cannot guarantee that STPCLK# signal
 103.273 +           gets asserted in time to freeze execution properly. */
 103.274 +        unused = inl(pmtmr_ioport);
 103.275 +    }
 103.276 +}
 103.277 +
 103.278 +static atomic_t c3_cpu_count;
 103.279 +
 103.280 +static void acpi_processor_idle(void)
 103.281 +{
 103.282 +    struct acpi_processor_power *power = NULL;
 103.283 +    struct acpi_processor_cx *cx = NULL;
 103.284 +    struct acpi_processor_cx *next_state = NULL;
 103.285 +    int sleep_ticks = 0;
 103.286 +    u32 t1, t2 = 0;
 103.287 +
 103.288 +    power = &processor_powers[smp_processor_id()];
 103.289 +
 103.290 +    /*
 103.291 +     * Interrupts must be disabled during bus mastering calculations and
 103.292 +     * for C2/C3 transitions.
 103.293 +     */
 103.294 +    local_irq_disable();
 103.295 +    cx = power->state;
 103.296 +    if ( !cx )
 103.297 +    {
 103.298 +        if ( pm_idle_save )
 103.299 +        {
 103.300 +            printk(XENLOG_DEBUG "call pm_idle_save()\n");
 103.301 +            pm_idle_save();
 103.302 +        }
 103.303 +        else
 103.304 +        {
 103.305 +            printk(XENLOG_DEBUG "call acpi_safe_halt()\n");
 103.306 +            acpi_safe_halt();
 103.307 +        }
 103.308 +        return;
 103.309 +    }
 103.310 +
 103.311 +    /*
 103.312 +     * Check BM Activity
 103.313 +     * -----------------
 103.314 +     * Check for bus mastering activity (if required), record, and check
 103.315 +     * for demotion.
 103.316 +     */
 103.317 +    if ( power->flags.bm_check )
 103.318 +    {
 103.319 +        u32 bm_status = 0;
 103.320 +        unsigned long diff = (NOW() - power->bm_check_timestamp) >> 23;
 103.321 +
 103.322 +        if ( diff > 31 )
 103.323 +            diff = 31;
 103.324 +
 103.325 +        power->bm_activity <<= diff;
 103.326 +
 103.327 +        acpi_get_register(ACPI_BITREG_BUS_MASTER_STATUS, &bm_status);
 103.328 +        if ( bm_status )
 103.329 +        {
 103.330 +            power->bm_activity |= 0x1;
 103.331 +            acpi_set_register(ACPI_BITREG_BUS_MASTER_STATUS, 1);
 103.332 +        }
 103.333 +        /*
 103.334 +         * PIIX4 Erratum #18: Note that BM_STS doesn't always reflect
 103.335 +         * the true state of bus mastering activity, forcing us to
 103.336 +         * manually check the BMIDEA bit of each IDE channel.
 103.337 +         */
 103.338 +        /*else if ( errata.piix4.bmisx )
 103.339 +        {
 103.340 +            if ( (inb_p(errata.piix4.bmisx + 0x02) & 0x01)
 103.341 +                || (inb_p(errata.piix4.bmisx + 0x0A) & 0x01) )
 103.342 +                pr->power.bm_activity |= 0x1;
 103.343 +        }*/
 103.344 +
 103.345 +        power->bm_check_timestamp = NOW();
 103.346 +
 103.347 +        /*
 103.348 +         * If bus mastering is or was active this jiffy, demote
 103.349 +         * to avoid a faulty transition.  Note that the processor
 103.350 +         * won't enter a low-power state during this call (to this
 103.351 +         * function) but should upon the next.
 103.352 +         *
 103.353 +         * TBD: A better policy might be to fall back to the demotion
 103.354 +         *      state (use it for this quantum only) instead of
 103.355 +         *      demoting -- and rely on duration as our sole demotion
 103.356 +         *      qualification.  This may, however, introduce DMA
 103.357 +         *      issues (e.g. floppy DMA transfer overrun/underrun).
 103.358 +         */
 103.359 +        if ( (power->bm_activity & 0x1) && cx->demotion.threshold.bm )
 103.360 +        {
 103.361 +            local_irq_enable();
 103.362 +            next_state = cx->demotion.state;
 103.363 +            goto end;
 103.364 +        }
 103.365 +    }
 103.366 +
 103.367 +    /*
 103.368 +     * Sleep:
 103.369 +     * ------
 103.370 +     * Invoke the current Cx state to put the processor to sleep.
 103.371 +     */
 103.372 +    if ( cx->type == ACPI_STATE_C2 || cx->type == ACPI_STATE_C3 )
 103.373 +        smp_mb__after_clear_bit();
 103.374 +
 103.375 +    switch ( cx->type )
 103.376 +    {
 103.377 +    case ACPI_STATE_C1:
 103.378 +        /*
 103.379 +         * Invoke C1.
 103.380 +         * Use the appropriate idle routine, the one that would
 103.381 +         * be used without acpi C-states.
 103.382 +         */
 103.383 +        if ( pm_idle_save )
 103.384 +            pm_idle_save();
 103.385 +        else 
 103.386 +            acpi_safe_halt();
 103.387 +
 103.388 +        /*
 103.389 +         * TBD: Can't get time duration while in C1, as resumes
 103.390 +         *      go to an ISR rather than here.  Need to instrument
 103.391 +         *      base interrupt handler.
 103.392 +         */
 103.393 +        sleep_ticks = 0xFFFFFFFF;
 103.394 +        break;
 103.395 +
 103.396 +    case ACPI_STATE_C2:
 103.397 +        /* Get start time (ticks) */
 103.398 +        t1 = inl(pmtmr_ioport);
 103.399 +        /* Invoke C2 */
 103.400 +        acpi_idle_do_entry(cx);
 103.401 +        /* Get end time (ticks) */
 103.402 +        t2 = inl(pmtmr_ioport);
 103.403 +
 103.404 +        /* Re-enable interrupts */
 103.405 +        local_irq_enable();
 103.406 +        /* Compute time (ticks) that we were actually asleep */
 103.407 +        sleep_ticks =
 103.408 +            ticks_elapsed(t1, t2) - cx->latency_ticks - C2_OVERHEAD;
 103.409 +        break;
 103.410 +
 103.411 +    case ACPI_STATE_C3:
 103.412 +        /*
 103.413 +         * disable bus master
 103.414 +         * bm_check implies we need ARB_DIS
 103.415 +         * !bm_check implies we need cache flush
 103.416 +         * bm_control determines whether we can do ARB_DIS
 103.417 +         *
 103.418 +         * That leaves a case where bm_check is set and bm_control is
 103.419 +         * not set. In that case we cannot do much, we enter C3
 103.420 +         * without doing anything.
 103.421 +         */
 103.422 +        if ( power->flags.bm_check && power->flags.bm_control )
 103.423 +        {
 103.424 +            atomic_inc(&c3_cpu_count);
 103.425 +            if ( atomic_read(&c3_cpu_count) == num_online_cpus() )
 103.426 +            {
 103.427 +                /*
 103.428 +                 * All CPUs are trying to go to C3
 103.429 +                 * Disable bus master arbitration
 103.430 +                 */
 103.431 +                acpi_set_register(ACPI_BITREG_ARB_DISABLE, 1);
 103.432 +            }
 103.433 +        }
 103.434 +        else if ( !power->flags.bm_check )
 103.435 +        {
 103.436 +            /* SMP with no shared cache... Invalidate cache  */
 103.437 +            ACPI_FLUSH_CPU_CACHE();
 103.438 +        }
 103.439 +
 103.440 +        /* Get start time (ticks) */
 103.441 +        t1 = inl(pmtmr_ioport);
 103.442 +
 103.443 +        /*
 103.444 +         * FIXME: Before invoking C3, be aware that TSC/APIC timer may be 
 103.445 +         * stopped by H/W. Without careful handling of TSC/APIC stop issues,
 103.446 +         * deep C states can't work correctly.
 103.447 +         */
 103.448 +        /* preparing TSC stop */
 103.449 +        cstate_save_tsc();
 103.450 +        /* placeholder for preparing APIC stop */
 103.451 +
 103.452 +        /* Invoke C3 */
 103.453 +        acpi_idle_do_entry(cx);
 103.454 +
 103.455 +        /* placeholder for recovering APIC */
 103.456 +
 103.457 +        /* recovering TSC */
 103.458 +        cstate_restore_tsc();
 103.459 +
 103.460 +        /* Get end time (ticks) */
 103.461 +        t2 = inl(pmtmr_ioport);
 103.462 +        if ( power->flags.bm_check && power->flags.bm_control )
 103.463 +        {
 103.464 +            /* Enable bus master arbitration */
 103.465 +            atomic_dec(&c3_cpu_count);
 103.466 +            acpi_set_register(ACPI_BITREG_ARB_DISABLE, 0);
 103.467 +        }
 103.468 +
 103.469 +        /* Compute time (ticks) that we were actually asleep */
 103.470 +        sleep_ticks = ticks_elapsed(t1, t2);
 103.471 +        /* Re-enable interrupts */
 103.472 +        local_irq_enable();
 103.473 +        /* Do not account our idle-switching overhead: */
 103.474 +        sleep_ticks -= cx->latency_ticks + C3_OVERHEAD;
 103.475 +
 103.476 +        break;
 103.477 +
 103.478 +    default:
 103.479 +        local_irq_enable();
 103.480 +        return;
 103.481 +    }
 103.482 +
 103.483 +    cx->usage++;
 103.484 +    if ( (cx->type != ACPI_STATE_C1) && (sleep_ticks > 0) )
 103.485 +        cx->time += sleep_ticks;
 103.486 +
 103.487 +    next_state = power->state;
 103.488 +
 103.489 +    /*
 103.490 +     * Promotion?
 103.491 +     * ----------
 103.492 +     * Track the number of long sleeps (time asleep is greater than threshold)
 103.493 +     * and promote when the count threshold is reached.  Note that bus
 103.494 +     * mastering activity may prevent promotions.
 103.495 +     * Do not promote above max_cstate.
 103.496 +     */
 103.497 +    if ( cx->promotion.state &&
 103.498 +         ((cx->promotion.state - power->states) <= max_cstate) )
 103.499 +    {
 103.500 +        if ( sleep_ticks > cx->promotion.threshold.ticks )
 103.501 +        {
 103.502 +            cx->promotion.count++;
 103.503 +            cx->demotion.count = 0;
 103.504 +            if ( cx->promotion.count >= cx->promotion.threshold.count )
 103.505 +            {
 103.506 +                if ( power->flags.bm_check )
 103.507 +                {
 103.508 +                    if ( !(power->bm_activity & cx->promotion.threshold.bm) )
 103.509 +                    {
 103.510 +                        next_state = cx->promotion.state;
 103.511 +                        goto end;
 103.512 +                    }
 103.513 +                }
 103.514 +                else
 103.515 +                {
 103.516 +                    next_state = cx->promotion.state;
 103.517 +                    goto end;
 103.518 +                }
 103.519 +            }
 103.520 +        }
 103.521 +    }
 103.522 +
 103.523 +    /*
 103.524 +     * Demotion?
 103.525 +     * ---------
 103.526 +     * Track the number of short sleeps (time asleep is less than the time threshold)
 103.527 +     * and demote when the usage threshold is reached.
 103.528 +     */
 103.529 +    if ( cx->demotion.state )
 103.530 +    {
 103.531 +        if ( sleep_ticks < cx->demotion.threshold.ticks )
 103.532 +        {
 103.533 +            cx->demotion.count++;
 103.534 +            cx->promotion.count = 0;
 103.535 +            if ( cx->demotion.count >= cx->demotion.threshold.count )
 103.536 +            {
 103.537 +                next_state = cx->demotion.state;
 103.538 +                goto end;
 103.539 +            }
 103.540 +        }
 103.541 +    }
 103.542 +
 103.543 +end:
 103.544 +    /*
 103.545 +     * Demote if current state exceeds max_cstate
 103.546 +     */
 103.547 +    if ( (power->state - power->states) > max_cstate )
 103.548 +    {
 103.549 +        if ( cx->demotion.state )
 103.550 +            next_state = cx->demotion.state;
 103.551 +    }
 103.552 +
 103.553 +    /*
 103.554 +     * New Cx State?
 103.555 +     * -------------
 103.556 +     * If we're going to start using a new Cx state we must clean up
 103.557 +     * from the previous and prepare to use the new.
 103.558 +     */
 103.559 +    if ( next_state != power->state )
 103.560 +        acpi_processor_power_activate(power, next_state);
 103.561 +}
 103.562 +
 103.563 +static int acpi_processor_set_power_policy(struct acpi_processor_power *power)
 103.564 +{
 103.565 +    unsigned int i;
 103.566 +    unsigned int state_is_set = 0;
 103.567 +    struct acpi_processor_cx *lower = NULL;
 103.568 +    struct acpi_processor_cx *higher = NULL;
 103.569 +    struct acpi_processor_cx *cx;
 103.570 +
 103.571 +    if ( !power )
 103.572 +        return -EINVAL;
 103.573 +
 103.574 +    /*
 103.575 +     * This function sets the default Cx state policy (OS idle handler).
 103.576 +     * Our scheme is to promote quickly to C2 but more conservatively
 103.577 +     * to C3.  We're favoring C2  for its characteristics of low latency
 103.578 +     * (quick response), good power savings, and ability to allow bus
 103.579 +     * mastering activity.  Note that the Cx state policy is completely
 103.580 +     * customizable and can be altered dynamically.
 103.581 +     */
 103.582 +
 103.583 +    /* startup state */
 103.584 +    for ( i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++ )
 103.585 +    {
 103.586 +        cx = &power->states[i];
 103.587 +        if ( !cx->valid )
 103.588 +            continue;
 103.589 +
 103.590 +        if ( !state_is_set )
 103.591 +            power->state = cx;
 103.592 +        state_is_set++;
 103.593 +        break;
 103.594 +    }
 103.595 +
 103.596 +    if ( !state_is_set )
 103.597 +        return -ENODEV;
 103.598 +
 103.599 +    /* demotion */
 103.600 +    for ( i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++ )
 103.601 +    {
 103.602 +        cx = &power->states[i];
 103.603 +        if ( !cx->valid )
 103.604 +            continue;
 103.605 +
 103.606 +        if ( lower )
 103.607 +        {
 103.608 +            cx->demotion.state = lower;
 103.609 +            cx->demotion.threshold.ticks = cx->latency_ticks;
 103.610 +            cx->demotion.threshold.count = 1;
 103.611 +            if ( cx->type == ACPI_STATE_C3 )
 103.612 +                cx->demotion.threshold.bm = bm_history;
 103.613 +        }
 103.614 +
 103.615 +        lower = cx;
 103.616 +    }
 103.617 +
 103.618 +    /* promotion */
 103.619 +    for ( i = (ACPI_PROCESSOR_MAX_POWER - 1); i > 0; i-- )
 103.620 +    {
 103.621 +        cx = &power->states[i];
 103.622 +        if ( !cx->valid )
 103.623 +            continue;
 103.624 +
 103.625 +        if ( higher )
 103.626 +        {
 103.627 +            cx->promotion.state = higher;
 103.628 +            cx->promotion.threshold.ticks = cx->latency_ticks;
 103.629 +            if ( cx->type >= ACPI_STATE_C2 )
 103.630 +                cx->promotion.threshold.count = 4;
 103.631 +            else
 103.632 +                cx->promotion.threshold.count = 10;
 103.633 +            if ( higher->type == ACPI_STATE_C3 )
 103.634 +                cx->promotion.threshold.bm = bm_history;
 103.635 +        }
 103.636 +
 103.637 +        higher = cx;
 103.638 +    }
 103.639 +
 103.640 +    return 0;
 103.641 +}
 103.642 +
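To make the effect of the two loops above easier to follow, here is the linkage they would produce for a hypothetical table containing valid C1, C2 and C3 entries; this is an illustrative reading of the code above, not part of the changeset.

/*
 * Assumed states[]: C1, C2 and C3 all valid.
 *
 * Demotion pass (low to high):
 *   C2.demotion.state = C1, C3.demotion.state = C2;
 *   threshold.ticks = the state's own latency_ticks, threshold.count = 1;
 *   C3 additionally gets demotion.threshold.bm = bm_history.
 *
 * Promotion pass (high to low):
 *   C1.promotion.state = C2 with threshold.count = 10 (C1 < ACPI_STATE_C2);
 *   C2.promotion.state = C3 with threshold.count = 4, and since the target
 *   is of type ACPI_STATE_C3, promotion.threshold.bm = bm_history.
 *
 * The startup loop leaves power->state at the first valid entry, i.e. C1.
 */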
 103.643 +static int init_cx_pminfo(struct acpi_processor_power *acpi_power)
 103.644 +{
 103.645 +    memset(acpi_power, 0, sizeof(*acpi_power));
 103.646 +
 103.647 +    acpi_power->states[ACPI_STATE_C1].type = ACPI_STATE_C1;
 103.648 +
 103.649 +    acpi_power->states[ACPI_STATE_C0].valid = 1;
 103.650 +    acpi_power->states[ACPI_STATE_C1].valid = 1;
 103.651 +
 103.652 +    acpi_power->count = 2;
 103.653 +
 103.654 +    return 0;
 103.655 +}
 103.656 +
 103.657 +#define CPUID_MWAIT_LEAF (5)
 103.658 +#define CPUID5_ECX_EXTENSIONS_SUPPORTED (0x1)
 103.659 +#define CPUID5_ECX_INTERRUPT_BREAK      (0x2)
 103.660 +
 103.661 +#define MWAIT_ECX_INTERRUPT_BREAK       (0x1)
 103.662 +
 103.663 +#define MWAIT_SUBSTATE_MASK (0xf)
 103.664 +#define MWAIT_SUBSTATE_SIZE (4)
 103.665 +
 103.666 +static int acpi_processor_ffh_cstate_probe(xen_processor_cx_t *cx)
 103.667 +{
 103.668 +    struct cpuinfo_x86 *c = &current_cpu_data;
 103.669 +    unsigned int eax, ebx, ecx, edx;
 103.670 +    unsigned int edx_part;
 103.671 +    unsigned int cstate_type; /* C-state type and not ACPI C-state type */
 103.672 +    unsigned int num_cstate_subtype;
 103.673 +
 103.674 +    if ( c->cpuid_level < CPUID_MWAIT_LEAF )
 103.675 +    {
 103.676 +        printk(XENLOG_INFO "MWAIT leaf not supported by cpuid\n");
 103.677 +        return -EFAULT;
 103.678 +    }
 103.679 +
 103.680 +    cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
 103.681 +    printk(XENLOG_DEBUG "cpuid.MWAIT[.eax=%x, .ebx=%x, .ecx=%x, .edx=%x]\n",
 103.682 +           eax, ebx, ecx, edx);
 103.683 +
 103.684 +    /* Check whether this particular cx_type (in _CST) is supported or not */
 103.685 +    cstate_type = (cx->reg.address >> MWAIT_SUBSTATE_SIZE) + 1;
 103.686 +    edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE);
 103.687 +    num_cstate_subtype = edx_part & MWAIT_SUBSTATE_MASK;
 103.688 +
 103.689 +    if ( num_cstate_subtype < (cx->reg.address & MWAIT_SUBSTATE_MASK) )
 103.690 +        return -EFAULT;
 103.691 +
 103.692 +    /* mwait ecx extensions INTERRUPT_BREAK should be supported for C2/C3 */
 103.693 +    if ( !(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
 103.694 +         !(ecx & CPUID5_ECX_INTERRUPT_BREAK) )
 103.695 +        return -EFAULT;
 103.696 +
 103.697 +    printk(XENLOG_INFO "Monitor-Mwait will be used to enter C-%d state\n", cx->type);
 103.698 +    return 0;
 103.699 +}
 103.700 +
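A stand-alone sketch of the EDX decode performed by acpi_processor_ffh_cstate_probe() above. The MWAIT hint layout in the FFH _CST address and the CPUID.5 EDX packing are taken from the Intel SDM; the sample EDX value is hypothetical.

#include <stdio.h>
#include <stdint.h>

#define MWAIT_SUBSTATE_MASK 0xf
#define MWAIT_SUBSTATE_SIZE 4

/*
 * The FFH _CST address is assumed to carry the MWAIT hint:
 * bits 7:4 = target C-state - 1, bits 3:0 = sub-state.  CPUID.5 EDX
 * packs the number of supported MWAIT sub-states in 4-bit fields:
 * bits 3:0 for C0, 7:4 for C1, 11:8 for C2, and so on.
 */
static int mwait_substates_ok(uint32_t ffh_address, uint32_t edx)
{
    unsigned int cstate_type = (ffh_address >> MWAIT_SUBSTATE_SIZE) + 1;
    unsigned int num = (edx >> (cstate_type * MWAIT_SUBSTATE_SIZE))
                       & MWAIT_SUBSTATE_MASK;

    return num >= (ffh_address & MWAIT_SUBSTATE_MASK);
}

int main(void)
{
    /* Hypothetical: EDX = 0x00000220 -> two C1 and two C2 sub-states. */
    uint32_t edx = 0x00000220;

    /* _CST address 0x10 = MWAIT hint for C2, sub-state 0: supported (1). */
    printf("%d\n", mwait_substates_ok(0x10, edx));
    /* _CST address 0x21 = MWAIT hint for C3, sub-state 1: unsupported (0). */
    printf("%d\n", mwait_substates_ok(0x21, edx));
    return 0;
}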
 103.701 +/*
 103.702 + * Initialize bm_flags based on the CPU cache properties
 103.703 + * On SMP it depends on cache configuration
 103.704 + * - When cache is not shared among all CPUs, we flush cache
 103.705 + *   before entering C3.
 103.706 + * - When cache is shared among all CPUs, we use bm_check
 103.707 + *   mechanism as in UP case
 103.708 + *
 103.709 + * This routine is called only after all the CPUs are online
 103.710 + */
 103.711 +static void acpi_processor_power_init_bm_check(struct acpi_processor_flags *flags)
 103.712 +{
 103.713 +    struct cpuinfo_x86 *c = &current_cpu_data;
 103.714 +
 103.715 +    flags->bm_check = 0;
 103.716 +    if ( num_online_cpus() == 1 )
 103.717 +        flags->bm_check = 1;
 103.718 +    else if ( c->x86_vendor == X86_VENDOR_INTEL )
 103.719 +    {
 103.720 +        /*
 103.721 +         * Today all CPUs that support C3 share cache.
 103.722 +         * TBD: This needs to look at cache shared map, once
 103.723 +         * multi-core detection patch makes to the base.
 103.724 +         */
 103.725 +        flags->bm_check = 1;
 103.726 +    }
 103.727 +}
 103.728 +
 103.729 +#define VENDOR_INTEL                   (1)
 103.730 +#define NATIVE_CSTATE_BEYOND_HALT      (2)
 103.731 +
 103.732 +static int check_cx(struct acpi_processor_power *power, xen_processor_cx_t *cx)
 103.733 +{
 103.734 +    static int bm_check_flag;
 103.735 +    if ( cx == NULL )
 103.736 +        return -EINVAL;
 103.737 +
 103.738 +    switch ( cx->reg.space_id )
 103.739 +    {
 103.740 +    case ACPI_ADR_SPACE_SYSTEM_IO:
 103.741 +        if ( cx->reg.address == 0 )
 103.742 +            return -EINVAL;
 103.743 +        break;
 103.744 +
 103.745 +    case ACPI_ADR_SPACE_FIXED_HARDWARE:
 103.746 +        if ( cx->type > ACPI_STATE_C1 )
 103.747 +        {
 103.748 +            if ( cx->reg.bit_width != VENDOR_INTEL || 
 103.749 +                 cx->reg.bit_offset != NATIVE_CSTATE_BEYOND_HALT )
 103.750 +                return -EINVAL;
 103.751 +
 103.752 +            /* assume all logical CPUs have the same support for MWAIT */
 103.753 +            if ( acpi_processor_ffh_cstate_probe(cx) )
 103.754 +                return -EFAULT;
 103.755 +        }
 103.756 +        break;
 103.757 +
 103.758 +    default:
 103.759 +        return -ENODEV;
 103.760 +    }
 103.761 +
 103.762 +    if ( cx->type == ACPI_STATE_C3 )
 103.763 +    {
 103.764 +        /* All the logic here assumes flags.bm_check is the same across all CPUs */
 103.765 +        if ( !bm_check_flag )
 103.766 +        {
 103.767 +            /* Determine whether bm_check is needed based on CPU  */
 103.768 +            acpi_processor_power_init_bm_check(&(power->flags));
 103.769 +            bm_check_flag = power->flags.bm_check;
 103.770 +        }
 103.771 +        else
 103.772 +        {
 103.773 +            power->flags.bm_check = bm_check_flag;
 103.774 +        }
 103.775 +
 103.776 +        if ( power->flags.bm_check )
 103.777 +        {
 103.778 +            if ( !power->flags.bm_control )
 103.779 +            {
 103.780 +                if ( power->flags.has_cst != 1 )
 103.781 +                {
 103.782 +                    /* bus mastering control is necessary */
 103.783 +                    ACPI_DEBUG_PRINT((ACPI_DB_INFO,
 103.784 +                        "C3 support requires BM control\n"));
 103.785 +                    return -1;
 103.786 +                }
 103.787 +                else
 103.788 +                {
 103.789 +                    /* Here we enter C3 without bus mastering */
 103.790 +                    ACPI_DEBUG_PRINT((ACPI_DB_INFO,
 103.791 +                        "C3 support without BM control\n"));
 103.792 +                }
 103.793 +            }
 103.794 +        }
 103.795 +        else
 103.796 +        {
 103.797 +            /*
 103.798 +             * WBINVD should be set in the FADT for C3 state to be
 103.799 +             * supported when bm_check is not required.
 103.800 +             */
 103.801 +            if ( !(acpi_gbl_FADT.flags & ACPI_FADT_WBINVD) )
 103.802 +            {
 103.803 +                ACPI_DEBUG_PRINT((ACPI_DB_INFO,
 103.804 +                          "Cache invalidation should work properly"
 103.805 +                          " for C3 to be enabled on SMP systems\n"));
 103.806 +                return -1;
 103.807 +            }
 103.808 +            acpi_set_register(ACPI_BITREG_BUS_MASTER_RLD, 0);
 103.809 +        }
 103.810 +    }
 103.811 +
 103.812 +    return 0;
 103.813 +}
 103.814 +
 103.815 +static int set_cx(struct acpi_processor_power *acpi_power,
 103.816 +                  xen_processor_cx_t *xen_cx)
 103.817 +{
 103.818 +    struct acpi_processor_cx *cx;
 103.819 +
 103.820 +    /* skip unsupported ACPI C-states */
 103.821 +    if ( check_cx(acpi_power, xen_cx) )
 103.822 +        return -EFAULT;
 103.823 +
 103.824 +    cx = &acpi_power->states[xen_cx->type];
 103.825 +    if ( !cx->valid )
 103.826 +        acpi_power->count++;
 103.827 +
 103.828 +    cx->valid    = 1;
 103.829 +    cx->type     = xen_cx->type;
 103.830 +    cx->address  = xen_cx->reg.address;
 103.831 +    cx->space_id = xen_cx->reg.space_id;
 103.832 +    cx->latency  = xen_cx->latency;
 103.833 +    cx->power    = xen_cx->power;
 103.834 +    
 103.835 +    cx->latency_ticks = US_TO_PM_TIMER_TICKS(cx->latency);
 103.836 +
 103.837 +    return 0;   
 103.838 +}
 103.839 +
 103.840 +static int get_cpu_id(u8 acpi_id)
 103.841 +{
 103.842 +    int i;
 103.843 +    u8 apic_id;
 103.844 +
 103.845 +    apic_id = x86_acpiid_to_apicid[acpi_id];
 103.846 +    if ( apic_id == 0xff )
 103.847 +        return -1;
 103.848 +
 103.849 +    for ( i = 0; i < NR_CPUS; i++ )
 103.850 +    {
 103.851 +        if ( apic_id == x86_cpu_to_apicid[i] )
 103.852 +            return i;
 103.853 +    }
 103.854 +
 103.855 +    return -1;
 103.856 +}
 103.857 +
 103.858 +#ifdef DEBUG_PM_CX
 103.859 +static void print_cx_pminfo(uint32_t cpu, struct xen_processor_power *power)
 103.860 +{
 103.861 +    XEN_GUEST_HANDLE(xen_processor_cx_t) states;
 103.862 +    xen_processor_cx_t  state;
 103.863 +    XEN_GUEST_HANDLE(xen_processor_csd_t) csd;
 103.864 +    xen_processor_csd_t dp;
 103.865 +    uint32_t i;
 103.866 +
 103.867 +    printk("cpu%d cx acpi info:\n", cpu);
 103.868 +    printk("\tcount = %d\n", power->count);
 103.869 +    printk("\tflags: bm_cntl[%d], bm_chk[%d], has_cst[%d],\n"
 103.870 +           "\t       pwr_setup_done[%d], bm_rld_set[%d]\n",
 103.871 +           power->flags.bm_control, power->flags.bm_check, power->flags.has_cst,
 103.872 +           power->flags.power_setup_done, power->flags.bm_rld_set);
 103.873 +    
 103.874 +    states = power->states;
 103.875 +    
 103.876 +    for ( i = 0; i < power->count; i++ )
 103.877 +    {
 103.878 +        if ( unlikely(copy_from_guest_offset(&state, states, i, 1)) )
 103.879 +            return;
 103.880 +        
 103.881 +        printk("\tstates[%d]:\n", i);
 103.882 +        printk("\t\treg.space_id = 0x%x\n", state.reg.space_id);
 103.883 +        printk("\t\treg.bit_width = 0x%x\n", state.reg.bit_width);
 103.884 +        printk("\t\treg.bit_offset = 0x%x\n", state.reg.bit_offset);
 103.885 +        printk("\t\treg.access_size = 0x%x\n", state.reg.access_size);
 103.886 +        printk("\t\treg.address = 0x%"PRIx64"\n", state.reg.address);
 103.887 +        printk("\t\ttype    = %d\n", state.type);
 103.888 +        printk("\t\tlatency = %d\n", state.latency);
 103.889 +        printk("\t\tpower   = %d\n", state.power);
 103.890 +
 103.891 +        csd = state.dp;
 103.892 +        printk("\t\tdp(@0x%p)\n", csd.p);
 103.893 +        
 103.894 +        if ( csd.p != NULL )
 103.895 +        {
 103.896 +            if ( unlikely(copy_from_guest(&dp, csd, 1)) )
 103.897 +                return;
 103.898 +            printk("\t\t\tdomain = %d\n", dp.domain);
 103.899 +            printk("\t\t\tcoord_type   = %d\n", dp.coord_type);
 103.900 +            printk("\t\t\tnum = %d\n", dp.num);
 103.901 +        }
 103.902 +    }
 103.903 +}
 103.904 +#else
 103.905 +#define print_cx_pminfo(c, p)
 103.906 +#endif
 103.907 +
 103.908 +long set_cx_pminfo(uint32_t cpu, struct xen_processor_power *power)
 103.909 +{
 103.910 +    XEN_GUEST_HANDLE(xen_processor_cx_t) states;
 103.911 +    xen_processor_cx_t xen_cx;
 103.912 +    struct acpi_processor_power *acpi_power;
 103.913 +    int cpu_id, i;
 103.914 +
 103.915 +    if ( unlikely(!guest_handle_okay(power->states, power->count)) )
 103.916 +        return -EFAULT;
 103.917 +
 103.918 +    print_cx_pminfo(cpu, power);
 103.919 +
 103.920 +    /* map from acpi_id to cpu_id */
 103.921 +    cpu_id = get_cpu_id((u8)cpu);
 103.922 +    if ( cpu_id == -1 )
 103.923 +    {
 103.924 +        printk(XENLOG_ERR "no cpu_id for acpi_id %d\n", cpu);
 103.925 +        return -EFAULT;
 103.926 +    }
 103.927 +
 103.928 +    acpi_power = &processor_powers[cpu_id];
 103.929 +
 103.930 +    init_cx_pminfo(acpi_power);
 103.931 +
 103.932 +    acpi_power->flags.bm_check = power->flags.bm_check;
 103.933 +    acpi_power->flags.bm_control = power->flags.bm_control;
 103.934 +    acpi_power->flags.has_cst = power->flags.has_cst;
 103.935 +
 103.936 +    states = power->states;
 103.937 +
 103.938 +    for ( i = 0; i < power->count; i++ )
 103.939 +    {
 103.940 +        if ( unlikely(copy_from_guest_offset(&xen_cx, states, i, 1)) )
 103.941 +            return -EFAULT;
 103.942 +
 103.943 +        set_cx(acpi_power, &xen_cx);
 103.944 +    }
 103.945 +
 103.946 +    /* FIXME: C-state dependency is not supported so far */
 103.947 +    
 103.948 +    /* initialize default policy */
 103.949 +    acpi_processor_set_power_policy(acpi_power);
 103.950 +
 103.951 +    print_acpi_power(cpu_id, acpi_power);
 103.952 +
 103.953 +    if ( cpu_id == 0 && pm_idle_save == NULL )
 103.954 +    {
 103.955 +        pm_idle_save = pm_idle;
 103.956 +        pm_idle = acpi_processor_idle;
 103.957 +    }
 103.958 +        
 103.959 +    return 0;
 103.960 +}
   104.1 --- a/xen/arch/x86/apic.c	Fri Apr 25 20:13:52 2008 +0900
   104.2 +++ b/xen/arch/x86/apic.c	Thu May 08 18:40:07 2008 +0900
   104.3 @@ -47,6 +47,8 @@ int enable_local_apic __initdata = 0; /*
   104.4   */
   104.5  int apic_verbosity;
   104.6  
   104.7 +int x2apic_enabled __read_mostly = 0;
   104.8 +
   104.9  
  104.10  static void apic_pm_activate(void);
  104.11  
  104.12 @@ -306,7 +308,10 @@ int __init verify_local_APIC(void)
  104.13       */
  104.14      reg0 = apic_read(APIC_LVR);
  104.15      apic_printk(APIC_DEBUG, "Getting VERSION: %x\n", reg0);
  104.16 -    apic_write(APIC_LVR, reg0 ^ APIC_LVR_MASK);
  104.17 +
  104.18 +    /* We don't try writing LVR in x2APIC mode since that incurs #GP. */
  104.19 +    if ( !x2apic_enabled )
  104.20 +        apic_write(APIC_LVR, reg0 ^ APIC_LVR_MASK);
  104.21      reg1 = apic_read(APIC_LVR);
  104.22      apic_printk(APIC_DEBUG, "Getting VERSION: %x\n", reg1);
  104.23  
  104.24 @@ -610,7 +615,8 @@ int lapic_suspend(void)
  104.25      apic_pm_state.apic_id = apic_read(APIC_ID);
  104.26      apic_pm_state.apic_taskpri = apic_read(APIC_TASKPRI);
  104.27      apic_pm_state.apic_ldr = apic_read(APIC_LDR);
  104.28 -    apic_pm_state.apic_dfr = apic_read(APIC_DFR);
  104.29 +    if ( !x2apic_enabled )
  104.30 +        apic_pm_state.apic_dfr = apic_read(APIC_DFR);
  104.31      apic_pm_state.apic_spiv = apic_read(APIC_SPIV);
  104.32      apic_pm_state.apic_lvtt = apic_read(APIC_LVTT);
  104.33      apic_pm_state.apic_lvtpc = apic_read(APIC_LVTPC);
  104.34 @@ -643,14 +649,20 @@ int lapic_resume(void)
  104.35       * FIXME! This will be wrong if we ever support suspend on
  104.36       * SMP! We'll need to do this as part of the CPU restore!
  104.37       */
  104.38 -    rdmsr(MSR_IA32_APICBASE, l, h);
  104.39 -    l &= ~MSR_IA32_APICBASE_BASE;
  104.40 -    l |= MSR_IA32_APICBASE_ENABLE | mp_lapic_addr;
  104.41 -    wrmsr(MSR_IA32_APICBASE, l, h);
  104.42 +    if ( !x2apic_enabled )
  104.43 +    {
  104.44 +        rdmsr(MSR_IA32_APICBASE, l, h);
  104.45 +        l &= ~MSR_IA32_APICBASE_BASE;
  104.46 +        l |= MSR_IA32_APICBASE_ENABLE | mp_lapic_addr;
  104.47 +        wrmsr(MSR_IA32_APICBASE, l, h);
  104.48 +    }
  104.49 +    else
  104.50 +        enable_x2apic();
  104.51  
  104.52      apic_write(APIC_LVTERR, ERROR_APIC_VECTOR | APIC_LVT_MASKED);
  104.53      apic_write(APIC_ID, apic_pm_state.apic_id);
  104.54 -    apic_write(APIC_DFR, apic_pm_state.apic_dfr);
  104.55 +    if ( !x2apic_enabled )
  104.56 +        apic_write(APIC_DFR, apic_pm_state.apic_dfr);
  104.57      apic_write(APIC_LDR, apic_pm_state.apic_ldr);
  104.58      apic_write(APIC_TASKPRI, apic_pm_state.apic_taskpri);
  104.59      apic_write(APIC_SPIV, apic_pm_state.apic_spiv);
  104.60 @@ -809,10 +821,29 @@ no_apic:
  104.61      return -1;
  104.62  }
  104.63  
  104.64 +void enable_x2apic(void)
  104.65 +{
  104.66 +    u32 lo, hi;
  104.67 +
  104.68 +    rdmsr(MSR_IA32_APICBASE, lo, hi);
  104.69 +    if ( !(lo & MSR_IA32_APICBASE_EXTD) )
  104.70 +    {
  104.71 +        lo |= MSR_IA32_APICBASE_ENABLE | MSR_IA32_APICBASE_EXTD;
  104.72 +        wrmsr(MSR_IA32_APICBASE, lo, 0);
  104.73 +        printk("x2APIC mode enabled.\n");
  104.74 +    }
  104.75 +    else
  104.76 +        printk("x2APIC mode enabled by BIOS.\n");
  104.77 +
  104.78 +    x2apic_enabled = 1;
  104.79 +}
  104.80 +
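For readers decoding the MSR manipulation above, a short reference note; the bit positions are taken from the Intel SDM and are not defined in this hunk.

/*
 * MSR_IA32_APICBASE (0x1B), per the Intel SDM:
 *   bit 11 = xAPIC global enable (MSR_IA32_APICBASE_ENABLE),
 *   bit 10 = EXTD, x2APIC mode enable (MSR_IA32_APICBASE_EXTD),
 *   bits 12 and up = APIC MMIO base (unused once EXTD is set).
 * enable_x2apic() above sets bits 11:10 to 11b.  The SDM only permits
 * leaving x2APIC mode via the "APIC disabled" state, consistent with the
 * resume path above re-running enable_x2apic() instead of rewriting the
 * MMIO base address.
 */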
  104.81  void __init init_apic_mappings(void)
  104.82  {
  104.83      unsigned long apic_phys;
  104.84  
  104.85 +    if ( x2apic_enabled )
  104.86 +        goto __next;
  104.87      /*
  104.88       * If no local APIC can be found then set up a fake all
  104.89       * zeroes page to simulate the local APIC and another
  104.90 @@ -828,12 +859,13 @@ void __init init_apic_mappings(void)
  104.91      apic_printk(APIC_VERBOSE, "mapped APIC to %08lx (%08lx)\n", APIC_BASE,
  104.92                  apic_phys);
  104.93  
  104.94 +__next:
  104.95      /*
  104.96       * Fetch the APIC ID of the BSP in case we have a
  104.97       * default configuration (or the MP table is broken).
  104.98       */
  104.99      if (boot_cpu_physical_apicid == -1U)
 104.100 -        boot_cpu_physical_apicid = GET_APIC_ID(apic_read(APIC_ID));
 104.101 +        boot_cpu_physical_apicid = get_apic_id();
 104.102  
 104.103  #ifdef CONFIG_X86_IO_APIC
 104.104      {
 104.105 @@ -1271,7 +1303,7 @@ int __init APIC_init_uniprocessor (void)
 104.106       * might be zero if read from MP tables. Get it from LAPIC.
 104.107       */
 104.108  #ifdef CONFIG_CRASH_DUMP
 104.109 -    boot_cpu_physical_apicid = GET_APIC_ID(apic_read(APIC_ID));
 104.110 +    boot_cpu_physical_apicid = get_apic_id();
 104.111  #endif
 104.112      phys_cpu_present_map = physid_mask_of_physid(boot_cpu_physical_apicid);
 104.113  
   105.1 --- a/xen/arch/x86/domain.c	Fri Apr 25 20:13:52 2008 +0900
   105.2 +++ b/xen/arch/x86/domain.c	Thu May 08 18:40:07 2008 +0900
   105.3 @@ -56,6 +56,9 @@ DEFINE_PER_CPU(struct vcpu *, curr_vcpu)
   105.4  DEFINE_PER_CPU(u64, efer);
   105.5  DEFINE_PER_CPU(unsigned long, cr4);
   105.6  
   105.7 +static void default_idle(void);
   105.8 +void (*pm_idle) (void) = default_idle;
   105.9 +
  105.10  static void unmap_vcpu_info(struct vcpu *v);
  105.11  
  105.12  static void paravirt_ctxt_switch_from(struct vcpu *v);
  105.13 @@ -105,7 +108,7 @@ void idle_loop(void)
  105.14          if ( cpu_is_offline(smp_processor_id()) )
  105.15              play_dead();
  105.16          page_scrub_schedule_work();
  105.17 -        default_idle();
  105.18 +        (*pm_idle)();
  105.19          do_softirq();
  105.20      }
  105.21  }
  105.22 @@ -440,10 +443,9 @@ int arch_domain_create(struct domain *d,
  105.23  {
  105.24  #ifdef __x86_64__
  105.25      struct page_info *pg;
  105.26 -    int i;
  105.27  #endif
  105.28      l1_pgentry_t gdt_l1e;
  105.29 -    int vcpuid, pdpt_order, paging_initialised = 0;
  105.30 +    int i, vcpuid, pdpt_order, paging_initialised = 0;
  105.31      int rc = -ENOMEM;
  105.32  
  105.33      d->arch.hvm_domain.hap_enabled =
  105.34 @@ -526,6 +528,8 @@ int arch_domain_create(struct domain *d,
  105.35              goto fail;
  105.36      }
  105.37  
  105.38 +    spin_lock_init(&d->arch.irq_lock);
  105.39 +
  105.40      if ( is_hvm_domain(d) )
  105.41      {
  105.42          if ( (rc = hvm_domain_initialise(d)) != 0 )
  105.43 @@ -541,6 +545,13 @@ int arch_domain_create(struct domain *d,
  105.44              (CONFIG_PAGING_LEVELS != 4);
  105.45      }
  105.46  
  105.47 +    memset(d->arch.cpuids, 0, sizeof(d->arch.cpuids));
  105.48 +    for ( i = 0; i < MAX_CPUID_INPUT; i++ )
  105.49 +    {
  105.50 +        d->arch.cpuids[i].input[0] = XEN_CPUID_INPUT_UNUSED;
  105.51 +        d->arch.cpuids[i].input[1] = XEN_CPUID_INPUT_UNUSED;
  105.52 +    }
  105.53 +
  105.54      return 0;
  105.55  
  105.56   fail:
  105.57 @@ -1910,6 +1921,37 @@ void arch_dump_vcpu_info(struct vcpu *v)
  105.58      paging_dump_vcpu_info(v);
  105.59  }
  105.60  
  105.61 +void domain_cpuid(
  105.62 +    struct domain *d,
  105.63 +    unsigned int  input,
  105.64 +    unsigned int  sub_input,
  105.65 +    unsigned int  *eax,
  105.66 +    unsigned int  *ebx,
  105.67 +    unsigned int  *ecx,
  105.68 +    unsigned int  *edx)
  105.69 +{
  105.70 +    cpuid_input_t *cpuid;
  105.71 +    int i;
  105.72 +
  105.73 +    for ( i = 0; i < MAX_CPUID_INPUT; i++ )
  105.74 +    {
  105.75 +        cpuid = &d->arch.cpuids[i];
  105.76 +
  105.77 +        if ( (cpuid->input[0] == input) &&
  105.78 +             ((cpuid->input[1] == XEN_CPUID_INPUT_UNUSED) ||
  105.79 +              (cpuid->input[1] == sub_input)) )
  105.80 +        {
  105.81 +            *eax = cpuid->eax;
  105.82 +            *ebx = cpuid->ebx;
  105.83 +            *ecx = cpuid->ecx;
  105.84 +            *edx = cpuid->edx;
  105.85 +            return;
  105.86 +        }
  105.87 +    }
  105.88 +
  105.89 +    *eax = *ebx = *ecx = *edx = 0;
  105.90 +}
  105.91 +
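A worked illustration of how the lookup above resolves; the policy entries are hypothetical.

/*
 * Hypothetical d->arch.cpuids[]:
 *   [0] input = { 0x1, XEN_CPUID_INPUT_UNUSED }   (wildcard sub-leaf)
 *   [1] input = { 0x4, 0x0 }
 *
 * domain_cpuid(d, 0x1, 0x2, ...) matches entry [0]: input[0] agrees and
 * input[1] is the wildcard, so any sub-leaf of leaf 1 returns its values.
 * domain_cpuid(d, 0x4, 0x1, ...) matches nothing (entry [1] only covers
 * sub-leaf 0), so all four output registers are returned as zero.
 */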
  105.92  /*
  105.93   * Local variables:
  105.94   * mode: C
   106.1 --- a/xen/arch/x86/domctl.c	Fri Apr 25 20:13:52 2008 +0900
   106.2 +++ b/xen/arch/x86/domctl.c	Thu May 08 18:40:07 2008 +0900
   106.3 @@ -10,6 +10,7 @@
   106.4  #include <xen/mm.h>
   106.5  #include <xen/guest_access.h>
   106.6  #include <xen/compat.h>
   106.7 +#include <xen/pci.h>
   106.8  #include <public/domctl.h>
   106.9  #include <xen/sched.h>
  106.10  #include <xen/domain.h>
  106.11 @@ -539,7 +540,7 @@ long arch_do_domctl(
  106.12          if ( device_assigned(bus, devfn) )
  106.13          {
  106.14              gdprintk(XENLOG_ERR, "XEN_DOMCTL_test_assign_device: "
  106.15 -                     "%x:%x:%x already assigned\n",
  106.16 +                     "%x:%x:%x already assigned, or non-existent\n",
  106.17                       bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
  106.18              break;
  106.19          }
  106.20 @@ -568,7 +569,7 @@ long arch_do_domctl(
  106.21          if ( device_assigned(bus, devfn) )
  106.22          {
  106.23              gdprintk(XENLOG_ERR, "XEN_DOMCTL_assign_device: "
  106.24 -                     "%x:%x:%x already assigned\n",
  106.25 +                     "%x:%x:%x already assigned, or non-existent\n",
  106.26                       bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
  106.27              break;
  106.28          }
  106.29 @@ -842,6 +843,45 @@ long arch_do_domctl(
  106.30      }
  106.31      break;
  106.32  
  106.33 +    case XEN_DOMCTL_set_cpuid:
  106.34 +    {
  106.35 +        struct domain *d;
  106.36 +        xen_domctl_cpuid_t *ctl = &domctl->u.cpuid;
  106.37 +        cpuid_input_t *cpuid = NULL; 
  106.38 +        int i;
  106.39 +
  106.40 +        ret = -ESRCH;
  106.41 +        d = rcu_lock_domain_by_id(domctl->domain);
  106.42 +        if ( d == NULL )
  106.43 +            break;
  106.44 +
  106.45 +        for ( i = 0; i < MAX_CPUID_INPUT; i++ )
  106.46 +        {
  106.47 +            cpuid = &d->arch.cpuids[i];
  106.48 +
  106.49 +            if ( cpuid->input[0] == XEN_CPUID_INPUT_UNUSED )
  106.50 +                break;
  106.51 +
  106.52 +            if ( (cpuid->input[0] == ctl->input[0]) &&
  106.53 +                 ((cpuid->input[1] == XEN_CPUID_INPUT_UNUSED) ||
  106.54 +                  (cpuid->input[1] == ctl->input[1])) )
  106.55 +                break;
  106.56 +        }
  106.57 +        
  106.58 +        if ( i == MAX_CPUID_INPUT )
  106.59 +        {
  106.60 +            ret = -ENOENT;
  106.61 +        }
  106.62 +        else
  106.63 +        {
  106.64 +            memcpy(cpuid, ctl, sizeof(cpuid_input_t));
  106.65 +            ret = 0;
  106.66 +        }
  106.67 +
  106.68 +        rcu_unlock_domain(d);
  106.69 +    }
  106.70 +    break;
  106.71 +
  106.72      default:
  106.73          ret = -ENOSYS;
  106.74          break;
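A minimal toolstack-side sketch of driving the new XEN_DOMCTL_set_cpuid operation. The xen_domctl_cpuid_t field names follow the handler above; the headers, the visibility of XEN_CPUID_INPUT_UNUSED to the tools, and the do_domctl() wrapper are assumptions rather than part of this changeset.

#include <string.h>
#include <xenctrl.h>      /* assumed libxc environment */
#include <xen/domctl.h>   /* assumed public header with the structs used below */

/* Override CPUID leaf 1 (any sub-leaf) for a guest with fixed values. */
static int set_guest_leaf1(int xc_handle, uint32_t domid,
                           uint32_t eax, uint32_t ebx,
                           uint32_t ecx, uint32_t edx)
{
    struct xen_domctl domctl;

    memset(&domctl, 0, sizeof(domctl));
    domctl.cmd = XEN_DOMCTL_set_cpuid;
    domctl.domain = domid;
    domctl.u.cpuid.input[0] = 0x00000001;             /* leaf 1 */
    domctl.u.cpuid.input[1] = XEN_CPUID_INPUT_UNUSED; /* match any sub-leaf */
    domctl.u.cpuid.eax = eax;
    domctl.u.cpuid.ebx = ebx;
    domctl.u.cpuid.ecx = ecx;
    domctl.u.cpuid.edx = edx;

    return do_domctl(xc_handle, &domctl);             /* assumed wrapper */
}

On the hypervisor side this fills the first free d->arch.cpuids[] slot, which hvm_cpuid() then consults via domain_cpuid().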
   107.1 --- a/xen/arch/x86/genapic/Makefile	Fri Apr 25 20:13:52 2008 +0900
   107.2 +++ b/xen/arch/x86/genapic/Makefile	Thu May 08 18:40:07 2008 +0900
   107.3 @@ -1,4 +1,5 @@
   107.4  obj-y += bigsmp.o
   107.5 +obj-y += x2apic.o
   107.6  obj-y += default.o
   107.7  obj-y += delivery.o
   107.8  obj-y += probe.o
   108.1 --- a/xen/arch/x86/genapic/delivery.c	Fri Apr 25 20:13:52 2008 +0900
   108.2 +++ b/xen/arch/x86/genapic/delivery.c	Thu May 08 18:40:07 2008 +0900
   108.3 @@ -17,7 +17,7 @@ void init_apic_ldr_flat(void)
   108.4  
   108.5  	apic_write_around(APIC_DFR, APIC_DFR_FLAT);
   108.6  	val = apic_read(APIC_LDR) & ~APIC_LDR_MASK;
   108.7 -	val |= SET_APIC_LOGICAL_ID(1UL << smp_processor_id());
   108.8 +	val |= SET_xAPIC_LOGICAL_ID(1UL << smp_processor_id());
   108.9  	apic_write_around(APIC_LDR, val);
  108.10  }
  108.11  
   109.1 --- a/xen/arch/x86/genapic/probe.c	Fri Apr 25 20:13:52 2008 +0900
   109.2 +++ b/xen/arch/x86/genapic/probe.c	Thu May 08 18:40:07 2008 +0900
   109.3 @@ -14,6 +14,7 @@
   109.4  #include <asm/apicdef.h>
   109.5  #include <asm/genapic.h>
   109.6  
   109.7 +extern struct genapic apic_x2apic;
   109.8  extern struct genapic apic_summit;
   109.9  extern struct genapic apic_bigsmp;
  109.10  extern struct genapic apic_default;
  109.11 @@ -21,6 +22,7 @@ extern struct genapic apic_default;
  109.12  struct genapic *genapic;
  109.13  
  109.14  struct genapic *apic_probe[] __initdata = { 
  109.15 +	&apic_x2apic, 
  109.16  	&apic_summit,
  109.17  	&apic_bigsmp, 
  109.18  	&apic_default,	/* must be last */
   110.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
   110.2 +++ b/xen/arch/x86/genapic/x2apic.c	Thu May 08 18:40:07 2008 +0900
   110.3 @@ -0,0 +1,79 @@
   110.4 +/*
   110.5 + * x2APIC driver.
   110.6 + *
   110.7 + * Copyright (c) 2008, Intel Corporation.
   110.8 + *
   110.9 + * This program is free software; you can redistribute it and/or modify it
  110.10 + * under the terms and conditions of the GNU General Public License,
  110.11 + * version 2, as published by the Free Software Foundation.
  110.12 + *
  110.13 + * This program is distributed in the hope it will be useful, but WITHOUT
  110.14 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
  110.15 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
  110.16 + * more details.
  110.17 + *
  110.18 + * You should have received a copy of the GNU General Public License along with
  110.19 + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
  110.20 + * Place - Suite 330, Boston, MA 02111-1307 USA.
  110.21 + */
  110.22 +
  110.23 +#include <xen/cpumask.h>
  110.24 +#include <asm/apicdef.h>
  110.25 +#include <asm/genapic.h>
  110.26 +#include <xen/smp.h>
  110.27 +#include <asm/mach-default/mach_mpparse.h>
  110.28 +
  110.29 +__init int probe_x2apic(void)
  110.30 +{
  110.31 +    return x2apic_is_available();
  110.32 +}
  110.33 +
   110.34 +struct genapic apic_x2apic = {
  110.35 +    APIC_INIT("x2apic", probe_x2apic),
  110.36 +    GENAPIC_X2APIC
  110.37 +};
  110.38 +
  110.39 +void init_apic_ldr_x2apic(void)
  110.40 +{
  110.41 +    /* We only use physical delivery mode. */
  110.42 +    return;
  110.43 +}
  110.44 +
  110.45 +void clustered_apic_check_x2apic(void)
  110.46 +{
  110.47 +    /* We only use physical delivery mode. */
  110.48 +    return;
  110.49 +}
  110.50 +
  110.51 +cpumask_t target_cpus_x2apic(void)
  110.52 +{
  110.53 +    /* Deliver interrupts only to CPU0 for now */
  110.54 +    return cpumask_of_cpu(0);
  110.55 +}
  110.56 +
  110.57 +unsigned int cpu_mask_to_apicid_x2apic(cpumask_t cpumask)
  110.58 +{
  110.59 +    return cpu_physical_id(first_cpu(cpumask));
  110.60 +}
  110.61 +
  110.62 +void send_IPI_mask_x2apic(cpumask_t cpumask, int vector)
  110.63 +{
  110.64 +    unsigned int query_cpu;
  110.65 +    u32 cfg, dest;
  110.66 +    unsigned long flags;
  110.67 +
  110.68 +    ASSERT(cpus_subset(cpumask, cpu_online_map));
  110.69 +    ASSERT(!cpus_empty(cpumask));
  110.70 +
  110.71 +    local_irq_save(flags);
  110.72 +
  110.73 +    cfg = APIC_DM_FIXED | 0 /* no shorthand */ | APIC_DEST_PHYSICAL | vector;
  110.74 +    for_each_cpu_mask(query_cpu, cpumask)
  110.75 +    {
  110.76 +        dest =  cpu_physical_id(query_cpu);
  110.77 +        apic_icr_write(cfg, dest);
  110.78 +    }
  110.79 +
  110.80 +    local_irq_restore(flags);
  110.81 +}
  110.82 +
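Background on the single-write IPI above: the MSR number and layout come from the Intel SDM, and the mapping of apic_icr_write() onto that MSR in x2APIC mode is an assumption, not shown in this changeset.

/*
 * In x2APIC mode the ICR is one 64-bit MSR (0x830): bits 63:32 hold the
 * destination APIC ID and bits 31:0 the delivery mode/vector fields, so
 * each IPI is a single WRMSR.  The xAPIC "delivery status" (send pending)
 * bit does not exist in x2APIC mode, so no polling loop is needed between
 * the per-CPU writes in send_IPI_mask_x2apic() above.
 */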
   111.1 --- a/xen/arch/x86/hvm/Makefile	Fri Apr 25 20:13:52 2008 +0900
   111.2 +++ b/xen/arch/x86/hvm/Makefile	Thu May 08 18:40:07 2008 +0900
   111.3 @@ -16,4 +16,5 @@ obj-y += vioapic.o
   111.4  obj-y += vlapic.o
   111.5  obj-y += vpic.o
   111.6  obj-y += save.o
   111.7 +obj-y += vmsi.o
   111.8  obj-y += stdvga.o
   112.1 --- a/xen/arch/x86/hvm/hvm.c	Fri Apr 25 20:13:52 2008 +0900
   112.2 +++ b/xen/arch/x86/hvm/hvm.c	Thu May 08 18:40:07 2008 +0900
   112.3 @@ -46,6 +46,7 @@
   112.4  #include <asm/hvm/vpt.h>
   112.5  #include <asm/hvm/support.h>
   112.6  #include <asm/hvm/cacheattr.h>
   112.7 +#include <asm/hvm/trace.h>
   112.8  #include <public/sched.h>
   112.9  #include <public/hvm/ioreq.h>
  112.10  #include <public/version.h>
  112.11 @@ -739,15 +740,22 @@ void hvm_send_assist_req(struct vcpu *v)
  112.12  
  112.13  void hvm_hlt(unsigned long rflags)
  112.14  {
  112.15 +    struct vcpu *curr = current;
  112.16 +
  112.17 +    if ( hvm_event_pending(curr) )
  112.18 +        return;
  112.19 +
  112.20      /*
  112.21       * If we halt with interrupts disabled, that's a pretty sure sign that we
  112.22       * want to shut down. In a real processor, NMIs are the only way to break
  112.23       * out of this.
  112.24       */
  112.25      if ( unlikely(!(rflags & X86_EFLAGS_IF)) )
  112.26 -        return hvm_vcpu_down(current);
  112.27 +        return hvm_vcpu_down(curr);
  112.28  
  112.29      do_sched_op_compat(SCHEDOP_block, 0);
  112.30 +
  112.31 +    HVMTRACE_1D(HLT, curr, /* pending = */ vcpu_runnable(curr));
  112.32  }
  112.33  
  112.34  void hvm_triple_fault(void)
  112.35 @@ -1594,66 +1602,15 @@ void hvm_cpuid(unsigned int input, unsig
  112.36      if ( cpuid_hypervisor_leaves(input, eax, ebx, ecx, edx) )
  112.37          return;
  112.38  
  112.39 -    cpuid(input, eax, ebx, ecx, edx);
  112.40 -
  112.41 -    switch ( input )
  112.42 -    {
  112.43 -    case 0x00000001:
  112.44 -        /* Clear #threads count and poke initial VLAPIC ID. */
  112.45 -        *ebx &= 0x0000FFFFu;
  112.46 -        *ebx |= (current->vcpu_id * 2) << 24;
  112.47 -
  112.48 -        /* We always support MTRR MSRs. */
  112.49 -        *edx |= bitmaskof(X86_FEATURE_MTRR);
  112.50 -
  112.51 -        *ecx &= (bitmaskof(X86_FEATURE_XMM3) |
  112.52 -                 bitmaskof(X86_FEATURE_SSSE3) |
  112.53 -                 bitmaskof(X86_FEATURE_CX16) |
  112.54 -                 bitmaskof(X86_FEATURE_SSE4_1) |
  112.55 -                 bitmaskof(X86_FEATURE_SSE4_2) |
  112.56 -                 bitmaskof(X86_FEATURE_POPCNT));
  112.57 +    domain_cpuid(v->domain, input, *ecx, eax, ebx, ecx, edx);
  112.58  
  112.59 -        *edx &= (bitmaskof(X86_FEATURE_FPU) |
  112.60 -                 bitmaskof(X86_FEATURE_VME) |
  112.61 -                 bitmaskof(X86_FEATURE_DE) |
  112.62 -                 bitmaskof(X86_FEATURE_PSE) |
  112.63 -                 bitmaskof(X86_FEATURE_TSC) |
  112.64 -                 bitmaskof(X86_FEATURE_MSR) |
  112.65 -                 bitmaskof(X86_FEATURE_PAE) |
  112.66 -                 bitmaskof(X86_FEATURE_MCE) |
  112.67 -                 bitmaskof(X86_FEATURE_CX8) |
  112.68 -                 bitmaskof(X86_FEATURE_APIC) |
  112.69 -                 bitmaskof(X86_FEATURE_SEP) |
  112.70 -                 bitmaskof(X86_FEATURE_MTRR) |
  112.71 -                 bitmaskof(X86_FEATURE_PGE) |
  112.72 -                 bitmaskof(X86_FEATURE_MCA) |
  112.73 -                 bitmaskof(X86_FEATURE_CMOV) |
  112.74 -                 bitmaskof(X86_FEATURE_PAT) |
  112.75 -                 bitmaskof(X86_FEATURE_CLFLSH) |
  112.76 -                 bitmaskof(X86_FEATURE_MMX) |
  112.77 -                 bitmaskof(X86_FEATURE_FXSR) |
  112.78 -                 bitmaskof(X86_FEATURE_XMM) |
  112.79 -                 bitmaskof(X86_FEATURE_XMM2));
  112.80 +    if ( input == 0x00000001 )
  112.81 +    {
  112.82 +        /* Fix up VLAPIC details. */
  112.83 +        *ebx &= 0x00FFFFFFu;
  112.84 +        *ebx |= (v->vcpu_id * 2) << 24;
  112.85          if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
  112.86 -            __clear_bit(X86_FEATURE_APIC & 31, edx);
  112.87 -#if CONFIG_PAGING_LEVELS >= 3
  112.88 -        if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_PAE_ENABLED] )
  112.89 -#endif
  112.90 -            __clear_bit(X86_FEATURE_PAE & 31, edx);
  112.91 -        break;
  112.92 -
  112.93 -    case 0x80000001:
  112.94 -#if CONFIG_PAGING_LEVELS >= 3
  112.95 -        if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_PAE_ENABLED] )
  112.96 -#endif
  112.97 -            __clear_bit(X86_FEATURE_NX & 31, edx);
  112.98 -#ifdef __i386__
  112.99 -        /* Mask feature for Intel ia32e or AMD long mode. */
 112.100 -        __clear_bit(X86_FEATURE_LAHF_LM & 31, ecx);
 112.101 -        __clear_bit(X86_FEATURE_LM & 31, edx);
 112.102 -        __clear_bit(X86_FEATURE_SYSCALL & 31, edx);
 112.103 -#endif
 112.104 -        break;
  112.105 +            __clear_bit(X86_FEATURE_APIC & 31, edx);
 112.106      }
 112.107  }
 112.108  
 112.109 @@ -1663,11 +1620,15 @@ int hvm_msr_read_intercept(struct cpu_us
 112.110      uint64_t msr_content = 0;
 112.111      struct vcpu *v = current;
 112.112      uint64_t *var_range_base, *fixed_range_base;
 112.113 -    int index;
 112.114 +    int index, mtrr;
 112.115 +    uint32_t cpuid[4];
 112.116  
 112.117      var_range_base = (uint64_t *)v->arch.hvm_vcpu.mtrr.var_ranges;
 112.118      fixed_range_base = (uint64_t *)v->arch.hvm_vcpu.mtrr.fixed_ranges;
 112.119  
 112.120 +    hvm_cpuid(1, &cpuid[0], &cpuid[1], &cpuid[2], &cpuid[3]);
 112.121 +    mtrr = !!(cpuid[3] & bitmaskof(X86_FEATURE_MTRR));
 112.122 +
 112.123      switch ( ecx )
 112.124      {
 112.125      case MSR_IA32_TSC:
 112.126 @@ -1695,25 +1656,37 @@ int hvm_msr_read_intercept(struct cpu_us
 112.127          break;
 112.128  
 112.129      case MSR_MTRRcap:
 112.130 +        if ( !mtrr )
 112.131 +            goto gp_fault;
 112.132          msr_content = v->arch.hvm_vcpu.mtrr.mtrr_cap;
 112.133          break;
 112.134      case MSR_MTRRdefType:
 112.135 +        if ( !mtrr )
 112.136 +            goto gp_fault;
 112.137          msr_content = v->arch.hvm_vcpu.mtrr.def_type
 112.138                          | (v->arch.hvm_vcpu.mtrr.enabled << 10);
 112.139          break;
 112.140      case MSR_MTRRfix64K_00000:
 112.141 +        if ( !mtrr )
 112.142 +            goto gp_fault;
 112.143          msr_content = fixed_range_base[0];
 112.144          break;
 112.145      case MSR_MTRRfix16K_80000:
 112.146      case MSR_MTRRfix16K_A0000:
 112.147 +        if ( !mtrr )
 112.148 +            goto gp_fault;
 112.149          index = regs->ecx - MSR_MTRRfix16K_80000;
 112.150          msr_content = fixed_range_base[index + 1];
 112.151          break;
 112.152      case MSR_MTRRfix4K_C0000...MSR_MTRRfix4K_F8000:
 112.153 +        if ( !mtrr )
 112.154 +            goto gp_fault;
 112.155          index = regs->ecx - MSR_MTRRfix4K_C0000;
 112.156          msr_content = fixed_range_base[index + 3];
 112.157          break;
 112.158      case MSR_IA32_MTRR_PHYSBASE0...MSR_IA32_MTRR_PHYSMASK7:
 112.159 +        if ( !mtrr )
 112.160 +            goto gp_fault;
 112.161          index = regs->ecx - MSR_IA32_MTRR_PHYSBASE0;
 112.162          msr_content = var_range_base[index];
 112.163          break;
 112.164 @@ -1725,6 +1698,10 @@ int hvm_msr_read_intercept(struct cpu_us
 112.165      regs->eax = (uint32_t)msr_content;
 112.166      regs->edx = (uint32_t)(msr_content >> 32);
 112.167      return X86EMUL_OKAY;
 112.168 +
 112.169 +gp_fault:
 112.170 +    hvm_inject_exception(TRAP_gp_fault, 0, 0);
 112.171 +    return X86EMUL_EXCEPTION;
 112.172  }
 112.173  
 112.174  int hvm_msr_write_intercept(struct cpu_user_regs *regs)
 112.175 @@ -1739,7 +1716,11 @@ int hvm_msr_write_intercept(struct cpu_u
 112.176      uint32_t ecx = regs->ecx;
 112.177      uint64_t msr_content = (uint32_t)regs->eax | ((uint64_t)regs->edx << 32);
 112.178      struct vcpu *v = current;
 112.179 -    int index;
 112.180 +    int index, mtrr;
 112.181 +    uint32_t cpuid[4];
 112.182 +
 112.183 +    hvm_cpuid(1, &cpuid[0], &cpuid[1], &cpuid[2], &cpuid[3]);
 112.184 +    mtrr = !!(cpuid[3] & bitmaskof(X86_FEATURE_MTRR));
 112.185  
 112.186      switch ( ecx )
 112.187      {
 112.188 @@ -1758,29 +1739,41 @@ int hvm_msr_write_intercept(struct cpu_u
 112.189          break;
 112.190  
 112.191      case MSR_MTRRcap:
 112.192 +        if ( !mtrr )
 112.193 +            goto gp_fault;
 112.194          goto gp_fault;
 112.195      case MSR_MTRRdefType:
 112.196 +        if ( !mtrr )
 112.197 +            goto gp_fault;
 112.198          if ( !mtrr_def_type_msr_set(&v->arch.hvm_vcpu.mtrr, msr_content) )
 112.199             goto gp_fault;
 112.200          break;
 112.201      case MSR_MTRRfix64K_00000:
 112.202 +        if ( !mtrr )
 112.203 +            goto gp_fault;
 112.204          if ( !mtrr_fix_range_msr_set(&v->arch.hvm_vcpu.mtrr, 0, msr_content) )
 112.205              goto gp_fault;
 112.206          break;
 112.207      case MSR_MTRRfix16K_80000:
 112.208      case MSR_MTRRfix16K_A0000:
 112.209 +        if ( !mtrr )
 112.210 +            goto gp_fault;
 112.211          index = regs->ecx - MSR_MTRRfix16K_80000 + 1;
 112.212          if ( !mtrr_fix_range_msr_set(&v->arch.hvm_vcpu.mtrr,
 112.213                                       index, msr_content) )
 112.214              goto gp_fault;
 112.215          break;
 112.216      case MSR_MTRRfix4K_C0000...MSR_MTRRfix4K_F8000:
 112.217 +        if ( !mtrr )
 112.218 +            goto gp_fault;
 112.219          index = regs->ecx - MSR_MTRRfix4K_C0000 + 3;
 112.220          if ( !mtrr_fix_range_msr_set(&v->arch.hvm_vcpu.mtrr,
 112.221                                       index, msr_content) )
 112.222              goto gp_fault;
 112.223          break;
 112.224      case MSR_IA32_MTRR_PHYSBASE0...MSR_IA32_MTRR_PHYSMASK7:
 112.225 +        if ( !mtrr )
 112.226 +            goto gp_fault;
 112.227          if ( !mtrr_var_range_msr_set(&v->arch.hvm_vcpu.mtrr,
 112.228                                       regs->ecx, msr_content) )
 112.229              goto gp_fault;
 112.230 @@ -2360,6 +2353,54 @@ long do_hvm_op(unsigned long op, XEN_GUE
 112.231          rc = guest_handle_is_null(arg) ? hvmop_flush_tlb_all() : -ENOSYS;
 112.232          break;
 112.233  
 112.234 +    case HVMOP_track_dirty_vram:
 112.235 +    {
 112.236 +        struct xen_hvm_track_dirty_vram a;
 112.237 +        struct domain *d;
 112.238 +
 112.239 +        if ( copy_from_guest(&a, arg, 1) )
 112.240 +            return -EFAULT;
 112.241 +
 112.242 +        if ( a.domid == DOMID_SELF )
 112.243 +        {
 112.244 +            d = rcu_lock_current_domain();
 112.245 +        }
 112.246 +        else
 112.247 +        {
 112.248 +            if ( (d = rcu_lock_domain_by_id(a.domid)) == NULL )
 112.249 +                return -ESRCH;
 112.250 +            if ( !IS_PRIV_FOR(current->domain, d) )
 112.251 +            {
 112.252 +                rc = -EPERM;
 112.253 +                goto param_fail2;
 112.254 +            }
 112.255 +        }
 112.256 +
 112.257 +        rc = -EINVAL;
 112.258 +        if ( !is_hvm_domain(d) )
 112.259 +            goto param_fail2;
 112.260 +
 112.261 +        rc = xsm_hvm_param(d, op);
 112.262 +        if ( rc )
 112.263 +            goto param_fail2;
 112.264 +
 112.265 +        rc = -ESRCH;
 112.266 +        if ( d->is_dying )
 112.267 +            goto param_fail2;
 112.268 +
 112.269 +        rc = -EINVAL;
 112.270 +        if ( !shadow_mode_enabled(d))
 112.271 +            goto param_fail2;
 112.272 +        if ( d->vcpu[0] == NULL )
 112.273 +            goto param_fail2;
 112.274 +
 112.275 +        rc = shadow_track_dirty_vram(d, a.first_pfn, a.nr, a.dirty_bitmap);
 112.276 +
 112.277 +    param_fail2:
 112.278 +        rcu_unlock_domain(d);
 112.279 +        break;
 112.280 +    }
 112.281 +
 112.282      default:
 112.283      {
 112.284          gdprintk(XENLOG_WARNING, "Bad HVM op %ld.\n", op);
   113.1 --- a/xen/arch/x86/hvm/i8254.c	Fri Apr 25 20:13:52 2008 +0900
   113.2 +++ b/xen/arch/x86/hvm/i8254.c	Thu May 08 18:40:07 2008 +0900
   113.3 @@ -206,19 +206,21 @@ static void pit_load_count(PITState *pit
   113.4  
   113.5      switch ( s->mode )
   113.6      {
   113.7 -        case 2:
   113.8 -            /* Periodic timer. */
   113.9 -            create_periodic_time(v, &pit->pt0, period, 0, 0, pit_time_fired, 
  113.10 -                                 &pit->count_load_time[channel]);
  113.11 -            break;
  113.12 -        case 1:
  113.13 -            /* One-shot timer. */
  113.14 -            create_periodic_time(v, &pit->pt0, period, 0, 1, pit_time_fired,
  113.15 -                                 &pit->count_load_time[channel]);
  113.16 -            break;
  113.17 -        default:
  113.18 -            destroy_periodic_time(&pit->pt0);
  113.19 -            break;
  113.20 +    case 2:
  113.21 +    case 3:
  113.22 +        /* Periodic timer. */
  113.23 +        create_periodic_time(v, &pit->pt0, period, 0, 0, pit_time_fired, 
  113.24 +                             &pit->count_load_time[channel]);
  113.25 +        break;
  113.26 +    case 1:
  113.27 +    case 4:
  113.28 +        /* One-shot timer. */
  113.29 +        create_periodic_time(v, &pit->pt0, period, 0, 1, pit_time_fired,
  113.30 +                             &pit->count_load_time[channel]);
  113.31 +        break;
  113.32 +    default:
  113.33 +        destroy_periodic_time(&pit->pt0);
  113.34 +        break;
  113.35      }
  113.36  }
  113.37  
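A summary of the i8254 modes folded together by the reworked switch above (standard 8254 behaviour, not part of the change itself):

    /* mode 2: rate generator            -> periodic
     * mode 3: square wave generator     -> periodic (now handled like mode 2)
     * mode 1: hw-retriggerable one-shot -> one-shot
     * mode 4: sw-triggered strobe       -> one-shot (now handled like mode 1)
     * modes 0/5: fall to default; any existing periodic timer is destroyed. */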
   114.1 --- a/xen/arch/x86/hvm/stdvga.c	Fri Apr 25 20:13:52 2008 +0900
   114.2 +++ b/xen/arch/x86/hvm/stdvga.c	Thu May 08 18:40:07 2008 +0900
   114.3 @@ -131,14 +131,15 @@ static int stdvga_outb(uint64_t addr, ui
   114.4  
   114.5      /* When in standard vga mode, emulate here all writes to the vram buffer
   114.6       * so we can immediately satisfy reads without waiting for qemu. */
   114.7 -    s->stdvga =
   114.8 -        (s->sr[7] == 0x00) &&  /* standard vga mode */
   114.9 -        (s->gr[6] == 0x05);    /* misc graphics register w/ MemoryMapSelect=1
  114.10 -                                * 0xa0000-0xaffff (64k region), AlphaDis=1 */
  114.11 +    s->stdvga = (s->sr[7] == 0x00);
  114.12  
  114.13      if ( !prev_stdvga && s->stdvga )
  114.14      {
  114.15 -        s->cache = 1;       /* (re)start caching video buffer */
  114.16 +        /*
  114.17 +         * (Re)start caching of video buffer.
   114.18 +         * XXX TODO: In case of a restart, the cache could be out of sync.
  114.19 +         */
  114.20 +        s->cache = 1;
  114.21          gdprintk(XENLOG_INFO, "entering stdvga and caching modes\n");
  114.22      }
  114.23      else if ( prev_stdvga && !s->stdvga )
  114.24 @@ -182,6 +183,40 @@ static int stdvga_intercept_pio(
  114.25      return X86EMUL_UNHANDLEABLE; /* propagate to external ioemu */
  114.26  }
  114.27  
  114.28 +static unsigned int stdvga_mem_offset(
  114.29 +    struct hvm_hw_stdvga *s, unsigned int mmio_addr)
  114.30 +{
  114.31 +    unsigned int memory_map_mode = (s->gr[6] >> 2) & 3;
  114.32 +    unsigned int offset = mmio_addr & 0x1ffff;
  114.33 +
  114.34 +    switch ( memory_map_mode )
  114.35 +    {
  114.36 +    case 0:
  114.37 +        break;
  114.38 +    case 1:
  114.39 +        if ( offset >= 0x10000 )
  114.40 +            goto fail;
  114.41 +        offset += 0; /* assume bank_offset == 0; */
  114.42 +        break;
  114.43 +    case 2:
  114.44 +        offset -= 0x10000;
  114.45 +        if ( offset >= 0x8000 )
  114.46 +            goto fail;
  114.47 +        break;
  114.48 +    default:
  114.49 +    case 3:
  114.50 +        offset -= 0x18000;
  114.51 +        if ( offset >= 0x8000 )
  114.52 +            goto fail;
  114.53 +        break;
  114.54 +    }
  114.55 +
  114.56 +    return offset;
  114.57 +
  114.58 + fail:
  114.59 +    return ~0u;
  114.60 +}
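The memory_map_mode values decoded by stdvga_mem_offset() correspond to the standard VGA Graphics Controller "Memory Map Select" field, GR6 bits 3:2; summarised as a comment for convenience:

    /* GR6[3:2] memory map select, as interpreted above:
     *   0: 0xA0000-0xBFFFF (128K) -- the 17-bit offset is used unchanged
     *   1: 0xA0000-0xAFFFF (64K)  -- offsets >= 0x10000 are misses
     *   2: 0xB0000-0xB7FFF (32K)  -- only offsets 0x10000-0x17FFF hit
     *   3: 0xB8000-0xBFFFF (32K)  -- only offsets 0x18000-0x1FFFF hit
     * A miss returns ~0u, which the read/write helpers treat as out of
     * range (reads return 0xff, writes are dropped). */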
  114.61 +
  114.62  #define GET_PLANE(data, p) (((data) >> ((p) * 8)) & 0xff)
  114.63  
  114.64  static uint8_t stdvga_mem_readb(uint64_t addr)
  114.65 @@ -191,8 +226,8 @@ static uint8_t stdvga_mem_readb(uint64_t
  114.66      uint32_t ret, *vram_l;
  114.67      uint8_t *vram_b;
  114.68  
  114.69 -    addr &= 0x1ffff;
  114.70 -    if ( addr >= 0x10000 )
  114.71 +    addr = stdvga_mem_offset(s, addr);
  114.72 +    if ( addr == ~0u )
  114.73          return 0xff;
  114.74  
  114.75      if ( s->sr[4] & 0x08 )
  114.76 @@ -273,8 +308,8 @@ static void stdvga_mem_writeb(uint64_t a
  114.77      uint32_t write_mask, bit_mask, set_mask, *vram_l;
  114.78      uint8_t *vram_b;
  114.79  
  114.80 -    addr &= 0x1ffff;
  114.81 -    if ( addr >= 0x10000 )
  114.82 +    addr = stdvga_mem_offset(s, addr);
  114.83 +    if ( addr == ~0u )
  114.84          return;
  114.85  
  114.86      if ( s->sr[4] & 0x08 )
  114.87 @@ -531,7 +566,7 @@ void stdvga_init(struct domain *d)
  114.88          register_portio_handler(d, 0x3ce, 2, stdvga_intercept_pio);
  114.89          /* MMIO. */
  114.90          register_buffered_io_handler(
  114.91 -            d, 0xa0000, 0x10000, stdvga_intercept_mmio);
  114.92 +            d, 0xa0000, 0x20000, stdvga_intercept_mmio);
  114.93      }
  114.94  }
  114.95  
   115.1 --- a/xen/arch/x86/hvm/svm/emulate.c	Fri Apr 25 20:13:52 2008 +0900
   115.2 +++ b/xen/arch/x86/hvm/svm/emulate.c	Thu May 08 18:40:07 2008 +0900
   115.3 @@ -29,18 +29,6 @@
   115.4  
   115.5  #define MAX_INST_LEN 15
   115.6  
   115.7 -static int inst_copy_from_guest(
   115.8 -    unsigned char *buf, unsigned long guest_eip, int inst_len)
   115.9 -{
  115.10 -    struct vmcb_struct *vmcb = current->arch.hvm_svm.vmcb;
  115.11 -    uint32_t pfec = (vmcb->cpl == 3) ? PFEC_user_mode : 0;
  115.12 -    if ( (inst_len > MAX_INST_LEN) || (inst_len <= 0) )
  115.13 -        return 0;
  115.14 -    if ( hvm_fetch_from_guest_virt_nofault(buf, guest_eip, inst_len, pfec) )
  115.15 -        return 0;
  115.16 -    return inst_len;
  115.17 -}
  115.18 -
  115.19  static unsigned int is_prefix(u8 opc)
  115.20  {
  115.21      switch ( opc )
  115.22 @@ -73,12 +61,7 @@ static unsigned long svm_rip2pointer(str
  115.23      return p;
  115.24  }
  115.25  
  115.26 -/* 
  115.27 - * Here's how it works:
  115.28 - * First byte: Length. 
  115.29 - * Following bytes: Opcode bytes. 
  115.30 - * Special case: Last byte, if zero, doesn't need to match. 
  115.31 - */
  115.32 +/* First byte: Length. Following bytes: Opcode bytes. */
  115.33  #define MAKE_INSTR(nm, ...) static const u8 OPCODE_##nm[] = { __VA_ARGS__ }
  115.34  MAKE_INSTR(INVD,   2, 0x0f, 0x08);
  115.35  MAKE_INSTR(WBINVD, 2, 0x0f, 0x09);
  115.36 @@ -101,70 +84,90 @@ static const u8 *opc_bytes[INSTR_MAX_COU
  115.37      [INSTR_INT3]   = OPCODE_INT3
  115.38  };
  115.39  
  115.40 +static int fetch(struct vcpu *v, u8 *buf, unsigned long addr, int len)
  115.41 +{
  115.42 +    uint32_t pfec = (v->arch.hvm_svm.vmcb->cpl == 3) ? PFEC_user_mode : 0;
  115.43 +
  115.44 +    switch ( hvm_fetch_from_guest_virt(buf, addr, len, pfec) )
  115.45 +    {
  115.46 +    case HVMCOPY_okay:
  115.47 +        return 1;
  115.48 +    case HVMCOPY_bad_gva_to_gfn:
  115.49 +        /* OK just to give up; we'll have injected #PF already */
  115.50 +        return 0;
  115.51 +    case HVMCOPY_bad_gfn_to_mfn:
  115.52 +    default:
  115.53 +        /* Not OK: fetches from non-RAM pages are not supportable. */
  115.54 +        gdprintk(XENLOG_WARNING, "Bad instruction fetch at %#lx (%#lx)\n",
  115.55 +                 (unsigned long) guest_cpu_user_regs()->eip, addr);
  115.56 +        hvm_inject_exception(TRAP_gp_fault, 0, 0);
  115.57 +        return 0;
  115.58 +    }
  115.59 +}
  115.60 +
  115.61  int __get_instruction_length_from_list(struct vcpu *v,
  115.62 -        enum instruction_index *list, unsigned int list_count, 
  115.63 -        u8 *guest_eip_buf, enum instruction_index *match)
  115.64 +        enum instruction_index *list, unsigned int list_count)
  115.65  {
  115.66      struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
  115.67      unsigned int i, j, inst_len = 0;
  115.68 -    int found = 0;
  115.69      enum instruction_index instr = 0;
  115.70 -    u8 buffer[MAX_INST_LEN];
  115.71 -    u8 *buf;
  115.72 +    u8 buf[MAX_INST_LEN];
  115.73      const u8 *opcode = NULL;
  115.74 +    unsigned long fetch_addr;
  115.75 +    unsigned int fetch_len;
  115.76  
  115.77 -    if ( guest_eip_buf )
  115.78 +    /* Fetch up to the next page break; we'll fetch from the next page
  115.79 +     * later if we have to. */
  115.80 +    fetch_addr = svm_rip2pointer(v);
  115.81 +    fetch_len = min_t(unsigned int, MAX_INST_LEN,
  115.82 +                      PAGE_SIZE - (fetch_addr & ~PAGE_MASK));
  115.83 +    if ( !fetch(v, buf, fetch_addr, fetch_len) )
  115.84 +        return 0;
  115.85 +
  115.86 +    while ( (inst_len < MAX_INST_LEN) && is_prefix(buf[inst_len]) )
  115.87      {
  115.88 -        buf = guest_eip_buf;
  115.89 -    }
  115.90 -    else
  115.91 -    {
  115.92 -        if ( inst_copy_from_guest(buffer, svm_rip2pointer(v), MAX_INST_LEN)
  115.93 -             != MAX_INST_LEN )
  115.94 -            return 0;
  115.95 -        buf = buffer;
  115.96 +        inst_len++;
  115.97 +        if ( inst_len >= fetch_len )
  115.98 +        {
  115.99 +            if ( !fetch(v, buf + fetch_len, fetch_addr + fetch_len,
 115.100 +                        MAX_INST_LEN - fetch_len) )
 115.101 +                return 0;
 115.102 +            fetch_len = MAX_INST_LEN;
 115.103 +        }
 115.104      }
 115.105  
 115.106      for ( j = 0; j < list_count; j++ )
 115.107      {
 115.108          instr = list[j];
 115.109          opcode = opc_bytes[instr];
 115.110 -        ASSERT(opcode);
 115.111  
 115.112 -        while ( (inst_len < MAX_INST_LEN) && 
 115.113 -                is_prefix(buf[inst_len]) && 
 115.114 -                !is_prefix(opcode[1]) )
 115.115 -            inst_len++;
 115.116 -
 115.117 -        ASSERT(opcode[0] <= 15);    /* Make sure the table is correct. */
 115.118 -        found = 1;
 115.119 -
 115.120 -        for ( i = 0; i < opcode[0]; i++ )
 115.121 +        for ( i = 0; (i < opcode[0]) && ((inst_len + i) < MAX_INST_LEN); i++ )
 115.122          {
 115.123 -            /* If the last byte is zero, we just accept it without checking */
 115.124 -            if ( (i == (opcode[0]-1)) && (opcode[i+1] == 0) )
 115.125 -                break;
 115.126 +            if ( (inst_len + i) >= fetch_len ) 
 115.127 +            { 
 115.128 +                if ( !fetch(v, buf + fetch_len, 
 115.129 +                            fetch_addr + fetch_len, 
 115.130 +                            MAX_INST_LEN - fetch_len) ) 
 115.131 +                    return 0;
 115.132 +                fetch_len = MAX_INST_LEN;
 115.133 +            }
 115.134  
 115.135              if ( buf[inst_len+i] != opcode[i+1] )
 115.136 -            {
 115.137 -                found = 0;
 115.138 -                break;
 115.139 -            }
 115.140 +                goto mismatch;
 115.141          }
 115.142 -
 115.143 -        if ( found )
 115.144 -            goto done;
 115.145 +        goto done;
 115.146 +    mismatch: ;
 115.147      }
 115.148  
 115.149 -    printk("%s: Mismatch between expected and actual instruction bytes: "
 115.150 -            "eip = %lx\n",  __func__, (unsigned long)vmcb->rip);
 115.151 +    gdprintk(XENLOG_WARNING,
 115.152 +             "%s: Mismatch between expected and actual instruction bytes: "
 115.153 +             "eip = %lx\n",  __func__, (unsigned long)vmcb->rip);
 115.154 +    hvm_inject_exception(TRAP_gp_fault, 0, 0);
 115.155      return 0;
 115.156  
 115.157   done:
 115.158      inst_len += opcode[0];
 115.159      ASSERT(inst_len <= MAX_INST_LEN);
 115.160 -    if ( match )
 115.161 -        *match = instr;
 115.162      return inst_len;
 115.163  }
 115.164  
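A worked example of the page-break arithmetic used by __get_instruction_length_from_list(), assuming the usual 4K PAGE_SIZE and the MAX_INST_LEN of 15 defined above:

    /* If svm_rip2pointer() returns an address whose in-page offset is
     * 0xffa, then PAGE_SIZE - (addr & ~PAGE_MASK) = 0x1000 - 0xffa = 6,
     * so only 6 bytes are fetched up front; the remaining bytes (up to
     * MAX_INST_LEN) are fetched from the following page only if decoding
     * actually runs past byte 5 (the inst_len >= fetch_len checks above). */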
   116.1 --- a/xen/arch/x86/hvm/svm/svm.c	Fri Apr 25 20:13:52 2008 +0900
   116.2 +++ b/xen/arch/x86/hvm/svm/svm.c	Thu May 08 18:40:07 2008 +0900
   116.3 @@ -84,7 +84,10 @@ static void inline __update_guest_eip(
   116.4  {
   116.5      struct vcpu *curr = current;
   116.6  
   116.7 -    if ( unlikely((inst_len == 0) || (inst_len > 15)) )
   116.8 +    if ( unlikely(inst_len == 0) )
   116.9 +        return;
  116.10 +
  116.11 +    if ( unlikely(inst_len > 15) )
  116.12      {
  116.13          gdprintk(XENLOG_ERR, "Bad instruction length %u\n", inst_len);
  116.14          domain_crash(curr->domain);
  116.15 @@ -892,56 +895,11 @@ static void svm_cpuid_intercept(
  116.16  
  116.17      hvm_cpuid(input, eax, ebx, ecx, edx);
  116.18  
  116.19 -    switch ( input )
  116.20 +    if ( input == 0x80000001 )
  116.21      {
  116.22 -    case 0x00000001:
  116.23 -        /* Mask Intel-only features. */
  116.24 -        *ecx &= ~(bitmaskof(X86_FEATURE_SSSE3) |
  116.25 -                  bitmaskof(X86_FEATURE_SSE4_1) |
  116.26 -                  bitmaskof(X86_FEATURE_SSE4_2));
  116.27 -        break;
  116.28 -
  116.29 -    case 0x80000001:
  116.30 -        /* Filter features which are shared with 0x00000001:EDX. */
  116.31 +        /* Fix up VLAPIC details. */
  116.32          if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
  116.33              __clear_bit(X86_FEATURE_APIC & 31, edx);
  116.34 -#if CONFIG_PAGING_LEVELS >= 3
  116.35 -        if ( !v->domain->arch.hvm_domain.params[HVM_PARAM_PAE_ENABLED] )
  116.36 -#endif
  116.37 -            __clear_bit(X86_FEATURE_PAE & 31, edx);
  116.38 -        __clear_bit(X86_FEATURE_PSE36 & 31, edx);
  116.39 -
  116.40 -        /* We always support MTRR MSRs. */
  116.41 -        *edx |= bitmaskof(X86_FEATURE_MTRR);
  116.42 -
  116.43 -        /* Filter all other features according to a whitelist. */
  116.44 -        *ecx &= (bitmaskof(X86_FEATURE_LAHF_LM) |
  116.45 -                 bitmaskof(X86_FEATURE_ALTMOVCR) |
  116.46 -                 bitmaskof(X86_FEATURE_ABM) |
  116.47 -                 bitmaskof(X86_FEATURE_SSE4A) |
  116.48 -                 bitmaskof(X86_FEATURE_MISALIGNSSE) |
  116.49 -                 bitmaskof(X86_FEATURE_3DNOWPF));
  116.50 -        *edx &= (0x0183f3ff | /* features shared with 0x00000001:EDX */
  116.51 -                 bitmaskof(X86_FEATURE_NX) |
  116.52 -                 bitmaskof(X86_FEATURE_LM) |
  116.53 -                 bitmaskof(X86_FEATURE_SYSCALL) |
  116.54 -                 bitmaskof(X86_FEATURE_MP) |
  116.55 -                 bitmaskof(X86_FEATURE_MMXEXT) |
  116.56 -                 bitmaskof(X86_FEATURE_FFXSR) |
  116.57 -                 bitmaskof(X86_FEATURE_3DNOW) |
  116.58 -                 bitmaskof(X86_FEATURE_3DNOWEXT));
  116.59 -        break;
  116.60 -
  116.61 -    case 0x80000007:
  116.62 -    case 0x8000000A:
  116.63 -        /* Mask out features of power management and SVM extension. */
  116.64 -        *eax = *ebx = *ecx = *edx = 0;
  116.65 -        break;
  116.66 -
  116.67 -    case 0x80000008:
  116.68 -        /* Make sure Number of CPU core is 1 when HTT=0 */
  116.69 -        *ecx &= 0xFFFFFF00;
  116.70 -        break;
  116.71      }
  116.72  
  116.73      HVMTRACE_3D(CPUID, v, input,
  116.74 @@ -952,8 +910,7 @@ static void svm_vmexit_do_cpuid(struct c
  116.75  {
  116.76      unsigned int eax, ebx, ecx, edx, inst_len;
  116.77  
  116.78 -    inst_len = __get_instruction_length(current, INSTR_CPUID, NULL);
  116.79 -    if ( inst_len == 0 ) 
  116.80 +    if ( (inst_len = __get_instruction_length(current, INSTR_CPUID)) == 0 )
  116.81          return;
  116.82  
  116.83      eax = regs->eax;
  116.84 @@ -1128,13 +1085,15 @@ static void svm_do_msr_access(struct cpu
  116.85  
  116.86      if ( vmcb->exitinfo1 == 0 )
  116.87      {
  116.88 +        if ( (inst_len = __get_instruction_length(v, INSTR_RDMSR)) == 0 )
  116.89 +            return;
  116.90          rc = hvm_msr_read_intercept(regs);
  116.91 -        inst_len = __get_instruction_length(v, INSTR_RDMSR, NULL);
  116.92      }
  116.93      else
  116.94      {
  116.95 +        if ( (inst_len = __get_instruction_length(v, INSTR_WRMSR)) == 0 )
  116.96 +            return;
  116.97          rc = hvm_msr_write_intercept(regs);
  116.98 -        inst_len = __get_instruction_length(v, INSTR_WRMSR, NULL);
  116.99      }
 116.100  
 116.101      if ( rc == X86EMUL_OKAY )
 116.102 @@ -1144,25 +1103,12 @@ static void svm_do_msr_access(struct cpu
 116.103  static void svm_vmexit_do_hlt(struct vmcb_struct *vmcb,
 116.104                                struct cpu_user_regs *regs)
 116.105  {
 116.106 -    struct vcpu *curr = current;
 116.107 -    struct hvm_intack intack = hvm_vcpu_has_pending_irq(curr);
 116.108      unsigned int inst_len;
 116.109  
 116.110 -    inst_len = __get_instruction_length(curr, INSTR_HLT, NULL);
 116.111 -    if ( inst_len == 0 )
 116.112 +    if ( (inst_len = __get_instruction_length(current, INSTR_HLT)) == 0 )
 116.113          return;
 116.114      __update_guest_eip(regs, inst_len);
 116.115  
 116.116 -    /* Check for pending exception or new interrupt. */
 116.117 -    if ( vmcb->eventinj.fields.v ||
 116.118 -         ((intack.source != hvm_intsrc_none) &&
 116.119 -          !hvm_interrupt_blocked(current, intack)) )
 116.120 -    {
 116.121 -        HVMTRACE_1D(HLT, curr, /*int pending=*/ 1);
 116.122 -        return;
 116.123 -    }
 116.124 -
 116.125 -    HVMTRACE_1D(HLT, curr, /*int pending=*/ 0);
 116.126      hvm_hlt(regs->eflags);
 116.127  }
 116.128  
 116.129 @@ -1182,10 +1128,13 @@ static void svm_vmexit_do_invalidate_cac
 116.130      enum instruction_index list[] = { INSTR_INVD, INSTR_WBINVD };
 116.131      int inst_len;
 116.132  
 116.133 +    inst_len = __get_instruction_length_from_list(
 116.134 +        current, list, ARRAY_SIZE(list));
 116.135 +    if ( inst_len == 0 )
 116.136 +        return;
 116.137 +
 116.138      svm_wbinvd_intercept();
 116.139  
 116.140 -    inst_len = __get_instruction_length_from_list(
 116.141 -        current, list, ARRAY_SIZE(list), NULL, NULL);
 116.142      __update_guest_eip(regs, inst_len);
 116.143  }
 116.144  
 116.145 @@ -1261,7 +1210,8 @@ asmlinkage void svm_vmexit_handler(struc
 116.146          if ( !v->domain->debugger_attached )
 116.147              goto exit_and_crash;
 116.148          /* AMD Vol2, 15.11: INT3, INTO, BOUND intercepts do not update RIP. */
 116.149 -        inst_len = __get_instruction_length(v, INSTR_INT3, NULL);
 116.150 +        if ( (inst_len = __get_instruction_length(v, INSTR_INT3)) == 0 )
 116.151 +            break;
 116.152          __update_guest_eip(regs, inst_len);
 116.153          domain_pause_for_debugger();
 116.154          break;
 116.155 @@ -1338,8 +1288,7 @@ asmlinkage void svm_vmexit_handler(struc
 116.156          break;
 116.157  
 116.158      case VMEXIT_VMMCALL:
 116.159 -        inst_len = __get_instruction_length(v, INSTR_VMCALL, NULL);
 116.160 -        if ( inst_len == 0 )
 116.161 +        if ( (inst_len = __get_instruction_length(v, INSTR_VMCALL)) == 0 )
 116.162              break;
 116.163          HVMTRACE_1D(VMMCALL, v, regs->eax);
 116.164          rc = hvm_do_hypercall(regs);
   117.1 --- a/xen/arch/x86/hvm/vlapic.c	Fri Apr 25 20:13:52 2008 +0900
   117.2 +++ b/xen/arch/x86/hvm/vlapic.c	Thu May 08 18:40:07 2008 +0900
   117.3 @@ -171,7 +171,7 @@ int vlapic_match_logical_addr(struct vla
   117.4      int result = 0;
   117.5      uint8_t logical_id;
   117.6  
   117.7 -    logical_id = GET_APIC_LOGICAL_ID(vlapic_get_reg(vlapic, APIC_LDR));
   117.8 +    logical_id = GET_xAPIC_LOGICAL_ID(vlapic_get_reg(vlapic, APIC_LDR));
   117.9  
  117.10      switch ( vlapic_get_reg(vlapic, APIC_DFR) )
  117.11      {
  117.12 @@ -476,12 +476,15 @@ void vlapic_EOI_set(struct vlapic *vlapi
  117.13  
  117.14      if ( vlapic_test_and_clear_vector(vector, &vlapic->regs->data[APIC_TMR]) )
  117.15          vioapic_update_EOI(vlapic_domain(vlapic), vector);
   117.16 +
  117.17 +    if ( vtd_enabled )
  117.18 +        hvm_dpci_msi_eoi(current->domain, vector);
  117.19  }
  117.20  
  117.21  static int vlapic_ipi(
  117.22      struct vlapic *vlapic, uint32_t icr_low, uint32_t icr_high)
  117.23  {
  117.24 -    unsigned int dest =         GET_APIC_DEST_FIELD(icr_high);
  117.25 +    unsigned int dest =         GET_xAPIC_DEST_FIELD(icr_high);
  117.26      unsigned int short_hand =   icr_low & APIC_SHORT_MASK;
  117.27      unsigned int trig_mode =    icr_low & APIC_INT_LEVELTRIG;
  117.28      unsigned int level =        icr_low & APIC_INT_ASSERT;
   118.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
   118.2 +++ b/xen/arch/x86/hvm/vmsi.c	Thu May 08 18:40:07 2008 +0900
   118.3 @@ -0,0 +1,189 @@
   118.4 +/*
   118.5 + *  Copyright (C) 2001  MandrakeSoft S.A.
   118.6 + *
   118.7 + *    MandrakeSoft S.A.
   118.8 + *    43, rue d'Aboukir
   118.9 + *    75002 Paris - France
  118.10 + *    http://www.linux-mandrake.com/
  118.11 + *    http://www.mandrakesoft.com/
  118.12 + *
  118.13 + *  This library is free software; you can redistribute it and/or
  118.14 + *  modify it under the terms of the GNU Lesser General Public
  118.15 + *  License as published by the Free Software Foundation; either
  118.16 + *  version 2 of the License, or (at your option) any later version.
  118.17 + *
  118.18 + *  This library is distributed in the hope that it will be useful,
  118.19 + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
  118.20 + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  118.21 + *  Lesser General Public License for more details.
  118.22 + *
  118.23 + *  You should have received a copy of the GNU Lesser General Public
  118.24 + *  License along with this library; if not, write to the Free Software
  118.25 + *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  118.26 + *
  118.27 + * Support for virtual MSI logic
   118.28 + * Will be merged with the virtual IOAPIC logic, since most of it is the same
   118.29 + */
  118.30 +
  118.31 +#include <xen/config.h>
  118.32 +#include <xen/types.h>
  118.33 +#include <xen/mm.h>
  118.34 +#include <xen/xmalloc.h>
  118.35 +#include <xen/lib.h>
  118.36 +#include <xen/errno.h>
  118.37 +#include <xen/sched.h>
  118.38 +#include <public/hvm/ioreq.h>
  118.39 +#include <asm/hvm/io.h>
  118.40 +#include <asm/hvm/vpic.h>
  118.41 +#include <asm/hvm/vlapic.h>
  118.42 +#include <asm/hvm/support.h>
  118.43 +#include <asm/current.h>
  118.44 +#include <asm/event.h>
  118.45 +
  118.46 +static uint32_t vmsi_get_delivery_bitmask(
  118.47 +    struct domain *d, uint16_t dest, uint8_t dest_mode)
  118.48 +{