Page migration
--------------

Page migration allows the physical location of pages to be moved between
nodes in a NUMA system while the process is running. The virtual addresses
that the process sees do not change; only the system rearranges the
physical location of those pages.
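To illustrate the distinction between virtual and physical location, the following toy model treats a page table as an array that maps a virtual page number to a physical frame tagged with its node. Everything here (struct frame, page_table, vpage_node(), toy_migrate()) is hypothetical and far simpler than real page tables. It only shows that the index the process uses stays the same while the backing frame and its node change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical physical frame: lives on exactly one NUMA node. */
struct frame {
    int node;
    char data[16];
};

/* Hypothetical page table: virtual page number -> backing frame. */
#define NR_VPAGES 4
static struct frame *page_table[NR_VPAGES];

/* Node that currently backs virtual page `vpn`. */
static int vpage_node(size_t vpn)
{
    assert(vpn < NR_VPAGES && page_table[vpn] != NULL);
    return page_table[vpn]->node;
}

/* Copy the contents into `dst` (a frame on another node) and repoint the
 * entry. The process keeps using the same vpn; only the frame changes. */
static void toy_migrate(size_t vpn, struct frame *dst)
{
    struct frame *src = page_table[vpn];

    assert(src != NULL && dst != src);
    memcpy(dst->data, src->data, sizeof(dst->data));
    page_table[vpn] = dst;
}
```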

The main intent of page migration is to reduce the latency of memory access
by moving pages closer to the processor on which the process accessing that
memory is running.

Page migration allows a process to manually relocate the node on which its
pages are located through the MPOL_MF_MOVE and MPOL_MF_MOVE_ALL flags while
setting a new memory policy via mbind(). The pages of a process can also be
relocated from another process using the sys_migrate_pages() system call.
The migrate_pages() call takes two sets of nodes and moves the pages of a
process that are located on the "from" nodes to the destination nodes.
Page migration functions are provided by the numactl package by Andi Kleen
(a version later than 0.9.3 is required; get it from
ftp://ftp.suse.com/pub/people/ak). numactl provides libnuma, which offers
an interface for page migration similar to its other NUMA functionality.
Running cat /proc/<pid>/numa_maps allows an easy review of where the pages
of a process are located. See also the numa_maps manpage in the numactl
package.
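A minimal sketch of how a tool might summarise /proc/<pid>/numa_maps per node, assuming each line carries "N<node>=<pages>" fields. The helper count_node_pages() and the MAX_NODES limit are hypothetical. All other fields are ignored, since their exact set varies between kernel versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64

/* Add the per-node page counts ("N<node>=<pages>" fields) found in one
 * numa_maps line to counts[]. Returns the number of node fields found;
 * other fields (policy, file=, anon=, dirty=, ...) are skipped. */
static int count_node_pages(const char *line, unsigned long counts[MAX_NODES])
{
    char buf[512];
    int found = 0;

    assert(line != NULL);
    /* strtok() modifies its input, so work on a copy of the line. */
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
        int node;
        unsigned long pages;

        if (sscanf(tok, "N%d=%lu", &node, &pages) == 2 &&
            node >= 0 && node < MAX_NODES) {
            counts[node] += pages;
            found++;
        }
    }
    return found;
}
```

A caller would open /proc/<pid>/numa_maps with fopen(), read it line by line with fgets(), and pass each line to count_node_pages() together with a zero-initialised counts[] array.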

Manual migration is useful if, for example, the scheduler has relocated
a process to a processor on a distant node. A batch scheduler or an
administrator may detect the situation and move the pages of the process
nearer to the new processor. The kernel itself only provides manual page
migration support. Automatic page migration may be implemented through
user space processes that move pages. A special function call,
"move_pages", allows individual pages within a process to be moved.
A NUMA profiler may, for example, obtain a log showing frequent off-node
accesses and may use the result to move pages to more advantageous
locations.
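The profiler idea can be sketched as a placement heuristic: for each page, pick the node that accessed it most often, and move the page there only if that node is not already the current one and accounts for a large enough share of the accesses. The struct page_profile record, the pick_target_node() helper and the min_percent threshold are all hypothetical; a real tool would pass the chosen nodes on to move_pages().

```c
#include <assert.h>

#define NR_NODES 4

/* Hypothetical profiler record for one page: accesses per node during
 * the sampling interval, and the node the page currently lives on. */
struct page_profile {
    int current_node;
    unsigned long accesses[NR_NODES];
};

/* Return the node the page should move to, or -1 to leave it alone.
 * A move is suggested only when the busiest node is not the current
 * one and issued at least `min_percent` percent of all accesses
 * (an illustrative threshold, not a kernel parameter). */
static int pick_target_node(const struct page_profile *p, unsigned min_percent)
{
    unsigned long total = 0, best_count = 0;
    int best = -1;

    assert(min_percent <= 100);
    for (int n = 0; n < NR_NODES; n++) {
        total += p->accesses[n];
        if (p->accesses[n] > best_count) {
            best_count = p->accesses[n];
            best = n;
        }
    }
    if (best < 0 || best == p->current_node)
        return -1;              /* no accesses, or already local */
    if (best_count * 100 < total * min_percent)
        return -1;              /* accesses too evenly spread */
    return best;
}
```

The threshold keeps pages that are shared evenly between nodes from being moved back and forth.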

Larger installations usually partition the system into sections of nodes
using cpusets. Paul Jackson has equipped cpusets with the ability to
move pages when a task is moved to another cpuset (see ../cpusets.txt).
Cpusets allow the automation of process locality. If a task is moved to
a new cpuset, all of its pages are moved with it as well, so that the
performance of the process does not degrade dramatically. The pages of
processes in a cpuset are also moved if the allowed memory nodes of a
cpuset are changed.

For all migration techniques, page migration preserves the relative
location of pages within a group of nodes, so that a particular memory
allocation pattern is kept even after a process has been migrated. This is
necessary in order to preserve memory latencies, so that processes run
with similar performance after migration.
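Preserving relative placement amounts to remapping node numbers. The sketch below, written in the spirit of the kernel's bitmap_bitremap() helper, maps the n-th node of the old node set to the n-th node of the new set (wrapping around when the new set is smaller) and leaves nodes outside the old set unchanged. Node sets are plain bitmasks here, and remap_node(), weight() and nth_set_bit() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <limits.h>

/* Position of the n-th set bit (0-based) in mask, or -1 if none. */
static int nth_set_bit(unsigned long mask, int n)
{
    for (int bit = 0; bit < (int)(CHAR_BIT * sizeof(mask)); bit++) {
        if (mask & (1UL << bit)) {
            if (n == 0)
                return bit;
            n--;
        }
    }
    return -1;
}

/* Number of set bits in mask. */
static int weight(unsigned long mask)
{
    int w = 0;

    for (; mask; mask &= mask - 1)
        w++;
    return w;
}

/* Map `node` from old_nodes onto new_nodes: the n-th node of old_nodes
 * goes to the (n mod w)-th node of new_nodes, where w is the number of
 * nodes in new_nodes. Nodes outside old_nodes map to themselves, as does
 * everything when new_nodes is empty. */
static int remap_node(int node, unsigned long old_nodes, unsigned long new_nodes)
{
    int w = weight(new_nodes);
    int n;

    assert(node >= 0 && node < (int)(CHAR_BIT * sizeof(unsigned long)));
    if (!(old_nodes & (1UL << node)) || w == 0)
        return node;
    /* index of `node` among the set bits of old_nodes */
    n = weight(old_nodes & ((1UL << node) - 1));
    return nth_set_bit(new_nodes, n % w);
}
```

For example, moving a task from a cpuset on nodes 0-1 to one on nodes 4-5 sends pages on node 0 to node 4 and pages on node 1 to node 5, so pages that shared a node before still share one afterwards.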

Page migration occurs in several steps. What follows is first a high-level
description for those trying to use migrate_pages() from the kernel (for
userspace usage see Andi Kleen's numactl package mentioned above), and
then a description of how the low-level details work.

A. In-kernel use of migrate_pages()
-----------------------------------

1. Remove pages from the LRU.

   Lists of pages to be migrated are generated by scanning over
   pages and moving them into lists. This is done by
   calling isolate_lru_page().
   Calling isolate_lru_page() increases the references to the page
   so that it cannot vanish while the page migration occurs.
   It also prevents the swapper or other scans from encountering
   the page.

2. We need to have a function of type new_page_t that can be
   passed to migrate_pages(). This function should figure out
   how to allocate the correct new page given the old page.

3. The migrate_pages() function is called, which attempts
   to do the migration. It will call the function to allocate
   the new page for each page that is considered for
   moving.
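The three in-kernel steps can be modelled in user space to show how the pieces fit together. This is a simplified sketch: struct toy_page, toy_isolate_lru_page(), alloc_on_node() and toy_migrate_pages() are hypothetical stand-ins for struct page, isolate_lru_page(), a new_page_t allocator and migrate_pages(), and their signatures do not match the kernel's. The opaque `private` value that carries the target node to the allocator is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct page (hypothetical). */
struct toy_page {
    int node;                   /* node the page lives on */
    int on_lru;                 /* 1 while the page is on an LRU list */
    int refcount;               /* references held on the page */
    struct toy_page *next;      /* link in the list of pages to migrate */
    struct toy_page *moved_to;  /* target page once migrated */
};

/* Modelled loosely on new_page_t: allocate the target page for `old`. */
typedef struct toy_page *toy_new_page_t(struct toy_page *old, unsigned long private);

/* Step 1: take the page off the LRU and pin it with an extra reference. */
static int toy_isolate_lru_page(struct toy_page *p, struct toy_page **list)
{
    assert(p->refcount > 0);    /* a page on the LRU is already referenced */
    if (!p->on_lru)
        return -1;              /* already isolated by someone else */
    p->on_lru = 0;
    p->refcount++;
    p->next = *list;
    *list = p;
    return 0;
}

/* Step 2: an allocator callback; `private` carries the target node. */
static struct toy_page *alloc_on_node(struct toy_page *old, unsigned long private)
{
    struct toy_page *newpage = calloc(1, sizeof(*newpage));

    (void)old;                  /* a real callback might inspect the old page */
    if (newpage) {
        newpage->node = (int)private;
        newpage->refcount = 1;
    }
    return newpage;
}

/* Step 3: ask the callback for a target page for each listed page and
 * record the move. Copying contents and fixing up references are left
 * out. Returns the number of pages that could not be migrated. */
static int toy_migrate_pages(struct toy_page *list, toy_new_page_t *get_new_page,
                             unsigned long private)
{
    int failed = 0;

    for (struct toy_page *p = list; p; p = p->next) {
        struct toy_page *newpage = get_new_page(p, private);

        if (!newpage) {
            failed++;
            continue;
        }
        p->moved_to = newpage;
        newpage->on_lru = 1;    /* the new page goes onto the LRU */
        p->refcount--;          /* drop the pin taken at isolation */
    }
    return failed;
}
```

Pages for which no new page could be allocated stay isolated in this sketch; in the kernel, pages that were not migrated are expected to be returned to the LRU by the caller.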

B. How migrate_pages() works
----------------------------

migrate_pages() does several passes over its list of pages. A page is moved
if all references to the page are removable at the time. The page has
already been removed from the LRU via isolate_lru_page() and the refcount
is increased so that the page cannot be freed while page migration occurs.
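The retry behaviour can be illustrated with a toy loop: a page that still has other references is skipped in one pass and tried again in the next, up to a fixed number of passes. The busy_passes counter, which simulates a transient reference that disappears after a few passes, and the max_passes parameter are assumptions of this sketch; the kernel's actual pass limit is not taken from this document.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page state: `busy_passes` simulates a transient extra
 * reference (e.g. held during I/O) that is released after that many
 * passes. */
struct pass_page {
    int busy_passes;
    bool moved;
};

/* Try to move one page; succeeds only if no other references remain. */
static bool try_move(struct pass_page *p)
{
    if (p->busy_passes > 0) {
        p->busy_passes--;       /* the other user may be done next time */
        return false;
    }
    p->moved = true;
    return true;
}

/* Make up to `max_passes` passes, retrying pages that were busy.
 * Returns the number of pages still not moved at the end. */
static int migrate_passes(struct pass_page *pages, int n, int max_passes)
{
    int remaining = n;

    assert(n >= 0 && max_passes >= 0);
    for (int pass = 0; pass < max_passes && remaining > 0; pass++) {
        for (int i = 0; i < n; i++) {
            if (!pages[i].moved && try_move(&pages[i]))
                remaining--;
        }
    }
    return remaining;
}
```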

Steps:

1. Lock the page to be migrated.

2. Ensure that writeback is complete.

3. Prep the new page that we want to move to. It is locked
   and marked as not uptodate so that all accesses to the new
   page immediately block on the page lock while the move is in progress.

4. The new page is prepped with some settings from the old page so that
   accesses to the new page will discover a page with the correct settings.

5. All the page table references to the page are converted
   to migration entries or dropped (nonlinear vmas).
   This decreases the mapcount of a page. If the resulting
   mapcount is not zero then we do not migrate the page.
   All user space processes that attempt to access the page
   will now wait on the page lock.

6. The radix tree lock is taken. This will cause all processes trying
   to access the page via the mapping to block on the radix tree spinlock.

7. The refcount of the page is examined and we back out if references
   remain; otherwise we know that we are the only one referencing this page.

8. The radix tree is checked and if it does not contain the pointer to this
   page then we back out because someone else modified the radix tree.

9. The radix tree is changed to point to the new page.

10. The reference count of the old page is dropped because the radix tree
    reference is gone. A reference to the new page is established because
    the new page is referenced by the radix tree.

11. The radix tree lock is dropped. With that, lookups in the mapping
    become possible again. Processes will move from spinning on the tree_lock
    to sleeping on the locked new page.

12. The page contents are copied to the new page.

13. The remaining page flags are copied to the new page.

14. The old page flags are cleared to indicate that the page does
    not provide any information anymore.

15. Queued up writeback on the new page is triggered.

16. If migration entries were inserted into the page table, then replace
    them with real ptes. Doing so will enable access for user space
    processes not already waiting for the page lock.

17. The page locks are dropped from the old and new page.
    Processes waiting on the page lock will redo their page faults
    and will reach the new page.

18. The new page is moved to the LRU and can be scanned by the swapper
    etc. again.
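The numbered steps can be traced in a single-threaded toy model of one page-cache page. Locks are plain flags, a sentinel object stands in for migration entries, writeback (step 15) and the LRU handling of the final step are left out, and the expected reference count of 2 (one from isolation, one from the radix tree slot) is an assumption of this model rather than the kernel's exact check. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical page: single-threaded, locks are plain flags. */
struct mpage {
    bool locked, uptodate, writeback, dirty;
    unsigned long index;        /* offset within the mapping */
    int mapcount;               /* number of page table mappings */
    int refcount;               /* all references, mappings included */
    char data[TOY_PAGE_SIZE];
};

/* One radix tree slot of the owning mapping, plus the tree lock. */
struct mslot {
    bool tree_lock;
    struct mpage *page;
};

/* Sentinel standing in for a migration entry in the toy page table. */
static struct mpage migration_entry;

/* Turn migration entries back into real mappings of `page`. */
static void restore_ptes(struct mpage **ptes, int n, struct mpage *page)
{
    for (int i = 0; i < n; i++) {
        if (ptes[i] == &migration_entry) {
            ptes[i] = page;
            page->mapcount++;
            page->refcount++;
        }
    }
}

/* Migrate `old` to `newpage`. The caller holds one reference to `old`
 * from isolation and one to `newpage` from allocation. Returns 0 on
 * success or -1 after backing out. Comments give the step numbers. */
static int toy_migrate_one(struct mpage *old, struct mpage *newpage,
                           struct mslot *slot, struct mpage **ptes, int n)
{
    old->locked = true;                            /* 1 */
    assert(!old->writeback);                       /* 2: assumed finished */
    newpage->locked = true;                        /* 3 */
    newpage->uptodate = false;
    newpage->index = old->index;                   /* 4 */

    for (int i = 0; i < n; i++) {                  /* 5 */
        if (ptes[i] == old) {
            ptes[i] = &migration_entry;
            old->mapcount--;
            old->refcount--;
        }
    }
    if (old->mapcount != 0)
        goto back_out;

    slot->tree_lock = true;                        /* 6 */
    if (old->refcount != 2 || slot->page != old) { /* 7, 8 */
        slot->tree_lock = false;
        goto back_out;
    }
    slot->page = newpage;                          /* 9 */
    old->refcount--;                               /* 10 */
    newpage->refcount++;
    slot->tree_lock = false;                       /* 11 */

    memcpy(newpage->data, old->data, TOY_PAGE_SIZE); /* 12 */
    newpage->dirty = old->dirty;                   /* 13 */
    newpage->uptodate = old->uptodate;
    old->dirty = false;                            /* 14 */
    old->uptodate = false;
    restore_ptes(ptes, n, newpage);                /* 16 (15 omitted) */
    old->locked = false;                           /* 17 */
    newpage->locked = false;
    return 0;                                      /* 18 is left to the caller */

back_out:
    restore_ptes(ptes, n, old);
    old->locked = false;
    newpage->locked = false;
    return -1;
}
```

If another user holds an extra reference, the refcount check fails and the sketch restores the original mappings, mirroring the back-out path in the steps.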

Christoph Lameter, May 8, 2006.