ia64/xen-unstable

xen/arch/ia64/xen/mm.c @ 18394:dade7f0bdc8d

hvm: Use main memory for video memory.

When creating an HVM domain, if e.g. another domain is created before
qemu allocates video memory, the extra 8MB memory ballooning is not
available any more, because it got consumed by the other domain.

This fixes it by taking video memory from the main memory:

- make hvmloader use e820_malloc to reserve some of the main memory
and notify ioemu of its address through the Xen platform PCI card.
- add XENMAPSPACE_mfn to the xen_add_to_physmap memory op, to allow
ioemu to move the MFNs between the original position and the PCI
mapping, when LFB acceleration is disabled/enabled
- add a remove_from_physmap memory op, to allow ioemu to unmap it
completely for the case of old guests with acceleration disabled.
- add xc_domain_memory_translate_gpfn_list to libxc to allow ioemu to
get the MFNs of the video memory.
- have xend save the PCI memory space instead of ioemu: if a memory
page is there, the guest can access it like usual memory, so xend
can safely be responsible to save it. The extra benefit is that
live migration will apply the logdirty optimization there too.
- handle old saved images, populating the video memory from ioemu if
really needed.

Signed-off-by: Samuel Thibault <samuel.thibault@eu.citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Wed Aug 27 14:53:39 2008 +0100 (2008-08-27)
parents 926a366ca82f
children 4ddd63b4be9b
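
The MFN remapping described in the summary above is driven from the device model side. As a rough, hypothetical illustration (not part of this changeset), a device model could ask Xen to make one machine frame of the video memory appear at a new guest pseudo-physical frame using the XENMAPSPACE_mfn space added here. The sketch assumes the libxc wrapper xc_memory_op() and the xen_add_to_physmap layout from public/memory.h; remap_vram_page(), vram_mfn and new_gpfn are made-up names.

    #include <xenctrl.h>        /* xc_memory_op() */
    #include <xen/memory.h>     /* XENMEM_add_to_physmap, struct xen_add_to_physmap */

    /* Ask Xen to map machine frame vram_mfn at guest frame new_gpfn. */
    static int remap_vram_page(int xc_handle, domid_t domid,
                               unsigned long vram_mfn, unsigned long new_gpfn)
    {
        struct xen_add_to_physmap xatp = {
            .domid = domid,
            .space = XENMAPSPACE_mfn,   /* map space introduced by this change */
            .idx   = vram_mfn,          /* the machine frame to move */
            .gpfn  = new_gpfn,          /* where the guest should now see it */
        };

        return xc_memory_op(xc_handle, XENMEM_add_to_physmap, &xatp);
    }

To find the machine frames in the first place, ioemu would use the xc_domain_memory_translate_gpfn_list() call mentioned above; disabling LFB acceleration moves the frames back the other way, and for old guests the range can be dropped entirely with the new remove_from_physmap op.
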
line source
1 /*
2 * Copyright (C) 2005 Intel Co
3 * Kun Tian (Kevin Tian) <kevin.tian@intel.com>
4 *
5 * 05/04/29 Kun Tian (Kevin Tian) <kevin.tian@intel.com> Add VTI domain support
6 *
7 * Copyright (c) 2006 Isaku Yamahata <yamahata at valinux co jp>
8 * VA Linux Systems Japan K.K.
9 * dom0 vp model support
10 */
12 /*
13 * NOTES on SMP
14 *
15 * * shared structures
16 * There are some structures which are accessed by CPUs concurrently.
17 * Here is the list of shared structures and operations on them which
18 * read/write the structures.
19 *
20 * - struct page_info
21 * This is a Xen global resource. This structure is accessed by
22 * any CPU.
23 *
24 * operations on this structure:
25 * - get_page() and its variant
26 * - put_page() and its variant
27 *
28 * - vTLB
29 * vcpu->arch.{d, i}tlb: Software tlb cache. These are per VCPU data.
30 * DEFINE_PER_CPU (unsigned long, vhpt_paddr): VHPT table per physical CPU.
31 *
32 * domain_flush_vtlb_range() and domain_flush_vtlb_all()
33 * write vcpu->arch.{d, i}tlb and the VHPT table of vcpus which aren't current.
34 * So there are potential races to read/write the VHPT and vcpu->arch.{d, i}tlb.
35 * Please note that the VHPT is read by the hardware page table walker.
36 *
37 * operations on this structure:
38 * - global tlb purge
39 * vcpu_ptc_g(), vcpu_ptc_ga() and domain_page_flush_and_put()
40 * I.e. callers of domain_flush_vtlb_range() and domain_flush_vtlb_all()
41 * These functions invalidate VHPT entry and vcpu->arch.{i, d}tlb
42 *
43 * - tlb insert and fc
44 * vcpu_itc_i()
45 * vcpu_itc_d()
46 * ia64_do_page_fault()
47 * vcpu_fc()
48 * These functions set VHPT entry and vcpu->arch.{i, d}tlb.
49 * Actually vcpu_itc_no_srlz() does.
50 *
51 * - the P2M table
52 * domain->mm and pgd, pud, pmd, pte table page.
53 * This structure is used to convert a domain pseudo-physical address
54 * to a machine address. This is a per-domain resource.
55 *
56 * operations on this structure:
57 * - populate the P2M table tree
58 * lookup_alloc_domain_pte() and its variants.
59 * - set p2m entry
60 * assign_new_domain_page() and its variants.
61 * assign_domain_page() and its variants.
62 * - xchg p2m entry
63 * assign_domain_page_replace()
64 * - cmpxchg p2m entry
65 * assign_domain_page_cmpxchg_rel()
66 * replace_grant_host_mapping()
67 * steal_page()
68 * zap_domain_page_one()
69 * - read p2m entry
70 * lookup_alloc_domain_pte() and its variants.
71 *
72 * - the M2P table
73 * mpt_table (or machine_to_phys_mapping)
74 * This is a table which converts from machine address to pseudo physical
75 * address. This is a global structure.
76 *
77 * operations on this structure:
78 * - set m2p entry
79 * set_gpfn_from_mfn()
80 * - zap m2p entry
81 * set_gpfn_from_mfn(INVALID_M2P_ENTRY)
82 * - get m2p entry
83 * get_gpfn_from_mfn()
84 *
85 *
86 * * avoiding races
87 * The resources which are shared by CPUs must be accessed carefully
88 * to avoid races.
89 * IA64 has weak memory ordering, so care must be taken when
90 * accessing shared structures. [SDM vol2 PartII chap. 2]
91 *
92 * - struct page_info memory ordering
93 * get_page() has acquire semantics.
94 * put_page() has release semantics.
95 *
96 * - populating the p2m table
97 * pgd, pud, pmd are append only.
98 *
99 * - races when updating the P2M tables and the M2P table
100 * P2M entries are shared by more than one vcpu,
101 * so they are accessed with atomic operations.
102 * I.e. xchg or cmpxchg must be used to update a p2m entry.
103 * NOTE: when creating/destroying a domain, we don't need to worry about
104 * this race.
105 *
106 * The M2P table is the inverse of the P2M table.
107 * I.e. P2M(M2P(p)) = p and M2P(P2M(m)) = m
108 * The M2P table and the P2M table must be updated consistently.
109 * Here is the update sequence
110 *
111 * xchg or cmpxchg case
112 * - set_gpfn_from_mfn(new_mfn, gpfn)
113 * - memory barrier
114 * - atomic update of the p2m entry (xchg or cmpxchg the p2m entry)
115 * get old_mfn entry as a result.
116 * - memory barrier
117 * - set_gpfn_from_mfn(old_mfn, INVALID_M2P_ENTRY)
118 *
119 * Here the memory barrier can be achieved with release semantics (see the sketch just after this comment block).
120 *
121 * - races between global tlb purge and tlb insert
122 * This is a race between reading/writing vcpu->arch.{d, i}tlb or a VHPT entry.
123 * When a vcpu is about to insert a tlb entry, another vcpu may purge the tlb
124 * cache globally. Neither tlb insertion (vcpu_itc_no_srlz()) nor a global tlb purge
125 * (domain_flush_vtlb_range() and domain_flush_vtlb_all()) can update
126 * vcpu->arch.{d, i}tlb, the VHPT and the machine TLB atomically, so there is a race here.
127 *
128 * The check here uses the vcpu->arch.{d, i}tlb.p bit:
129 * after inserting a tlb entry, check the p bit and retry the insert if it was cleared.
130 * This means that when a global tlb purge and a tlb insert are issued
131 * simultaneously, the global tlb purge effectively happens after the tlb insert.
132 *
133 * - races between p2m entry update and tlb insert
134 * This is a race between reading/writing the p2m entry.
135 * reader: vcpu_itc_i(), vcpu_itc_d(), ia64_do_page_fault(), vcpu_fc()
136 * writer: assign_domain_page_cmpxchg_rel(), replace_grant_host_mapping(),
137 * steal_page(), zap_domain_page_one()
138 *
139 * For example, vcpu_itc_i() is about to insert a tlb entry by calling
140 * vcpu_itc_no_srlz() after reading the p2m entry.
141 * At the same time, the p2m entry may be replaced by xchg or cmpxchg and
142 * the tlb cache of the page flushed.
143 * There is a possibility that the p2m entry no longer points to the
144 * old page, while the tlb cache still points to the old page.
145 * This can be detected, much like a sequence lock, using the p2m entry itself:
146 * the reader remembers the p2m entry value it read and inserts the tlb entry,
147 * then reads the p2m entry again. If the new p2m entry value is different
148 * from the value that was used, it retries.
149 *
150 * - races between referencing page and p2m entry update
151 * This is a race between reading/writing the p2m entry.
152 * reader: vcpu_get_domain_bundle(), vmx_get_domain_bundle(),
153 * efi_emulate_get_time()
154 * writer: assign_domain_page_cmpxchg_rel(), replace_grant_host_mapping(),
155 * steal_page(), zap_domain_page_one()
156 *
157 * A page which is assigned to a domain can be de-assigned by another vcpu,
158 * so before reading/writing a domain page, the page's reference count
159 * must be incremented.
160 * vcpu_get_domain_bundle(), vmx_get_domain_bundle() and
161 * efi_emulate_get_time() do this.
162 *
163 */
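/*
 * Editor's sketch, not part of the original file: the M2P/P2M update
 * sequence described in the notes above, written with the helpers used
 * later in this file (see assign_domain_page_replace()).  "mm", "mpaddr",
 * "ptep" and "new_pte" are illustrative local names.
 *
 *     set_gpfn_from_mfn(new_mfn, gpfn);                  // M2P first
 *     old_pte = ptep_xchg(mm, mpaddr, ptep, new_pte);    // atomic P2M update,
 *                                                        // release semantics
 *     old_mfn = pte_pfn(old_pte);
 *     set_gpfn_from_mfn(old_mfn, INVALID_M2P_ENTRY);     // zap the stale M2P entry
 *
 * The tlb-insert race in the notes is handled by the readers with a
 * sequence-lock-like retry, roughly (p2m_entry_retry() is assumed to be
 * the helper declared in asm/p2m_entry.h):
 *
 *     again:
 *         pteval = lookup_domain_mpa(d, mpaddr, &entry); // remember the p2m value
 *         ... insert the tlb entry ...
 *         if (p2m_entry_retry(&entry))                   // p2m changed meanwhile?
 *             goto again;
 */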
165 #include <xen/config.h>
166 #include <xen/sched.h>
167 #include <xen/domain.h>
168 #include <asm/xentypes.h>
169 #include <xen/mm.h>
170 #include <xen/errno.h>
171 #include <asm/pgalloc.h>
172 #include <asm/vhpt.h>
173 #include <asm/vcpu.h>
174 #include <asm/shadow.h>
175 #include <asm/p2m_entry.h>
176 #include <asm/tlb_track.h>
177 #include <linux/efi.h>
178 #include <linux/sort.h>
179 #include <xen/guest_access.h>
180 #include <asm/page.h>
181 #include <asm/dom_fw_common.h>
182 #include <public/memory.h>
183 #include <asm/event.h>
184 #include <asm/debugger.h>
186 static void domain_page_flush_and_put(struct domain* d, unsigned long mpaddr,
187 volatile pte_t* ptep, pte_t old_pte,
188 struct page_info* page);
190 extern unsigned long ia64_iobase;
192 struct domain *dom_xen, *dom_io;
194 /*
195 * This number is bigger than DOMID_SELF, DOMID_XEN and DOMID_IO.
196 * If more reserved domain ids are introduced, this might be increased.
197 */
198 #define DOMID_P2M (0x7FF8U)
199 static struct domain *dom_p2m;
201 // the following is stolen from arch_init_memory() @ xen/arch/x86/mm.c
202 void
203 alloc_dom_xen_and_dom_io(void)
204 {
205 /*
206 * Initialise our DOMID_XEN domain.
207 * Any Xen-heap pages that we will allow to be mapped will have
208 * their domain field set to dom_xen.
209 */
210 dom_xen = domain_create(DOMID_XEN, DOMCRF_dummy, 0);
211 BUG_ON(dom_xen == NULL);
213 /*
214 * Initialise our DOMID_IO domain.
215 * This domain owns I/O pages that are within the range of the page_info
216 * array. Mappings occur at the priv of the caller.
217 */
218 dom_io = domain_create(DOMID_IO, DOMCRF_dummy, 0);
219 BUG_ON(dom_io == NULL);
220 }
222 static int
223 mm_teardown_can_skip(struct domain* d, unsigned long offset)
224 {
225 return d->arch.mm_teardown_offset > offset;
226 }
228 static void
229 mm_teardown_update_offset(struct domain* d, unsigned long offset)
230 {
231 d->arch.mm_teardown_offset = offset;
232 }
234 static void
235 mm_teardown_pte(struct domain* d, volatile pte_t* pte, unsigned long offset)
236 {
237 pte_t old_pte;
238 unsigned long mfn;
239 struct page_info* page;
241 old_pte = ptep_get_and_clear(&d->arch.mm, offset, pte);// acquire semantics
243 // vmx domains use bits [58:56] to distinguish io regions from memory.
244 // see vmx_build_physmap_table() in vmx_init.c
245 if (!pte_mem(old_pte))
246 return;
248 // domain might map IO space or acpi table pages. check it.
249 mfn = pte_pfn(old_pte);
250 if (!mfn_valid(mfn))
251 return;
252 page = mfn_to_page(mfn);
253 BUG_ON(page_get_owner(page) == NULL);
255 // The struct page_info corresponding to mfn may or may not exist depending
256 // on CONFIG_VIRTUAL_FRAME_TABLE.
257 // The above check is too coarse.
258 // The right way is to check whether this page belongs to the io area or to acpi pages.
260 if (pte_pgc_allocated(old_pte)) {
261 BUG_ON(page_get_owner(page) != d);
262 BUG_ON(get_gpfn_from_mfn(mfn) == INVALID_M2P_ENTRY);
263 set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
264 if (test_and_clear_bit(_PGC_allocated, &page->count_info))
265 put_page(page);
266 } else {
267 put_page(page);
268 }
269 }
271 static int
272 mm_teardown_pmd(struct domain* d, volatile pmd_t* pmd, unsigned long offset)
273 {
274 unsigned long i;
275 volatile pte_t* pte = pte_offset_map(pmd, offset);
277 for (i = 0; i < PTRS_PER_PTE; i++, pte++) {
278 unsigned long cur_offset = offset + (i << PAGE_SHIFT);
279 if (mm_teardown_can_skip(d, cur_offset + PAGE_SIZE))
280 continue;
281 if (!pte_present(*pte)) { // acquire semantics
282 mm_teardown_update_offset(d, cur_offset);
283 continue;
284 }
285 mm_teardown_update_offset(d, cur_offset);
286 mm_teardown_pte(d, pte, cur_offset);
287 if (hypercall_preempt_check())
288 return -EAGAIN;
289 }
290 return 0;
291 }
293 static int
294 mm_teardown_pud(struct domain* d, volatile pud_t *pud, unsigned long offset)
295 {
296 unsigned long i;
297 volatile pmd_t *pmd = pmd_offset(pud, offset);
299 for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
300 unsigned long cur_offset = offset + (i << PMD_SHIFT);
301 if (mm_teardown_can_skip(d, cur_offset + PMD_SIZE))
302 continue;
303 if (!pmd_present(*pmd)) { // acquire semantics
304 mm_teardown_update_offset(d, cur_offset);
305 continue;
306 }
307 if (mm_teardown_pmd(d, pmd, cur_offset))
308 return -EAGAIN;
309 }
310 return 0;
311 }
313 static int
314 mm_teardown_pgd(struct domain* d, volatile pgd_t *pgd, unsigned long offset)
315 {
316 unsigned long i;
317 volatile pud_t *pud = pud_offset(pgd, offset);
319 for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
320 unsigned long cur_offset = offset + (i << PUD_SHIFT);
321 #ifndef __PAGETABLE_PUD_FOLDED
322 if (mm_teardown_can_skip(d, cur_offset + PUD_SIZE))
323 continue;
324 #endif
325 if (!pud_present(*pud)) { // acquire semantics
326 #ifndef __PAGETABLE_PUD_FOLDED
327 mm_teardown_update_offset(d, cur_offset);
328 #endif
329 continue;
330 }
331 if (mm_teardown_pud(d, pud, cur_offset))
332 return -EAGAIN;
333 }
334 return 0;
335 }
337 int
338 mm_teardown(struct domain* d)
339 {
340 struct mm_struct* mm = &d->arch.mm;
341 unsigned long i;
342 volatile pgd_t* pgd;
344 if (mm->pgd == NULL)
345 return 0;
347 pgd = pgd_offset(mm, 0);
348 for (i = 0; i < PTRS_PER_PGD; i++, pgd++) {
349 unsigned long cur_offset = i << PGDIR_SHIFT;
351 if (mm_teardown_can_skip(d, cur_offset + PGDIR_SIZE))
352 continue;
353 if (!pgd_present(*pgd)) { // acquire semantics
354 mm_teardown_update_offset(d, cur_offset);
355 continue;
356 }
357 if (mm_teardown_pgd(d, pgd, cur_offset))
358 return -EAGAIN;
359 }
361 foreign_p2m_destroy(d);
362 return 0;
363 }
365 static void
366 mm_p2m_teardown_pmd(struct domain* d, volatile pmd_t* pmd,
367 unsigned long offset)
368 {
369 pte_free_kernel(pte_offset_map(pmd, offset));
370 }
372 static void
373 mm_p2m_teardown_pud(struct domain* d, volatile pud_t *pud,
374 unsigned long offset)
375 {
376 unsigned long i;
377 volatile pmd_t *pmd = pmd_offset(pud, offset);
379 for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
380 if (!pmd_present(*pmd))
381 continue;
382 mm_p2m_teardown_pmd(d, pmd, offset + (i << PMD_SHIFT));
383 }
384 pmd_free(pmd_offset(pud, offset));
385 }
387 static void
388 mm_p2m_teardown_pgd(struct domain* d, volatile pgd_t *pgd,
389 unsigned long offset)
390 {
391 unsigned long i;
392 volatile pud_t *pud = pud_offset(pgd, offset);
394 for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
395 if (!pud_present(*pud))
396 continue;
397 mm_p2m_teardown_pud(d, pud, offset + (i << PUD_SHIFT));
398 }
399 pud_free(pud_offset(pgd, offset));
400 }
402 static void
403 mm_p2m_teardown(struct domain* d)
404 {
405 struct mm_struct* mm = &d->arch.mm;
406 unsigned long i;
407 volatile pgd_t* pgd;
409 BUG_ON(mm->pgd == NULL);
410 pgd = pgd_offset(mm, 0);
411 for (i = 0; i < PTRS_PER_PGD; i++, pgd++) {
412 if (!pgd_present(*pgd))
413 continue;
414 mm_p2m_teardown_pgd(d, pgd, i << PGDIR_SHIFT);
415 }
416 pgd_free(mm->pgd);
417 mm->pgd = NULL;
418 }
420 void
421 mm_final_teardown(struct domain* d)
422 {
423 if (d->arch.shadow_bitmap != NULL) {
424 xfree(d->arch.shadow_bitmap);
425 d->arch.shadow_bitmap = NULL;
426 }
427 mm_p2m_teardown(d);
428 }
430 unsigned long
431 domain_get_maximum_gpfn(struct domain *d)
432 {
433 return (d->arch.convmem_end - 1) >> PAGE_SHIFT;
434 }
436 // stolen from share_xen_page_with_guest() in xen/arch/x86/mm.c
437 void
438 share_xen_page_with_guest(struct page_info *page,
439 struct domain *d, int readonly)
440 {
441 if ( page_get_owner(page) == d )
442 return;
444 #if 1
445 if (readonly) {
446 printk("%s:%d readonly is not supported yet\n", __func__, __LINE__);
447 }
448 #endif
450 // alloc_xenheap_pages() doesn't initialize page owner.
451 //BUG_ON(page_get_owner(page) != NULL);
453 spin_lock(&d->page_alloc_lock);
455 #ifndef __ia64__
456 /* The incremented type count pins as writable or read-only. */
457 page->u.inuse.type_info = (readonly ? PGT_none : PGT_writable_page);
458 page->u.inuse.type_info |= PGT_validated | 1;
459 #endif
461 page_set_owner(page, d);
462 wmb(); /* install valid domain ptr before updating refcnt. */
463 ASSERT(page->count_info == 0);
465 /* Only add to the allocation list if the domain isn't dying. */
466 if ( !d->is_dying )
467 {
468 page->count_info |= PGC_allocated | 1;
469 if ( unlikely(d->xenheap_pages++ == 0) )
470 get_knownalive_domain(d);
471 list_add_tail(&page->list, &d->xenpage_list);
472 }
474 // grant_table_destroy() releases these pages,
475 // but it doesn't clear their m2p entries, so stale
476 // entries might remain. Such a stale entry is cleared here.
477 set_gpfn_from_mfn(page_to_mfn(page), INVALID_M2P_ENTRY);
479 spin_unlock(&d->page_alloc_lock);
480 }
482 void
483 share_xen_page_with_privileged_guests(struct page_info *page, int readonly)
484 {
485 share_xen_page_with_guest(page, dom_xen, readonly);
486 }
488 unsigned long
489 gmfn_to_mfn_foreign(struct domain *d, unsigned long gpfn)
490 {
491 unsigned long pte;
493 pte = lookup_domain_mpa(d,gpfn << PAGE_SHIFT, NULL);
494 if (!pte) {
495 panic("gmfn_to_mfn_foreign: bad gpfn. spinning...\n");
496 }
498 if ((pte & _PAGE_IO) && is_hvm_domain(d))
499 return INVALID_MFN;
501 return ((pte & _PFN_MASK) >> PAGE_SHIFT);
502 }
504 // Given a domain virtual address, pte and page size, extract the metaphysical
505 // address, convert the pte to a physical address for the (possibly different)
506 // Xen PAGE_SIZE and return the modified pte. (NOTE: TLB insert should use
507 // current->arch.vhpt_pg_shift!)
508 u64 translate_domain_pte(u64 pteval, u64 address, u64 itir__, u64* itir,
509 struct p2m_entry* entry)
510 {
511 struct domain *d = current->domain;
512 ia64_itir_t _itir = {.itir = itir__};
513 u64 mask, mpaddr, pteval2;
514 u64 arflags;
515 u64 arflags2;
516 u64 maflags2;
518 pteval &= ((1UL << 53) - 1);// ignore [63:53] bits
520 // FIXME address had better be pre-validated on insert
521 mask = ~itir_mask(_itir.itir);
522 mpaddr = ((pteval & _PAGE_PPN_MASK) & ~mask) | (address & mask);
524 if (_itir.ps > PAGE_SHIFT)
525 _itir.ps = PAGE_SHIFT;
527 ((ia64_itir_t*)itir)->itir = _itir.itir;/* Copy the whole register. */
528 ((ia64_itir_t*)itir)->ps = _itir.ps; /* Overwrite ps part! */
530 pteval2 = lookup_domain_mpa(d, mpaddr, entry);
531 if (_itir.ps < PAGE_SHIFT)
532 pteval2 |= mpaddr & ~PAGE_MASK & ~((1L << _itir.ps) - 1);
534 /* Check access rights. */
535 arflags = pteval & _PAGE_AR_MASK;
536 arflags2 = pteval2 & _PAGE_AR_MASK;
537 if (arflags != _PAGE_AR_R && arflags2 == _PAGE_AR_R) {
538 #if 0
539 dprintk(XENLOG_WARNING,
540 "%s:%d "
541 "pteval 0x%lx arflag 0x%lx address 0x%lx itir 0x%lx "
542 "pteval2 0x%lx arflags2 0x%lx mpaddr 0x%lx\n",
543 __func__, __LINE__,
544 pteval, arflags, address, itir__,
545 pteval2, arflags2, mpaddr);
546 #endif
547 pteval = (pteval & ~_PAGE_AR_MASK) | _PAGE_AR_R;
548 }
550 /* Check memory attribute. The switch is on the *requested* memory
551 attribute. */
552 maflags2 = pteval2 & _PAGE_MA_MASK;
553 switch (pteval & _PAGE_MA_MASK) {
554 case _PAGE_MA_NAT:
555 /* NaT pages are always accepted! */
556 break;
557 case _PAGE_MA_UC:
558 case _PAGE_MA_UCE:
559 case _PAGE_MA_WC:
560 if (maflags2 == _PAGE_MA_WB) {
561 /* Don't let domains WB-map uncached addresses.
562 This can happen when domU tries to touch i/o
563 port space. Also prevents possible address
564 aliasing issues. */
565 if (!(mpaddr - IO_PORTS_PADDR < IO_PORTS_SIZE)) {
566 u64 ucwb;
568 /*
569 * If dom0 page has both UC & WB attributes
570 * don't warn about attempted UC access.
571 */
572 ucwb = efi_mem_attribute(mpaddr, PAGE_SIZE);
573 ucwb &= EFI_MEMORY_UC | EFI_MEMORY_WB;
574 ucwb ^= EFI_MEMORY_UC | EFI_MEMORY_WB;
576 if (d != dom0 || ucwb != 0)
577 gdprintk(XENLOG_WARNING, "Warning: UC"
578 " to WB for mpaddr=%lx\n",
579 mpaddr);
580 }
581 pteval = (pteval & ~_PAGE_MA_MASK) | _PAGE_MA_WB;
582 }
583 break;
584 case _PAGE_MA_WB:
585 if (maflags2 != _PAGE_MA_WB) {
586 /* Forbid non-coherent access to coherent memory. */
587 panic_domain(NULL, "try to use WB mem attr on "
588 "UC page, mpaddr=%lx\n", mpaddr);
589 }
590 break;
591 default:
592 panic_domain(NULL, "try to use unknown mem attribute\n");
593 }
595 /* If shadow mode is enabled, virtualize dirty bit. */
596 if (shadow_mode_enabled(d) && (pteval & _PAGE_D)) {
597 u64 mp_page = mpaddr >> PAGE_SHIFT;
598 pteval |= _PAGE_VIRT_D;
600 /* If the page is not already dirty, don't set the dirty bit! */
601 if (mp_page < d->arch.shadow_bitmap_size * 8
602 && !test_bit(mp_page, d->arch.shadow_bitmap))
603 pteval &= ~_PAGE_D;
604 }
606 /* Ignore non-addr bits of pteval2 and force PL0->1
607 (PL3 is unaffected) */
608 return (pteval & ~(_PAGE_PPN_MASK | _PAGE_PL_MASK)) |
609 (pteval2 & _PAGE_PPN_MASK) |
610 (vcpu_pl_adjust(pteval, 7) & _PAGE_PL_MASK);
611 }
613 // given a current domain metaphysical address, return the physical address
614 unsigned long translate_domain_mpaddr(unsigned long mpaddr,
615 struct p2m_entry* entry)
616 {
617 unsigned long pteval;
619 pteval = lookup_domain_mpa(current->domain, mpaddr, entry);
620 return ((pteval & _PAGE_PPN_MASK) | (mpaddr & ~PAGE_MASK));
621 }
623 //XXX should !xxx_present() be used instead of !xxx_none()?
624 // pud, pmd and pte pages are zero-cleared when they are allocated.
625 // Their contents must be visible before population, which is why the
626 // cmpxchg must have release semantics.
627 static volatile pte_t*
628 lookup_alloc_domain_pte(struct domain* d, unsigned long mpaddr)
629 {
630 struct mm_struct *mm = &d->arch.mm;
631 volatile pgd_t *pgd;
632 volatile pud_t *pud;
633 volatile pmd_t *pmd;
635 BUG_ON(mm->pgd == NULL);
637 pgd = pgd_offset(mm, mpaddr);
638 again_pgd:
639 if (unlikely(pgd_none(*pgd))) { // acquire semantics
640 pud_t *old_pud = NULL;
641 pud = pud_alloc_one(mm, mpaddr);
642 if (unlikely(!pgd_cmpxchg_rel(mm, pgd, old_pud, pud))) {
643 pud_free(pud);
644 goto again_pgd;
645 }
646 }
648 pud = pud_offset(pgd, mpaddr);
649 again_pud:
650 if (unlikely(pud_none(*pud))) { // acquire semantics
651 pmd_t* old_pmd = NULL;
652 pmd = pmd_alloc_one(mm, mpaddr);
653 if (unlikely(!pud_cmpxchg_rel(mm, pud, old_pmd, pmd))) {
654 pmd_free(pmd);
655 goto again_pud;
656 }
657 }
659 pmd = pmd_offset(pud, mpaddr);
660 again_pmd:
661 if (unlikely(pmd_none(*pmd))) { // acquire semantics
662 pte_t* old_pte = NULL;
663 pte_t* pte = pte_alloc_one_kernel(mm, mpaddr);
664 if (unlikely(!pmd_cmpxchg_kernel_rel(mm, pmd, old_pte, pte))) {
665 pte_free_kernel(pte);
666 goto again_pmd;
667 }
668 }
670 return pte_offset_map(pmd, mpaddr);
671 }
673 //XXX should xxx_none() be used instead of !xxx_present()?
674 volatile pte_t*
675 lookup_noalloc_domain_pte(struct domain* d, unsigned long mpaddr)
676 {
677 struct mm_struct *mm = &d->arch.mm;
678 volatile pgd_t *pgd;
679 volatile pud_t *pud;
680 volatile pmd_t *pmd;
682 BUG_ON(mm->pgd == NULL);
683 pgd = pgd_offset(mm, mpaddr);
684 if (unlikely(!pgd_present(*pgd))) // acquire semantics
685 return NULL;
687 pud = pud_offset(pgd, mpaddr);
688 if (unlikely(!pud_present(*pud))) // acquire semantics
689 return NULL;
691 pmd = pmd_offset(pud, mpaddr);
692 if (unlikely(!pmd_present(*pmd))) // acquire semantics
693 return NULL;
695 return pte_offset_map(pmd, mpaddr);
696 }
698 static volatile pte_t*
699 lookup_noalloc_domain_pte_none(struct domain* d, unsigned long mpaddr)
700 {
701 struct mm_struct *mm = &d->arch.mm;
702 volatile pgd_t *pgd;
703 volatile pud_t *pud;
704 volatile pmd_t *pmd;
706 BUG_ON(mm->pgd == NULL);
707 pgd = pgd_offset(mm, mpaddr);
708 if (unlikely(pgd_none(*pgd))) // acquire semantics
709 return NULL;
711 pud = pud_offset(pgd, mpaddr);
712 if (unlikely(pud_none(*pud))) // acquire semantics
713 return NULL;
715 pmd = pmd_offset(pud, mpaddr);
716 if (unlikely(pmd_none(*pmd))) // acquire semantics
717 return NULL;
719 return pte_offset_map(pmd, mpaddr);
720 }
722 unsigned long
723 ____lookup_domain_mpa(struct domain *d, unsigned long mpaddr)
724 {
725 volatile pte_t *pte;
727 pte = lookup_noalloc_domain_pte(d, mpaddr);
728 if (pte == NULL)
729 return INVALID_MFN;
731 if (pte_present(*pte))
732 return (pte->pte & _PFN_MASK);
733 return INVALID_MFN;
734 }
736 unsigned long lookup_domain_mpa(struct domain *d, unsigned long mpaddr,
737 struct p2m_entry* entry)
738 {
739 volatile pte_t *pte = lookup_noalloc_domain_pte(d, mpaddr);
741 if (pte != NULL) {
742 pte_t tmp_pte = *pte;// pte is volatile. copy the value.
743 if (pte_present(tmp_pte)) {
744 if (entry != NULL)
745 p2m_entry_set(entry, pte, tmp_pte);
746 return pte_val(tmp_pte);
747 } else if (is_hvm_domain(d))
748 return INVALID_MFN;
749 }
751 if (mpaddr < d->arch.convmem_end && !d->is_dying) {
752 gdprintk(XENLOG_WARNING, "vcpu %d iip 0x%016lx: non-allocated mpa "
753 "d %"PRId16" 0x%lx (< 0x%lx)\n",
754 current->vcpu_id, PSCB(current, iip),
755 d->domain_id, mpaddr, d->arch.convmem_end);
756 } else if (mpaddr - IO_PORTS_PADDR < IO_PORTS_SIZE) {
757 /* Log I/O port probing, but complain less loudly about it */
758 gdprintk(XENLOG_INFO, "vcpu %d iip 0x%016lx: bad I/O port access "
759 "d %"PRId16" 0x%lx\n",
760 current->vcpu_id, PSCB(current, iip), d->domain_id,
761 IO_SPACE_SPARSE_DECODING(mpaddr - IO_PORTS_PADDR));
762 } else {
763 gdprintk(XENLOG_WARNING, "vcpu %d iip 0x%016lx: bad mpa "
764 "d %"PRId16" 0x%lx (=> 0x%lx)\n",
765 current->vcpu_id, PSCB(current, iip),
766 d->domain_id, mpaddr, d->arch.convmem_end);
767 }
769 debugger_event (XEN_IA64_DEBUG_ON_BAD_MPA);
771 if (entry != NULL)
772 p2m_entry_set(entry, NULL, __pte(0));
773 //XXX This is a workaround until emulation of memory accesses to a region
774 // where memory or a device is attached is implemented.
775 return pte_val(pfn_pte(0, __pgprot(__DIRTY_BITS | _PAGE_PL_PRIV |
776 _PAGE_AR_RWX)));
777 }
779 // FIXME: ONLY USE FOR DOMAIN PAGE_SIZE == PAGE_SIZE
780 #if 1
781 void *domain_mpa_to_imva(struct domain *d, unsigned long mpaddr)
782 {
783 unsigned long pte = lookup_domain_mpa(d, mpaddr, NULL);
784 unsigned long imva;
786 pte &= _PAGE_PPN_MASK;
787 imva = (unsigned long) __va(pte);
788 imva |= mpaddr & ~PAGE_MASK;
789 return (void*)imva;
790 }
791 #else
792 void *domain_mpa_to_imva(struct domain *d, unsigned long mpaddr)
793 {
794 unsigned long imva = __gpa_to_mpa(d, mpaddr);
796 return (void *)__va(imva);
797 }
798 #endif
800 unsigned long
801 paddr_to_maddr(unsigned long paddr)
802 {
803 struct vcpu *v = current;
804 struct domain *d = v->domain;
805 u64 pa;
807 pa = ____lookup_domain_mpa(d, paddr);
808 if (pa == INVALID_MFN) {
809 printk("%s: called with bad memory address: 0x%lx - iip=%lx\n",
810 __func__, paddr, vcpu_regs(v)->cr_iip);
811 return 0;
812 }
813 return (pa & _PFN_MASK) | (paddr & ~PAGE_MASK);
814 }
816 /* Allocate a new page for domain and map it to the specified metaphysical
817 address. */
818 static struct page_info *
819 __assign_new_domain_page(struct domain *d, unsigned long mpaddr,
820 volatile pte_t* pte)
821 {
822 struct page_info *p;
823 unsigned long maddr;
825 BUG_ON(!pte_none(*pte));
827 p = alloc_domheap_page(d, 0);
828 if (unlikely(!p)) {
829 printk("assign_new_domain_page: Can't alloc!!!! Aaaargh!\n");
830 return(p);
831 }
833 // zero out pages for security reasons
834 clear_page(page_to_virt(p));
835 maddr = page_to_maddr (p);
836 if (unlikely(maddr > __get_cpu_var(vhpt_paddr)
837 && maddr < __get_cpu_var(vhpt_pend))) {
838 /* FIXME: how can this happen ?
839 vhpt is allocated by alloc_domheap_page. */
840 printk("assign_new_domain_page: reassigned vhpt page %lx!!\n",
841 maddr);
842 }
844 set_gpfn_from_mfn(page_to_mfn(p), mpaddr >> PAGE_SHIFT);
845 // clear_page() and set_gpfn_from_mfn() become visible before set_pte_rel()
846 // because set_pte_rel() has release semantics
847 set_pte_rel(pte,
848 pfn_pte(maddr >> PAGE_SHIFT,
849 __pgprot(_PAGE_PGC_ALLOCATED | __DIRTY_BITS |
850 _PAGE_PL_PRIV | _PAGE_AR_RWX)));
852 smp_mb();
853 return p;
854 }
856 struct page_info *
857 assign_new_domain_page(struct domain *d, unsigned long mpaddr)
858 {
859 volatile pte_t *pte = lookup_alloc_domain_pte(d, mpaddr);
861 if (!pte_none(*pte))
862 return NULL;
864 return __assign_new_domain_page(d, mpaddr, pte);
865 }
867 void __init
868 assign_new_domain0_page(struct domain *d, unsigned long mpaddr)
869 {
870 volatile pte_t *pte;
872 BUG_ON(d != dom0);
873 pte = lookup_alloc_domain_pte(d, mpaddr);
874 if (pte_none(*pte)) {
875 struct page_info *p = __assign_new_domain_page(d, mpaddr, pte);
876 if (p == NULL) {
877 panic("%s: can't allocate page for dom0\n", __func__);
878 }
879 }
880 }
882 static unsigned long
883 flags_to_prot (unsigned long flags)
884 {
885 unsigned long res = _PAGE_PL_PRIV | __DIRTY_BITS;
887 res |= flags & ASSIGN_readonly ? _PAGE_AR_R: _PAGE_AR_RWX;
888 res |= flags & ASSIGN_nocache ? _PAGE_MA_UC: _PAGE_MA_WB;
889 #ifdef CONFIG_XEN_IA64_TLB_TRACK
890 res |= flags & ASSIGN_tlb_track ? _PAGE_TLB_TRACKING: 0;
891 #endif
892 res |= flags & ASSIGN_pgc_allocated ? _PAGE_PGC_ALLOCATED: 0;
893 res |= flags & ASSIGN_io ? _PAGE_IO: 0;
895 return res;
896 }
898 /* map a physical address to the specified metaphysical addr */
899 // flags: currently only ASSIGN_readonly, ASSIGN_nocache, ASSIGN_tlb_track
900 // This is called by assign_domain_mmio_page(),
901 // so access to the pte is racy.
902 int
903 __assign_domain_page(struct domain *d,
904 unsigned long mpaddr, unsigned long physaddr,
905 unsigned long flags)
906 {
907 volatile pte_t *pte;
908 pte_t old_pte;
909 pte_t new_pte;
910 pte_t ret_pte;
911 unsigned long prot = flags_to_prot(flags);
913 pte = lookup_alloc_domain_pte(d, mpaddr);
915 old_pte = __pte(0);
916 new_pte = pfn_pte(physaddr >> PAGE_SHIFT, __pgprot(prot));
917 ret_pte = ptep_cmpxchg_rel(&d->arch.mm, mpaddr, pte, old_pte, new_pte);
918 if (pte_val(ret_pte) == pte_val(old_pte)) {
919 smp_mb();
920 return 0;
921 }
923 // dom0 tried to map the real machine's I/O region, but failed.
924 // It is very likely that dom0 won't boot correctly because
925 // it can't access I/O, so complain here.
926 if (flags & ASSIGN_nocache) {
927 int warn = 0;
929 if (pte_pfn(ret_pte) != (physaddr >> PAGE_SHIFT))
930 warn = 1;
931 else if (!(pte_val(ret_pte) & _PAGE_MA_UC)) {
932 u32 type;
933 u64 attr;
935 warn = 1;
937 /*
938 * See
939 * complete_dom0_memmap()
940 * case EFI_RUNTIME_SERVICES_CODE:
941 * case EFI_RUNTIME_SERVICES_DATA:
942 * case EFI_ACPI_RECLAIM_MEMORY:
943 * case EFI_ACPI_MEMORY_NVS:
944 * case EFI_RESERVED_TYPE:
945 *
946 * Currently only EFI_RUNTIME_SERVICES_CODE is found
947 * so we suppress only the EFI_RUNTIME_SERVICES_CODE case.
948 */
949 type = efi_mem_type(physaddr);
950 attr = efi_mem_attributes(physaddr);
951 if (type == EFI_RUNTIME_SERVICES_CODE &&
952 (attr & EFI_MEMORY_UC) && (attr & EFI_MEMORY_WB))
953 warn = 0;
954 }
955 if (warn)
956 printk("%s:%d WARNING can't assign page domain 0x%p id %d\n"
957 "\talready assigned pte_val 0x%016lx\n"
958 "\tmpaddr 0x%016lx physaddr 0x%016lx flags 0x%lx\n",
959 __func__, __LINE__,
960 d, d->domain_id, pte_val(ret_pte),
961 mpaddr, physaddr, flags);
962 }
964 return -EAGAIN;
965 }
967 /* get_page() and map a physical address to the specified metaphysical addr */
968 void
969 assign_domain_page(struct domain *d,
970 unsigned long mpaddr, unsigned long physaddr)
971 {
972 struct page_info* page = mfn_to_page(physaddr >> PAGE_SHIFT);
974 BUG_ON((physaddr & _PAGE_PPN_MASK) != physaddr);
975 BUG_ON(page->count_info != (PGC_allocated | 1));
976 set_gpfn_from_mfn(physaddr >> PAGE_SHIFT, mpaddr >> PAGE_SHIFT);
977 // because __assign_domain_page() uses set_pte_rel() which has
978 // release semantics, smp_mb() isn't needed.
979 (void)__assign_domain_page(d, mpaddr, physaddr,
980 ASSIGN_writable | ASSIGN_pgc_allocated);
981 }
983 int
984 ioports_permit_access(struct domain *d, unsigned int fp, unsigned int lp)
985 {
986 struct io_space *space;
987 unsigned long mmio_start, mmio_end, mach_start;
988 int ret;
990 if (IO_SPACE_NR(fp) >= num_io_spaces) {
991 dprintk(XENLOG_WARNING, "Unknown I/O Port range 0x%x - 0x%x\n", fp, lp);
992 return -EFAULT;
993 }
995 /*
996 * The ioport_cap rangeset tracks the I/O port address including
997 * the port space ID. This means port space IDs need to match
998 * between Xen and dom0. This is also a requirement because
999 * the hypercall to pass these port ranges only uses a u32.
1001 * NB - non-dom0 driver domains may only have a subset of the
1002 * I/O port spaces and thus will number port spaces differently.
1003 * This is ok, they don't make use of this interface.
1004 */
1005 ret = rangeset_add_range(d->arch.ioport_caps, fp, lp);
1006 if (ret != 0)
1007 return ret;
1009 space = &io_space[IO_SPACE_NR(fp)];
1011 /* Legacy I/O on dom0 is already setup */
1012 if (d == dom0 && space == &io_space[0])
1013 return 0;
1015 fp = IO_SPACE_PORT(fp);
1016 lp = IO_SPACE_PORT(lp);
1018 if (space->sparse) {
1019 mmio_start = IO_SPACE_SPARSE_ENCODING(fp) & ~PAGE_MASK;
1020 mmio_end = PAGE_ALIGN(IO_SPACE_SPARSE_ENCODING(lp));
1021 } else {
1022 mmio_start = fp & ~PAGE_MASK;
1023 mmio_end = PAGE_ALIGN(lp);
1026 /*
1027 * The "machine first port" is not necessarily identity mapped
1028 * to the guest first port. At least for the legacy range.
1029 */
1030 mach_start = mmio_start | __pa(space->mmio_base);
1032 if (space == &io_space[0]) {
1033 mmio_start |= IO_PORTS_PADDR;
1034 mmio_end |= IO_PORTS_PADDR;
1035 } else {
1036 mmio_start |= __pa(space->mmio_base);
1037 mmio_end |= __pa(space->mmio_base);
1040 while (mmio_start <= mmio_end) {
1041 (void)__assign_domain_page(d, mmio_start, mach_start, ASSIGN_nocache);
1042 mmio_start += PAGE_SIZE;
1043 mach_start += PAGE_SIZE;
1046 return 0;
1049 static int
1050 ioports_has_allowed(struct domain *d, unsigned int fp, unsigned int lp)
1052 for (; fp < lp; fp++)
1053 if (rangeset_contains_singleton(d->arch.ioport_caps, fp))
1054 return 1;
1056 return 0;
1059 int
1060 ioports_deny_access(struct domain *d, unsigned int fp, unsigned int lp)
1062 int ret;
1063 struct mm_struct *mm = &d->arch.mm;
1064 unsigned long mmio_start, mmio_end, mmio_base;
1065 unsigned int fp_base, lp_base;
1066 struct io_space *space;
1068 if (IO_SPACE_NR(fp) >= num_io_spaces) {
1069 dprintk(XENLOG_WARNING, "Unknown I/O Port range 0x%x - 0x%x\n", fp, lp);
1070 return -EFAULT;
1073 ret = rangeset_remove_range(d->arch.ioport_caps, fp, lp);
1074 if (ret != 0)
1075 return ret;
1077 space = &io_space[IO_SPACE_NR(fp)];
1078 fp_base = IO_SPACE_PORT(fp);
1079 lp_base = IO_SPACE_PORT(lp);
1081 if (space->sparse) {
1082 mmio_start = IO_SPACE_SPARSE_ENCODING(fp_base) & ~PAGE_MASK;
1083 mmio_end = PAGE_ALIGN(IO_SPACE_SPARSE_ENCODING(lp_base));
1084 } else {
1085 mmio_start = fp_base & ~PAGE_MASK;
1086 mmio_end = PAGE_ALIGN(lp_base);
1089 if (space == &io_space[0] && d != dom0)
1090 mmio_base = IO_PORTS_PADDR;
1091 else
1092 mmio_base = __pa(space->mmio_base);
1094 for (; mmio_start < mmio_end; mmio_start += PAGE_SIZE) {
1095 unsigned int port, range;
1096 unsigned long mpaddr;
1097 volatile pte_t *pte;
1098 pte_t old_pte;
1100 if (space->sparse) {
1101 port = IO_SPACE_SPARSE_DECODING(mmio_start);
1102 range = IO_SPACE_SPARSE_PORTS_PER_PAGE - 1;
1103 } else {
1104 port = mmio_start;
1105 range = PAGE_SIZE - 1;
1108 port |= IO_SPACE_BASE(IO_SPACE_NR(fp));
1110 if (port < fp || port + range > lp) {
1111 /* Maybe this covers an allowed port. */
1112 if (ioports_has_allowed(d, port, port + range))
1113 continue;
1116 mpaddr = mmio_start | mmio_base;
1117 pte = lookup_noalloc_domain_pte_none(d, mpaddr);
1118 BUG_ON(pte == NULL);
1119 BUG_ON(pte_none(*pte));
1121 /* clear pte */
1122 old_pte = ptep_get_and_clear(mm, mpaddr, pte);
1124 domain_flush_vtlb_all(d);
1125 return 0;
1128 static void
1129 assign_domain_same_page(struct domain *d,
1130 unsigned long mpaddr, unsigned long size,
1131 unsigned long flags)
1133 //XXX optimization
1134 unsigned long end = PAGE_ALIGN(mpaddr + size);
1135 for (mpaddr &= PAGE_MASK; mpaddr < end; mpaddr += PAGE_SIZE) {
1136 (void)__assign_domain_page(d, mpaddr, mpaddr, flags);
1140 int
1141 efi_mmio(unsigned long physaddr, unsigned long size)
1143 void *efi_map_start, *efi_map_end;
1144 u64 efi_desc_size;
1145 void* p;
1147 efi_map_start = __va(ia64_boot_param->efi_memmap);
1148 efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
1149 efi_desc_size = ia64_boot_param->efi_memdesc_size;
1151 for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) {
1152 efi_memory_desc_t* md = (efi_memory_desc_t *)p;
1153 unsigned long start = md->phys_addr;
1154 unsigned long end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT);
1156 if (start <= physaddr && physaddr < end) {
1157 if ((physaddr + size) > end) {
1158 gdprintk(XENLOG_INFO, "%s: physaddr 0x%lx size = 0x%lx\n",
1159 __func__, physaddr, size);
1160 return 0;
1163 // for io space
1164 if (md->type == EFI_MEMORY_MAPPED_IO ||
1165 md->type == EFI_MEMORY_MAPPED_IO_PORT_SPACE) {
1166 return 1;
1169 // for runtime
1170 // see efi_enter_virtual_mode(void)
1171 // in linux/arch/ia64/kernel/efi.c
1172 if ((md->attribute & EFI_MEMORY_RUNTIME) &&
1173 !(md->attribute & EFI_MEMORY_WB)) {
1174 return 1;
1177 return 0;
1180 if (physaddr < start) {
1181 break;
1185 return 1;
1188 unsigned long
1189 assign_domain_mmio_page(struct domain *d, unsigned long mpaddr,
1190 unsigned long phys_addr, unsigned long size,
1191 unsigned long flags)
1193 unsigned long addr = mpaddr & PAGE_MASK;
1194 unsigned long end = PAGE_ALIGN(mpaddr + size);
1196 if (size == 0) {
1197 gdprintk(XENLOG_INFO, "%s: domain %p mpaddr 0x%lx size = 0x%lx\n",
1198 __func__, d, mpaddr, size);
1200 if (!efi_mmio(phys_addr, size)) {
1201 #ifndef NDEBUG
1202 gdprintk(XENLOG_INFO, "%s: domain %p mpaddr 0x%lx size = 0x%lx\n",
1203 __func__, d, mpaddr, size);
1204 #endif
1205 return -EINVAL;
1208 for (phys_addr &= PAGE_MASK; addr < end;
1209 addr += PAGE_SIZE, phys_addr += PAGE_SIZE) {
1210 __assign_domain_page(d, addr, phys_addr, flags);
1213 return mpaddr;
1216 unsigned long
1217 assign_domain_mach_page(struct domain *d,
1218 unsigned long mpaddr, unsigned long size,
1219 unsigned long flags)
1221 BUG_ON(flags & ASSIGN_pgc_allocated);
1222 assign_domain_same_page(d, mpaddr, size, flags);
1223 return mpaddr;
1226 static void
1227 adjust_page_count_info(struct page_info* page)
1229 struct domain* d = page_get_owner(page);
1230 BUG_ON((page->count_info & PGC_count_mask) != 1);
1231 if (d != NULL) {
1232 int ret = get_page(page, d);
1233 BUG_ON(ret == 0);
1234 } else {
1235 u64 x, nx, y;
1237 y = *((u64*)&page->count_info);
1238 do {
1239 x = y;
1240 nx = x + 1;
1242 BUG_ON((x >> 32) != 0);
1243 BUG_ON((nx & PGC_count_mask) != 2);
1244 y = cmpxchg((u64*)&page->count_info, x, nx);
1245 } while (unlikely(y != x));
1249 static void
1250 domain_put_page(struct domain* d, unsigned long mpaddr,
1251 volatile pte_t* ptep, pte_t old_pte, int clear_PGC_allocate)
1253 unsigned long mfn = pte_pfn(old_pte);
1254 struct page_info* page = mfn_to_page(mfn);
1256 if (pte_pgc_allocated(old_pte)) {
1257 if (page_get_owner(page) == d || page_get_owner(page) == NULL) {
1258 BUG_ON(get_gpfn_from_mfn(mfn) != (mpaddr >> PAGE_SHIFT));
1259 set_gpfn_from_mfn(mfn, INVALID_M2P_ENTRY);
1260 } else {
1261 BUG();
1264 if (likely(clear_PGC_allocate)) {
1265 if (!test_and_clear_bit(_PGC_allocated, &page->count_info))
1266 BUG();
1267 /* put_page() is done by domain_page_flush_and_put() */
1268 } else {
1269 // In this case, the page reference count mustn't be touched.
1270 // domain_page_flush_and_put() decrements it, so we increment
1271 // it in advance. This is the slow path.
1272 //
1273 // guest_remove_page(): owner = d, count_info = 1
1274 // memory_exchange(): owner = NULL, count_info = 1
1275 adjust_page_count_info(page);
1278 domain_page_flush_and_put(d, mpaddr, ptep, old_pte, page);
1281 // The caller must get_page(mfn_to_page(mfn)) before calling this.
1282 // The caller must call set_gpfn_from_mfn() beforehand if necessary;
1283 // because the set_gpfn_from_mfn() result must be visible before the pte xchg,
1284 // the caller must use a memory barrier. NOTE: xchg has acquire semantics.
1285 // flags: ASSIGN_xxx
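/*
 * Editor's note (illustration only, not original code): __dom0vp_add_physmap()
 * later in this file follows the contract above roughly as:
 *
 *     if (unlikely(!mfn_valid(mfn) || get_page(mfn_to_page(mfn), rd) == 0))
 *         goto out1;                              // take the page reference first
 *     assign_domain_page_replace(d, gpfn << PAGE_SHIFT, mfn, flags);
 *     // the m2p entry is not updated there because the page belongs to rd, not d
 */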
1286 static void
1287 assign_domain_page_replace(struct domain *d, unsigned long mpaddr,
1288 unsigned long mfn, unsigned long flags)
1290 struct mm_struct *mm = &d->arch.mm;
1291 volatile pte_t* pte;
1292 pte_t old_pte;
1293 pte_t npte;
1294 unsigned long prot = flags_to_prot(flags);
1296 pte = lookup_alloc_domain_pte(d, mpaddr);
1298 // update pte
1299 npte = pfn_pte(mfn, __pgprot(prot));
1300 old_pte = ptep_xchg(mm, mpaddr, pte, npte);
1301 if (pte_mem(old_pte)) {
1302 unsigned long old_mfn = pte_pfn(old_pte);
1304 // The mfn == old_mfn case can happen when a domain maps a granted page
1305 // twice at the same pseudo-physical address.
1306 // It's nonsense, but allowed.
1307 // __gnttab_map_grant_ref()
1308 // => create_host_mapping()
1309 // => assign_domain_page_replace()
1310 if (mfn != old_mfn) {
1311 domain_put_page(d, mpaddr, pte, old_pte, 1);
1314 perfc_incr(assign_domain_page_replace);
1317 // The caller must get_page(new_page) beforehand.
1318 // Only steal_page() calls this function.
1319 static int
1320 assign_domain_page_cmpxchg_rel(struct domain* d, unsigned long mpaddr,
1321 struct page_info* old_page,
1322 struct page_info* new_page,
1323 unsigned long flags, int clear_PGC_allocate)
1325 struct mm_struct *mm = &d->arch.mm;
1326 volatile pte_t* pte;
1327 unsigned long old_mfn;
1328 unsigned long old_prot;
1329 pte_t old_pte;
1330 unsigned long new_mfn;
1331 unsigned long new_prot;
1332 pte_t new_pte;
1333 pte_t ret_pte;
1335 BUG_ON((flags & ASSIGN_pgc_allocated) == 0);
1336 pte = lookup_alloc_domain_pte(d, mpaddr);
1338 again:
1339 old_prot = pte_val(*pte) & ~_PAGE_PPN_MASK;
1340 old_mfn = page_to_mfn(old_page);
1341 old_pte = pfn_pte(old_mfn, __pgprot(old_prot));
1342 if (!pte_present(old_pte)) {
1343 gdprintk(XENLOG_INFO,
1344 "%s: old_pte 0x%lx old_prot 0x%lx old_mfn 0x%lx\n",
1345 __func__, pte_val(old_pte), old_prot, old_mfn);
1346 return -EINVAL;
1349 new_prot = flags_to_prot(flags);
1350 new_mfn = page_to_mfn(new_page);
1351 new_pte = pfn_pte(new_mfn, __pgprot(new_prot));
1353 // update pte
1354 ret_pte = ptep_cmpxchg_rel(mm, mpaddr, pte, old_pte, new_pte);
1355 if (unlikely(pte_val(old_pte) != pte_val(ret_pte))) {
1356 if (pte_pfn(old_pte) == pte_pfn(ret_pte)) {
1357 goto again;
1360 gdprintk(XENLOG_INFO,
1361 "%s: old_pte 0x%lx old_prot 0x%lx old_mfn 0x%lx "
1362 "ret_pte 0x%lx ret_mfn 0x%lx\n",
1363 __func__,
1364 pte_val(old_pte), old_prot, old_mfn,
1365 pte_val(ret_pte), pte_pfn(ret_pte));
1366 return -EINVAL;
1369 BUG_ON(!pte_mem(old_pte));
1370 BUG_ON(!pte_pgc_allocated(old_pte));
1371 BUG_ON(page_get_owner(old_page) != d);
1372 BUG_ON(get_gpfn_from_mfn(old_mfn) != (mpaddr >> PAGE_SHIFT));
1373 BUG_ON(old_mfn == new_mfn);
1375 set_gpfn_from_mfn(old_mfn, INVALID_M2P_ENTRY);
1376 if (likely(clear_PGC_allocate)) {
1377 if (!test_and_clear_bit(_PGC_allocated, &old_page->count_info))
1378 BUG();
1379 } else {
1380 int ret;
1381 // adjust count_info for domain_page_flush_and_put().
1382 // This is the slow path.
1383 BUG_ON(!test_bit(_PGC_allocated, &old_page->count_info));
1384 BUG_ON(d == NULL);
1385 ret = get_page(old_page, d);
1386 BUG_ON(ret == 0);
1389 domain_page_flush_and_put(d, mpaddr, pte, old_pte, old_page);
1390 perfc_incr(assign_domain_pge_cmpxchg_rel);
1391 return 0;
1394 static void
1395 zap_domain_page_one(struct domain *d, unsigned long mpaddr,
1396 int clear_PGC_allocate, unsigned long mfn)
1398 struct mm_struct *mm = &d->arch.mm;
1399 volatile pte_t *pte;
1400 pte_t old_pte;
1401 struct page_info *page;
1403 pte = lookup_noalloc_domain_pte_none(d, mpaddr);
1404 if (pte == NULL)
1405 return;
1406 if (pte_none(*pte))
1407 return;
1409 if (mfn == INVALID_MFN) {
1410 // clear pte
1411 old_pte = ptep_get_and_clear(mm, mpaddr, pte);
1412 mfn = pte_pfn(old_pte);
1413 } else {
1414 unsigned long old_arflags;
1415 pte_t new_pte;
1416 pte_t ret_pte;
1418 again:
1419 // memory_exchange() calls guest_physmap_remove_page() with
1420 // a stolen page, i.e. page owner == NULL.
1421 BUG_ON(page_get_owner(mfn_to_page(mfn)) != d &&
1422 page_get_owner(mfn_to_page(mfn)) != NULL);
1423 old_arflags = pte_val(*pte) & ~_PAGE_PPN_MASK;
1424 old_pte = pfn_pte(mfn, __pgprot(old_arflags));
1425 new_pte = __pte(0);
1427 // update pte
1428 ret_pte = ptep_cmpxchg_rel(mm, mpaddr, pte, old_pte, new_pte);
1429 if (unlikely(pte_val(old_pte) != pte_val(ret_pte))) {
1430 if (pte_pfn(old_pte) == pte_pfn(ret_pte)) {
1431 goto again;
1434 gdprintk(XENLOG_INFO, "%s: old_pte 0x%lx old_arflags 0x%lx mfn 0x%lx "
1435 "ret_pte 0x%lx ret_mfn 0x%lx\n",
1436 __func__,
1437 pte_val(old_pte), old_arflags, mfn,
1438 pte_val(ret_pte), pte_pfn(ret_pte));
1439 return;
1441 BUG_ON(mfn != pte_pfn(ret_pte));
1444 page = mfn_to_page(mfn);
1445 BUG_ON((page->count_info & PGC_count_mask) == 0);
1447 BUG_ON(clear_PGC_allocate && (page_get_owner(page) == NULL));
1448 domain_put_page(d, mpaddr, pte, old_pte, clear_PGC_allocate);
1449 perfc_incr(zap_domain_page_one);
1452 unsigned long
1453 dom0vp_zap_physmap(struct domain *d, unsigned long gpfn,
1454 unsigned int extent_order)
1456 if (extent_order != 0) {
1457 //XXX
1458 return -ENOSYS;
1461 zap_domain_page_one(d, gpfn << PAGE_SHIFT, 1, INVALID_MFN);
1462 perfc_incr(dom0vp_zap_physmap);
1463 return 0;
1466 static unsigned long
1467 __dom0vp_add_physmap(struct domain* d, unsigned long gpfn,
1468 unsigned long mfn_or_gmfn,
1469 unsigned long flags, domid_t domid, int is_gmfn)
1471 int error = -EINVAL;
1472 struct domain* rd;
1473 unsigned long mfn;
1475 /* Not allowed by a domain. */
1476 if (flags & (ASSIGN_nocache | ASSIGN_pgc_allocated))
1477 return -EINVAL;
1479 rd = rcu_lock_domain_by_id(domid);
1480 if (unlikely(rd == NULL)) {
1481 switch (domid) {
1482 case DOMID_XEN:
1483 rd = dom_xen;
1484 break;
1485 case DOMID_IO:
1486 rd = dom_io;
1487 break;
1488 default:
1489 gdprintk(XENLOG_INFO, "d 0x%p domid %d "
1490 "gpfn 0x%lx mfn_or_gmfn 0x%lx flags 0x%lx domid %d\n",
1491 d, d->domain_id, gpfn, mfn_or_gmfn, flags, domid);
1492 return -ESRCH;
1494 BUG_ON(rd == NULL);
1495 rcu_lock_domain(rd);
1498 if (unlikely(rd == d))
1499 goto out1;
1500 /*
1501 * DOMID_XEN and DOMID_IO don't have their own p2m table.
1502 * It can be considered that their p2m conversion is p==m.
1503 */
1504 if (likely(is_gmfn && domid != DOMID_XEN && domid != DOMID_IO))
1505 mfn = gmfn_to_mfn(rd, mfn_or_gmfn);
1506 else
1507 mfn = mfn_or_gmfn;
1508 if (unlikely(!mfn_valid(mfn) || get_page(mfn_to_page(mfn), rd) == 0))
1509 goto out1;
1511 error = 0;
1512 BUG_ON(page_get_owner(mfn_to_page(mfn)) == d &&
1513 get_gpfn_from_mfn(mfn) != INVALID_M2P_ENTRY);
1514 assign_domain_page_replace(d, gpfn << PAGE_SHIFT, mfn, flags);
1515 //don't update p2m table because this page belongs to rd, not d.
1516 perfc_incr(dom0vp_add_physmap);
1517 out1:
1518 rcu_unlock_domain(rd);
1519 return error;
1522 unsigned long
1523 dom0vp_add_physmap(struct domain* d, unsigned long gpfn, unsigned long mfn,
1524 unsigned long flags, domid_t domid)
1526 return __dom0vp_add_physmap(d, gpfn, mfn, flags, domid, 0);
1529 unsigned long
1530 dom0vp_add_physmap_with_gmfn(struct domain* d, unsigned long gpfn,
1531 unsigned long gmfn, unsigned long flags,
1532 domid_t domid)
1534 return __dom0vp_add_physmap(d, gpfn, gmfn, flags, domid, 1);
1537 #ifdef CONFIG_XEN_IA64_EXPOSE_P2M
1538 #define P2M_PFN_ROUNDUP(x) (((x) + PTRS_PER_PTE - 1) & \
1539 ~(PTRS_PER_PTE - 1))
1540 #define P2M_PFN_ROUNDDOWN(x) ((x) & ~(PTRS_PER_PTE - 1))
1541 #define P2M_NUM_PFN(x) (((x) + PTRS_PER_PTE - 1) / PTRS_PER_PTE)
1542 #define MD_END(md) ((md)->phys_addr + \
1543 ((md)->num_pages << EFI_PAGE_SHIFT))
1544 static struct page_info* p2m_pte_zero_page = NULL;
1546 /* This must be called before dom0 p2m table allocation */
1547 void __init
1548 expose_p2m_init(void)
1550 pte_t* pte;
1552 /*
1553 * Initialise our DOMID_P2M domain.
1554 * This domain owns m2p table pages.
1555 */
1556 dom_p2m = domain_create(DOMID_P2M, DOMCRF_dummy, 0);
1557 BUG_ON(dom_p2m == NULL);
1558 dom_p2m->max_pages = ~0U;
1560 pte = pte_alloc_one_kernel(NULL, 0);
1561 BUG_ON(pte == NULL);
1562 smp_mb();// make contents of the page visible.
1563 p2m_pte_zero_page = virt_to_page(pte);
1566 // allocate pgd, pmd of dest_dom if necessary
1567 static int
1568 allocate_pgd_pmd(struct domain* dest_dom, unsigned long dest_gpfn,
1569 struct domain* src_dom,
1570 unsigned long src_gpfn, unsigned long num_src_gpfn)
1572 unsigned long i = 0;
1574 BUG_ON((src_gpfn % PTRS_PER_PTE) != 0);
1575 BUG_ON((num_src_gpfn % PTRS_PER_PTE) != 0);
1577 while (i < num_src_gpfn) {
1578 volatile pte_t* src_pte;
1579 volatile pte_t* dest_pte;
1581 src_pte = lookup_noalloc_domain_pte(src_dom,
1582 (src_gpfn + i) << PAGE_SHIFT);
1583 if (src_pte == NULL) {
1584 i++;
1585 continue;
1588 dest_pte = lookup_alloc_domain_pte(dest_dom,
1589 (dest_gpfn << PAGE_SHIFT) +
1590 i * sizeof(pte_t));
1591 if (dest_pte == NULL) {
1592 gdprintk(XENLOG_INFO, "%s failed to allocate pte page\n",
1593 __func__);
1594 return -ENOMEM;
1597 // skip to next pte page
1598 i = P2M_PFN_ROUNDDOWN(i + PTRS_PER_PTE);
1600 return 0;
1603 static int
1604 expose_p2m_page(struct domain* d, unsigned long mpaddr, struct page_info* page)
1606 int ret = get_page(page, dom_p2m);
1607 BUG_ON(ret != 1);
1608 return __assign_domain_page(d, mpaddr, page_to_maddr(page),
1609 ASSIGN_readonly);
1612 // expose pte page
1613 static int
1614 expose_p2m_range(struct domain* dest_dom, unsigned long dest_gpfn,
1615 struct domain* src_dom,
1616 unsigned long src_gpfn, unsigned long num_src_gpfn)
1618 unsigned long i = 0;
1620 BUG_ON((src_gpfn % PTRS_PER_PTE) != 0);
1621 BUG_ON((num_src_gpfn % PTRS_PER_PTE) != 0);
1623 while (i < num_src_gpfn) {
1624 volatile pte_t* pte;
1626 pte = lookup_noalloc_domain_pte(src_dom, (src_gpfn + i) << PAGE_SHIFT);
1627 if (pte == NULL) {
1628 i++;
1629 continue;
1632 if (expose_p2m_page(dest_dom,
1633 (dest_gpfn << PAGE_SHIFT) + i * sizeof(pte_t),
1634 virt_to_page(pte)) < 0) {
1635 gdprintk(XENLOG_INFO, "%s failed to assign page\n", __func__);
1636 return -EAGAIN;
1639 // skip to next pte page
1640 i = P2M_PFN_ROUNDDOWN(i + PTRS_PER_PTE);
1642 return 0;
1645 // expose p2m_pte_zero_page
1646 static int
1647 expose_zero_page(struct domain* dest_dom, unsigned long dest_gpfn,
1648 unsigned long num_src_gpfn)
1650 unsigned long i;
1652 for (i = 0; i < P2M_NUM_PFN(num_src_gpfn); i++) {
1653 volatile pte_t* pte;
1654 pte = lookup_noalloc_domain_pte(dest_dom,
1655 (dest_gpfn + i) << PAGE_SHIFT);
1656 if (pte == NULL || pte_present(*pte))
1657 continue;
1659 if (expose_p2m_page(dest_dom, (dest_gpfn + i) << PAGE_SHIFT,
1660 p2m_pte_zero_page) < 0) {
1661 gdprintk(XENLOG_INFO, "%s failed to assign zero-pte page\n",
1662 __func__);
1663 return -EAGAIN;
1666 return 0;
1669 static int
1670 expose_p2m(struct domain* dest_dom, unsigned long dest_gpfn,
1671 struct domain* src_dom,
1672 unsigned long src_gpfn, unsigned long num_src_gpfn)
1674 if (allocate_pgd_pmd(dest_dom, dest_gpfn,
1675 src_dom, src_gpfn, num_src_gpfn))
1676 return -ENOMEM;
1678 if (expose_p2m_range(dest_dom, dest_gpfn,
1679 src_dom, src_gpfn, num_src_gpfn))
1680 return -EAGAIN;
1682 if (expose_zero_page(dest_dom, dest_gpfn, num_src_gpfn))
1683 return -EAGAIN;
1685 return 0;
1688 static void
1689 unexpose_p2m(struct domain* dest_dom,
1690 unsigned long dest_gpfn, unsigned long num_dest_gpfn)
1692 unsigned long i;
1694 for (i = 0; i < num_dest_gpfn; i++) {
1695 zap_domain_page_one(dest_dom, (dest_gpfn + i) << PAGE_SHIFT,
1696 0, INVALID_MFN);
1700 // It is possible to optimize this loop, but it isn't performance critical.
1701 unsigned long
1702 dom0vp_expose_p2m(struct domain* d,
1703 unsigned long conv_start_gpfn,
1704 unsigned long assign_start_gpfn,
1705 unsigned long expose_size, unsigned long granule_pfn)
1707 unsigned long ret;
1708 unsigned long expose_num_pfn = expose_size >> PAGE_SHIFT;
1710 if ((expose_size % PAGE_SIZE) != 0 ||
1711 (granule_pfn % PTRS_PER_PTE) != 0 ||
1712 (expose_num_pfn % PTRS_PER_PTE) != 0 ||
1713 (conv_start_gpfn % granule_pfn) != 0 ||
1714 (assign_start_gpfn % granule_pfn) != 0 ||
1715 (expose_num_pfn % granule_pfn) != 0) {
1716 gdprintk(XENLOG_INFO,
1717 "%s conv_start_gpfn 0x%016lx assign_start_gpfn 0x%016lx "
1718 "expose_size 0x%016lx granulte_pfn 0x%016lx\n", __func__,
1719 conv_start_gpfn, assign_start_gpfn, expose_size, granule_pfn);
1720 return -EINVAL;
1723 if (granule_pfn != PTRS_PER_PTE) {
1724 gdprintk(XENLOG_INFO,
1725 "%s granule_pfn 0x%016lx PTRS_PER_PTE 0x%016lx\n",
1726 __func__, granule_pfn, PTRS_PER_PTE);
1727 return -ENOSYS;
1729 ret = expose_p2m(d, assign_start_gpfn,
1730 d, conv_start_gpfn, expose_num_pfn);
1731 return ret;
1734 static int
1735 memmap_info_copy_from_guest(struct xen_ia64_memmap_info* memmap_info,
1736 char** memmap_p,
1737 XEN_GUEST_HANDLE(char) buffer)
1739 char *memmap;
1740 char *p;
1741 char *memmap_end;
1742 efi_memory_desc_t *md;
1743 unsigned long start;
1744 unsigned long end;
1745 efi_memory_desc_t *prev_md;
1747 if (copy_from_guest((char*)memmap_info, buffer, sizeof(*memmap_info)))
1748 return -EFAULT;
1749 if (memmap_info->efi_memdesc_size < sizeof(efi_memory_desc_t) ||
1750 memmap_info->efi_memmap_size < memmap_info->efi_memdesc_size ||
1751 (memmap_info->efi_memmap_size % memmap_info->efi_memdesc_size) != 0)
1752 return -EINVAL;
1754 memmap = _xmalloc(memmap_info->efi_memmap_size,
1755 __alignof__(efi_memory_desc_t));
1756 if (memmap == NULL)
1757 return -ENOMEM;
1758 if (copy_from_guest_offset(memmap, buffer, sizeof(*memmap_info),
1759 memmap_info->efi_memmap_size)) {
1760 xfree(memmap);
1761 return -EFAULT;
1764 /* integrity check & simplify */
1765 sort(memmap, memmap_info->efi_memmap_size / memmap_info->efi_memdesc_size,
1766 memmap_info->efi_memdesc_size, efi_mdt_cmp, NULL);
1768 /* alignment & overlap check */
1769 prev_md = NULL;
1770 p = memmap;
1771 memmap_end = memmap + memmap_info->efi_memmap_size;
1772 for (p = memmap; p < memmap_end; p += memmap_info->efi_memdesc_size) {
1773 md = (efi_memory_desc_t*)p;
1774 start = md->phys_addr;
1776 if (start & ((1UL << EFI_PAGE_SHIFT) - 1) || md->num_pages == 0) {
1777 xfree(memmap);
1778 return -EINVAL;
1781 if (prev_md != NULL) {
1782 unsigned long prev_end = MD_END(prev_md);
1783 if (prev_end > start) {
1784 xfree(memmap);
1785 return -EINVAL;
1789 prev_md = (efi_memory_desc_t *)p;
1792 /* coalesce */
1793 prev_md = NULL;
1794 p = memmap;
1795 while (p < memmap_end) {
1796 md = (efi_memory_desc_t*)p;
1797 start = md->phys_addr;
1798 end = MD_END(md);
1800 start = P2M_PFN_ROUNDDOWN(start >> PAGE_SHIFT) << PAGE_SHIFT;
1801 end = P2M_PFN_ROUNDUP(end >> PAGE_SHIFT) << PAGE_SHIFT;
1802 md->phys_addr = start;
1803 md->num_pages = (end - start) >> EFI_PAGE_SHIFT;
1805 if (prev_md != NULL) {
1806 unsigned long prev_end = MD_END(prev_md);
1807 if (prev_end >= start) {
1808 size_t left;
1809 end = max(prev_end, end);
1810 prev_md->num_pages = (end - prev_md->phys_addr) >> EFI_PAGE_SHIFT;
1812 left = memmap_end - p;
1813 if (left > memmap_info->efi_memdesc_size) {
1814 left -= memmap_info->efi_memdesc_size;
1815 memmove(p, p + memmap_info->efi_memdesc_size, left);
1818 memmap_info->efi_memmap_size -= memmap_info->efi_memdesc_size;
1819 memmap_end -= memmap_info->efi_memdesc_size;
1820 continue;
1824 prev_md = md;
1825 p += memmap_info->efi_memdesc_size;
1828 if (copy_to_guest(buffer, (char*)memmap_info, sizeof(*memmap_info)) ||
1829 copy_to_guest_offset(buffer, sizeof(*memmap_info),
1830 (char*)memmap, memmap_info->efi_memmap_size)) {
1831 xfree(memmap);
1832 return -EFAULT;
1835 *memmap_p = memmap;
1836 return 0;
1839 static int
1840 foreign_p2m_allocate_pte(struct domain* d,
1841 const struct xen_ia64_memmap_info* memmap_info,
1842 const void* memmap)
1844 const void* memmap_end = memmap + memmap_info->efi_memmap_size;
1845 const void* p;
1847 for (p = memmap; p < memmap_end; p += memmap_info->efi_memdesc_size) {
1848 const efi_memory_desc_t* md = p;
1849 unsigned long start = md->phys_addr;
1850 unsigned long end = MD_END(md);
1851 unsigned long gpaddr;
1853 for (gpaddr = start; gpaddr < end; gpaddr += PAGE_SIZE) {
1854 if (lookup_alloc_domain_pte(d, gpaddr) == NULL) {
1855 return -ENOMEM;
1860 return 0;
1863 struct foreign_p2m_region {
1864 unsigned long gpfn;
1865 unsigned long num_gpfn;
1866 };
1868 struct foreign_p2m_entry {
1869 struct list_head list;
1870 int busy;
1872 /* src domain */
1873 struct domain* src_dom;
1875 /* region into which foreign p2m table is mapped */
1876 unsigned long gpfn;
1877 unsigned long num_gpfn;
1878 unsigned int num_region;
1879 struct foreign_p2m_region region[0];
1880 };
1882 /* caller must increment the reference count of src_dom */
1883 static int
1884 foreign_p2m_alloc(struct foreign_p2m* foreign_p2m,
1885 unsigned long dest_gpfn, struct domain* src_dom,
1886 struct xen_ia64_memmap_info* memmap_info, void* memmap,
1887 struct foreign_p2m_entry** entryp)
1889 void* memmap_end = memmap + memmap_info->efi_memmap_size;
1890 efi_memory_desc_t* md;
1891 unsigned long dest_gpfn_end;
1892 unsigned long src_gpfn;
1893 unsigned long src_gpfn_end;
1895 unsigned int num_region;
1896 struct foreign_p2m_entry* entry;
1897 struct foreign_p2m_entry* prev;
1898 struct foreign_p2m_entry* pos;
1900 num_region = (memmap_end - memmap) / memmap_info->efi_memdesc_size;
1902 md = memmap;
1903 src_gpfn = P2M_PFN_ROUNDDOWN(md->phys_addr >> PAGE_SHIFT);
1905 md = memmap + (num_region - 1) * memmap_info->efi_memdesc_size;
1906 src_gpfn_end = MD_END(md) >> PAGE_SHIFT;
1907 if (src_gpfn_end >
1908 P2M_PFN_ROUNDUP(src_dom->arch.convmem_end >> PAGE_SHIFT))
1909 return -EINVAL;
1911 src_gpfn_end = P2M_PFN_ROUNDUP(src_gpfn_end);
1912 dest_gpfn_end = dest_gpfn + P2M_NUM_PFN(src_gpfn_end - src_gpfn);
1913 entry = _xmalloc(sizeof(*entry) + num_region * sizeof(entry->region[0]),
1914 __alignof__(*entry));
1915 if (entry == NULL)
1916 return -ENOMEM;
1918 entry->busy = 1;
1919 entry->gpfn = dest_gpfn;
1920 entry->num_gpfn = dest_gpfn_end - dest_gpfn;
1921 entry->src_dom = src_dom;
1922 entry->num_region = 0;
1923 memset(entry->region, 0, sizeof(entry->region[0]) * num_region);
1924 prev = NULL;
1926 spin_lock(&foreign_p2m->lock);
1927 if (list_empty(&foreign_p2m->head))
1928 prev = (struct foreign_p2m_entry*)&foreign_p2m->head;
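        /*
         * Editor's note: this cast is safe only because 'list' is the first
         * member of struct foreign_p2m_entry, so &prev->list aliases
         * &foreign_p2m->head.  The sentinel "entry" exists purely so that
         * list_add(&entry->list, &prev->list) below inserts at the list
         * head; no other field of it is ever dereferenced.
         */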
1930 list_for_each_entry(pos, &foreign_p2m->head, list) {
1931 if (pos->gpfn + pos->num_gpfn < dest_gpfn) {
1932 prev = pos;
1933 continue;
1936 if (dest_gpfn_end < pos->gpfn) {
1937 if (prev != NULL && prev->gpfn + prev->num_gpfn > dest_gpfn)
1938 prev = NULL;/* overlap */
1939 break;
1942 /* overlap */
1943 prev = NULL;
1944 break;
1946 if (prev != NULL) {
1947 list_add(&entry->list, &prev->list);
1948 spin_unlock(&foreign_p2m->lock);
1949 *entryp = entry;
1950 return 0;
1952 spin_unlock(&foreign_p2m->lock);
1953 xfree(entry);
1954 return -EBUSY;
1957 static void
1958 foreign_p2m_unexpose(struct domain* dest_dom, struct foreign_p2m_entry* entry)
1960 unsigned int i;
1962 BUG_ON(!entry->busy);
1963 for (i = 0; i < entry->num_region; i++)
1964 unexpose_p2m(dest_dom,
1965 entry->region[i].gpfn, entry->region[i].num_gpfn);
1968 static void
1969 foreign_p2m_unbusy(struct foreign_p2m* foreign_p2m,
1970 struct foreign_p2m_entry* entry)
1972 spin_lock(&foreign_p2m->lock);
1973 BUG_ON(!entry->busy);
1974 entry->busy = 0;
1975 spin_unlock(&foreign_p2m->lock);
1978 static void
1979 foreign_p2m_free(struct foreign_p2m* foreign_p2m,
1980 struct foreign_p2m_entry* entry)
1982 spin_lock(&foreign_p2m->lock);
1983 BUG_ON(!entry->busy);
1984 list_del(&entry->list);
1985 spin_unlock(&foreign_p2m->lock);
1987 put_domain(entry->src_dom);
1988 xfree(entry);
1991 void
1992 foreign_p2m_init(struct domain* d)
1994 struct foreign_p2m* foreign_p2m = &d->arch.foreign_p2m;
1995 INIT_LIST_HEAD(&foreign_p2m->head);
1996 spin_lock_init(&foreign_p2m->lock);
1999 void
2000 foreign_p2m_destroy(struct domain* d)
2002 struct foreign_p2m* foreign_p2m = &d->arch.foreign_p2m;
2003 struct foreign_p2m_entry* entry;
2004 struct foreign_p2m_entry* n;
2006 spin_lock(&foreign_p2m->lock);
2007 list_for_each_entry_safe(entry, n, &foreign_p2m->head, list) {
2008 /* mm_teardown() cleared p2m table already */
2009 /* foreign_p2m_unexpose(d, entry);*/
2010 list_del(&entry->list);
2011 put_domain(entry->src_dom);
2012 xfree(entry);
2014 spin_unlock(&foreign_p2m->lock);
2017 unsigned long
2018 dom0vp_expose_foreign_p2m(struct domain* dest_dom,
2019 unsigned long dest_gpfn,
2020 domid_t domid,
2021 XEN_GUEST_HANDLE(char) buffer,
2022 unsigned long flags)
2024 unsigned long ret = 0;
2025 struct domain* src_dom;
2026 struct xen_ia64_memmap_info memmap_info;
2027 char* memmap;
2028 void* memmap_end;
2029 void* p;
2031 struct foreign_p2m_entry* entry;
2033 ret = memmap_info_copy_from_guest(&memmap_info, &memmap, buffer);
2034 if (ret != 0)
2035 return ret;
2037 dest_dom = rcu_lock_domain(dest_dom);
2038 if (dest_dom == NULL) {
2039 ret = -EINVAL;
2040 goto out;
2042 #if 1
2043 // Exposing a domain's own p2m to itself isn't allowed.
2044 // Otherwise the domain could never be destroyed because
2045 // no one would decrement its reference count.
2046 if (domid == dest_dom->domain_id) {
2047 ret = -EINVAL;
2048 goto out;
2050 #endif
2052 src_dom = get_domain_by_id(domid);
2053 if (src_dom == NULL) {
2054 ret = -EINVAL;
2055 goto out_unlock;
2058 if (flags & IA64_DOM0VP_EFP_ALLOC_PTE) {
2059 ret = foreign_p2m_allocate_pte(src_dom, &memmap_info, memmap);
2060 if (ret != 0)
2061 goto out_unlock;
2064 ret = foreign_p2m_alloc(&dest_dom->arch.foreign_p2m, dest_gpfn,
2065 src_dom, &memmap_info, memmap, &entry);
2066 if (ret != 0)
2067 goto out_unlock;
2069 memmap_end = memmap + memmap_info.efi_memmap_size;
2070 for (p = memmap; p < memmap_end; p += memmap_info.efi_memdesc_size) {
2071 efi_memory_desc_t* md = p;
2072 unsigned long src_gpfn =
2073 P2M_PFN_ROUNDDOWN(md->phys_addr >> PAGE_SHIFT);
2074 unsigned long src_gpfn_end =
2075 P2M_PFN_ROUNDUP(MD_END(md) >> PAGE_SHIFT);
2076 unsigned long num_src_gpfn = src_gpfn_end - src_gpfn;
2078 ret = expose_p2m(dest_dom, dest_gpfn + src_gpfn / PTRS_PER_PTE,
2079 src_dom, src_gpfn, num_src_gpfn);
2080 if (ret != 0)
2081 break;
2083 entry->region[entry->num_region].gpfn =
2084 dest_gpfn + src_gpfn / PTRS_PER_PTE;
2085 entry->region[entry->num_region].num_gpfn = P2M_NUM_PFN(num_src_gpfn);
2086 entry->num_region++;
2089 if (ret == 0) {
2090 foreign_p2m_unbusy(&dest_dom->arch.foreign_p2m, entry);
2091 } else {
2092 foreign_p2m_unexpose(dest_dom, entry);
2093 foreign_p2m_free(&dest_dom->arch.foreign_p2m, entry);
2096 out_unlock:
2097 rcu_unlock_domain(dest_dom);
2098 out:
2099 xfree(memmap);
2100 return ret;
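/*
 * Editor's note (illustration, not in the original): the expose loop above
 * maps the source domain's p2m *table pages*, not the data pages themselves.
 * One pte page covers the p2m entries of PTRS_PER_PTE source pfns, so the
 * source pfn range [src_gpfn, src_gpfn_end) appears in the destination
 * physmap at
 *     [dest_gpfn + src_gpfn / PTRS_PER_PTE,
 *      dest_gpfn + src_gpfn / PTRS_PER_PTE + P2M_NUM_PFN(num_src_gpfn)),
 * which is exactly the range recorded in entry->region[] and later torn
 * down by foreign_p2m_unexpose().
 */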
2103 unsigned long
2104 dom0vp_unexpose_foreign_p2m(struct domain* dest_dom,
2105 unsigned long dest_gpfn,
2106 domid_t domid)
2108 int ret = -ENOENT;
2109 struct foreign_p2m* foreign_p2m = &dest_dom->arch.foreign_p2m;
2110 struct foreign_p2m_entry* entry;
2112 dest_dom = rcu_lock_domain(dest_dom);
2113 if (dest_dom == NULL)
2114 return ret;
2115 spin_lock(&foreign_p2m->lock);
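    /*
     * Editor's note: the list is kept sorted by gpfn (see the insertion in
     * foreign_p2m_alloc()), so the walk below can stop at the first entry
     * whose gpfn is past dest_gpfn; only an exact gpfn match is accepted.
     */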
2116 list_for_each_entry(entry, &foreign_p2m->head, list) {
2117 if (entry->gpfn < dest_gpfn)
2118 continue;
2119 if (dest_gpfn < entry->gpfn)
2120 break;
2122 if (domid == entry->src_dom->domain_id)
2123 ret = 0;
2124 else
2125 ret = -EINVAL;
2126 break;
2128 if (ret == 0) {
2129 if (entry->busy == 0)
2130 entry->busy = 1;
2131 else
2132 ret = -EBUSY;
2134 spin_unlock(&foreign_p2m->lock);
2136 if (ret == 0) {
2137 foreign_p2m_unexpose(dest_dom, entry);
2138 foreign_p2m_free(&dest_dom->arch.foreign_p2m, entry);
2140 rcu_unlock_domain(dest_dom);
2141 return ret;
2143 #endif
2145 // grant table host mapping
2146 // mpaddr: host_addr: pseudo physical address
2147 // mfn: frame: machine page frame
2148 // flags: GNTMAP_readonly | GNTMAP_application_map | GNTMAP_contains_pte
2149 int
2150 create_grant_host_mapping(unsigned long gpaddr, unsigned long mfn,
2151 unsigned int flags, unsigned int cache_flags)
2153 struct domain* d = current->domain;
2154 struct page_info* page;
2155 int ret;
2157 if ((flags & (GNTMAP_device_map |
2158 GNTMAP_application_map | GNTMAP_contains_pte)) ||
2159 (cache_flags)) {
2160 gdprintk(XENLOG_INFO, "%s: flags 0x%x cache_flags 0x%x\n",
2161 __func__, flags, cache_flags);
2162 return GNTST_general_error;
2165 BUG_ON(!mfn_valid(mfn));
2166 page = mfn_to_page(mfn);
2167 ret = get_page(page, page_get_owner(page));
2168 BUG_ON(ret == 0);
2169 assign_domain_page_replace(d, gpaddr, mfn,
2170 #ifdef CONFIG_XEN_IA64_TLB_TRACK
2171 ASSIGN_tlb_track |
2172 #endif
2173 ((flags & GNTMAP_readonly) ?
2174 ASSIGN_readonly : ASSIGN_writable));
2175 perfc_incr(create_grant_host_mapping);
2176 return GNTST_okay;
2179 // grant table host unmapping
2180 int
2181 replace_grant_host_mapping(unsigned long gpaddr,
2182 unsigned long mfn, unsigned long new_gpaddr, unsigned int flags)
2184 struct domain* d = current->domain;
2185 unsigned long gpfn = gpaddr >> PAGE_SHIFT;
2186 volatile pte_t* pte;
2187 unsigned long cur_arflags;
2188 pte_t cur_pte;
2189 pte_t new_pte = __pte(0);
2190 pte_t old_pte;
2191 struct page_info* page = mfn_to_page(mfn);
2192 struct page_info* new_page = NULL;
2193 volatile pte_t* new_page_pte = NULL;
2194 unsigned long new_page_mfn = INVALID_MFN;
2196 if (new_gpaddr) {
2197 new_page_pte = lookup_noalloc_domain_pte_none(d, new_gpaddr);
2198 if (likely(new_page_pte != NULL)) {
2199 new_pte = ptep_get_and_clear(&d->arch.mm,
2200 new_gpaddr, new_page_pte);
2201 if (likely(pte_present(new_pte))) {
2202 struct domain* page_owner;
2204 new_page_mfn = pte_pfn(new_pte);
2205 new_page = mfn_to_page(new_page_mfn);
2206 page_owner = page_get_owner(new_page);
2207 if (unlikely(page_owner == NULL)) {
2208 gdprintk(XENLOG_INFO,
2209 "%s: page_owner == NULL "
2210 "gpaddr 0x%lx mfn 0x%lx "
2211 "new_gpaddr 0x%lx mfn 0x%lx\n",
2212 __func__, gpaddr, mfn, new_gpaddr, new_page_mfn);
2213 new_page = NULL; /* prevent domain_put_page() */
2214 return GNTST_general_error;
2217 /*
2218 * domain_put_page(clear_PGC_allocated = 0)
2219 * doesn't decrement the refcount of a page with
2220 * pte_pgc_allocated() = 1. Be careful.
2221 */
2222 if (unlikely(!pte_pgc_allocated(new_pte))) {
2223 /* domain_put_page() decrements the page refcount; adjust for it. */
2224 if (get_page(new_page, page_owner)) {
2225 gdprintk(XENLOG_INFO,
2226 "%s: get_page() failed. "
2227 "gpaddr 0x%lx mfn 0x%lx "
2228 "new_gpaddr 0x%lx mfn 0x%lx\n",
2229 __func__, gpaddr, mfn,
2230 new_gpaddr, new_page_mfn);
2231 return GNTST_general_error;
2234 domain_put_page(d, new_gpaddr, new_page_pte, new_pte, 0);
2235 } else
2236 new_pte = __pte(0);
2240 if (flags & (GNTMAP_application_map | GNTMAP_contains_pte)) {
2241 gdprintk(XENLOG_INFO, "%s: flags 0x%x\n", __func__, flags);
2242 return GNTST_general_error;
2245 pte = lookup_noalloc_domain_pte(d, gpaddr);
2246 if (pte == NULL) {
2247 gdprintk(XENLOG_INFO, "%s: gpaddr 0x%lx mfn 0x%lx\n",
2248 __func__, gpaddr, mfn);
2249 return GNTST_general_error;
2252 again:
2253 cur_arflags = pte_val(*pte) & ~_PAGE_PPN_MASK;
2254 cur_pte = pfn_pte(mfn, __pgprot(cur_arflags));
2255 if (!pte_present(cur_pte) ||
2256 (page_get_owner(page) == d && get_gpfn_from_mfn(mfn) == gpfn)) {
2257 gdprintk(XENLOG_INFO, "%s: gpaddr 0x%lx mfn 0x%lx cur_pte 0x%lx\n",
2258 __func__, gpaddr, mfn, pte_val(cur_pte));
2259 return GNTST_general_error;
2262 if (new_page) {
2263 BUG_ON(new_page_mfn == INVALID_MFN);
2264 set_gpfn_from_mfn(new_page_mfn, gpfn);
2265 /* smp_mb() isn't needed because assign_domain_page_cmpxchg_rel()
2266 has release semantics. */
2268 old_pte = ptep_cmpxchg_rel(&d->arch.mm, gpaddr, pte, cur_pte, new_pte);
2269 if (unlikely(pte_val(cur_pte) != pte_val(old_pte))) {
2270 if (pte_pfn(old_pte) == mfn) {
2271 goto again;
2273 if (new_page) {
2274 BUG_ON(new_page_mfn == INVALID_MFN);
2275 set_gpfn_from_mfn(new_page_mfn, INVALID_M2P_ENTRY);
2276 domain_put_page(d, new_gpaddr, new_page_pte, new_pte, 1);
2278 goto out;
2280 if (unlikely(!pte_present(old_pte)))
2281 goto out;
2282 BUG_ON(pte_pfn(old_pte) != mfn);
2284 /* try_to_clear_PGC_allocate(d, page) is not needed. */
2285 BUG_ON(page_get_owner(page) == d &&
2286 get_gpfn_from_mfn(mfn) == gpfn);
2287 BUG_ON(pte_pgc_allocated(old_pte));
2288 domain_page_flush_and_put(d, gpaddr, pte, old_pte, page);
2290 perfc_incr(replace_grant_host_mapping);
2291 return GNTST_okay;
2293 out:
2294 gdprintk(XENLOG_INFO, "%s gpaddr 0x%lx mfn 0x%lx cur_pte "
2295 "0x%lx old_pte 0x%lx\n",
2296 __func__, gpaddr, mfn, pte_val(cur_pte), pte_val(old_pte));
2297 return GNTST_general_error;
2300 // This heavily depends on the struct page_info layout.
2301 // gnttab_transfer() calls steal_page() with memflags = 0:
2302 // for a grant table transfer we must refill the physmap slot
2303 // with a freshly allocated page.
2304 // memory_exchange() calls steal_page() with memflags = MEMF_no_refcount:
2305 // we don't have to refill it because memory_exchange() does that itself.
2306 int
2307 steal_page(struct domain *d, struct page_info *page, unsigned int memflags)
2309 #if 0 /* if big endian */
2310 # error "implement big endian version of steal_page()"
2311 #endif
2312 u32 _d, _nd;
2313 u64 x, nx, y;
2315 if (page_get_owner(page) != d) {
2316 gdprintk(XENLOG_INFO, "%s d 0x%p owner 0x%p\n",
2317 __func__, d, page_get_owner(page));
2318 return -1;
2321 if (!(memflags & MEMF_no_refcount)) {
2322 unsigned long gpfn;
2323 struct page_info *new;
2324 unsigned long new_mfn;
2325 int ret;
2327 new = alloc_domheap_page(d, 0);
2328 if (new == NULL) {
2329 gdprintk(XENLOG_INFO, "alloc_domheap_page() failed\n");
2330 return -1;
2332 // zero out pages for security reasons
2333 clear_page(page_to_virt(new));
2334 // assign_domain_page_cmpxchg_rel() has release semantics
2335 // so smp_mb() isn't needed.
2337 gpfn = get_gpfn_from_mfn(page_to_mfn(page));
2338 if (gpfn == INVALID_M2P_ENTRY) {
2339 free_domheap_page(new);
2340 return -1;
2342 new_mfn = page_to_mfn(new);
2343 set_gpfn_from_mfn(new_mfn, gpfn);
2344 // smp_mb() isn't needed because assign_domain_page_cmpxchg_rel()
2345 // has release semantics.
2347 ret = assign_domain_page_cmpxchg_rel(d, gpfn << PAGE_SHIFT, page, new,
2348 ASSIGN_writable |
2349 ASSIGN_pgc_allocated, 0);
2350 if (ret < 0) {
2351 gdprintk(XENLOG_INFO, "assign_domain_page_cmpxchg_rel failed %d\n",
2352 ret);
2353 set_gpfn_from_mfn(new_mfn, INVALID_M2P_ENTRY);
2354 free_domheap_page(new);
2355 return -1;
2357 perfc_incr(steal_page_refcount);
2360 spin_lock(&d->page_alloc_lock);
2362 /*
2363 * The tricky bit: atomically release ownership while there is just one
2364 * benign reference to the page (PGC_allocated). If that reference
2365 * disappears then the deallocation routine will safely spin.
2366 */
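    /*
     * Editor's note (not in the original): on little-endian ia64 the 32-bit
     * count_info and the 32-bit pickled owner pointer sit next to each
     * other in struct page_info, so the loop below reads them as one u64:
     * bits 0..31 hold count_info, bits 32..63 hold the pickled domain.
     * "nx = x & 0xffffffff" therefore keeps the reference count while
     * clearing the owner, and the cmpxchg publishes both changes atomically.
     */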
2367 _d = pickle_domptr(d);
2368 y = *((u64*)&page->count_info);
2369 do {
2370 x = y;
2371 nx = x & 0xffffffff;
2372 // page->count_info: untouched
2373 // page->u.inuse._domain = 0;
2374 _nd = x >> 32;
2376 if (unlikely(((x & (PGC_count_mask | PGC_allocated)) !=
2377 (1 | PGC_allocated))) ||
2378 unlikely(_nd != _d)) {
2379 struct domain* nd = unpickle_domptr(_nd);
2380 if (nd == NULL) {
2381 gdprintk(XENLOG_INFO, "gnttab_transfer: "
2382 "Bad page %p: ed=%p(%u) 0x%x, "
2383 "sd=%p 0x%x,"
2384 " caf=%016lx, taf=%" PRtype_info
2385 " memflags 0x%x\n",
2386 (void *) page_to_mfn(page),
2387 d, d->domain_id, _d,
2388 nd, _nd,
2389 x,
2390 page->u.inuse.type_info,
2391 memflags);
2392 } else {
2393 gdprintk(XENLOG_WARNING, "gnttab_transfer: "
2394 "Bad page %p: ed=%p(%u) 0x%x, "
2395 "sd=%p(%u) 0x%x,"
2396 " caf=%016lx, taf=%" PRtype_info
2397 " memflags 0x%x\n",
2398 (void *) page_to_mfn(page),
2399 d, d->domain_id, _d,
2400 nd, nd->domain_id, _nd,
2401 x,
2402 page->u.inuse.type_info,
2403 memflags);
2405 spin_unlock(&d->page_alloc_lock);
2406 return -1;
2409 y = cmpxchg((u64*)&page->count_info, x, nx);
2410 } while (unlikely(y != x));
2412 /*
2413 * Unlink from 'd'. At least one reference remains (now anonymous), so
2414 * no one else is spinning to try to delete this page from 'd'.
2415 */
2416 if ( !(memflags & MEMF_no_refcount) )
2417 d->tot_pages--;
2418 list_del(&page->list);
2420 spin_unlock(&d->page_alloc_lock);
2421 perfc_incr(steal_page);
2422 return 0;
2425 int
2426 guest_physmap_add_page(struct domain *d, unsigned long gpfn,
2427 unsigned long mfn, unsigned int page_order)
2429 unsigned long i;
2431 for (i = 0; i < (1UL << page_order); i++) {
2432 BUG_ON(!mfn_valid(mfn));
2433 BUG_ON(mfn_to_page(mfn)->count_info != (PGC_allocated | 1));
2434 set_gpfn_from_mfn(mfn, gpfn);
2435 smp_mb();
2436 assign_domain_page_replace(d, gpfn << PAGE_SHIFT, mfn,
2437 ASSIGN_writable | ASSIGN_pgc_allocated);
2438 mfn++;
2439 gpfn++;
2442 perfc_incr(guest_physmap_add_page);
2443 return 0;
2446 void
2447 guest_physmap_remove_page(struct domain *d, unsigned long gpfn,
2448 unsigned long mfn, unsigned int page_order)
2450 unsigned long i;
2452 BUG_ON(mfn == 0);//XXX
2454 for (i = 0; i < (1UL << page_order); i++)
2455 zap_domain_page_one(d, (gpfn+i) << PAGE_SHIFT, 0, mfn+i);
2457 perfc_incr(guest_physmap_remove_page);
2460 static void
2461 domain_page_flush_and_put(struct domain* d, unsigned long mpaddr,
2462 volatile pte_t* ptep, pte_t old_pte,
2463 struct page_info* page)
2465 #ifdef CONFIG_XEN_IA64_TLB_TRACK
2466 struct tlb_track_entry* entry;
2467 #endif
2469 if (shadow_mode_enabled(d))
2470 shadow_mark_page_dirty(d, mpaddr >> PAGE_SHIFT);
2472 #ifndef CONFIG_XEN_IA64_TLB_TRACK
2473 //XXX sledgehammer approach;
2474 // a finer-grained range flush would be better.
2475 domain_flush_vtlb_all(d);
2476 put_page(page);
2477 #else
2478 switch (tlb_track_search_and_remove(d->arch.tlb_track,
2479 ptep, old_pte, &entry)) {
2480 case TLB_TRACK_NOT_TRACKED:
2481 // dprintk(XENLOG_WARNING, "%s TLB_TRACK_NOT_TRACKED\n", __func__);
2482 /* This page was zapped from this domain
2483 * by memory decrease, memory exchange or dom0vp_zap_physmap.
2484 * I.e. the page was zapped either to return it to Xen
2485 * (balloon driver or DMA page allocation) or because a
2486 * page mapped from a foreign domain was unmapped from this domain.
2487 * In the former case the page is about to be freed, so
2488 * freeing it could be deferred and batched.
2489 * In the latter case the page was unmapped, so its vTLB
2490 * entries must be flushed. To optimize this, the pages
2491 * could be queued and the vTLB flushed only once,
2492 * i.e. the caller would have to call dfree_flush() explicitly.
2493 */
2494 domain_flush_vtlb_all(d);
2495 put_page(page);
2496 break;
2497 case TLB_TRACK_NOT_FOUND:
2498 // dprintk(XENLOG_WARNING, "%s TLB_TRACK_NOT_FOUND\n", __func__);
2499 /* This page was zapped from this domain
2500 * by grant table page unmap.
2501 * Luckily the domain that mapped this page never
2502 * accessed it, so we don't have to flush the vTLB.
2503 * Probably the domain used it only for DMA.
2504 */
2505 /* do nothing */
2506 put_page(page);
2507 break;
2508 case TLB_TRACK_FOUND:
2509 // dprintk(XENLOG_WARNING, "%s TLB_TRACK_FOUND\n", __func__);
2510 /* This page was zapped from this domain
2511 * by grant table page unmap.
2512 * Fortunately this page was accessed via only one virtual
2513 * memory address, so it is easy to flush.
2514 */
2515 domain_flush_vtlb_track_entry(d, entry);
2516 tlb_track_free_entry(d->arch.tlb_track, entry);
2517 put_page(page);
2518 break;
2519 case TLB_TRACK_MANY:
2520 gdprintk(XENLOG_INFO, "%s TLB_TRACK_MANY\n", __func__);
2521 /* This page was zapped from this domain
2522 * by grant table page unmap.
2523 * Unfortunately this page was accessed via many virtual
2524 * memory addresses (or too many times via a single virtual
2525 * address), so we gave up tracking virtual addresses;
2526 * a full vTLB flush is necessary.
2527 */
2528 domain_flush_vtlb_all(d);
2529 put_page(page);
2530 break;
2531 case TLB_TRACK_AGAIN:
2532 gdprintk(XENLOG_ERR, "%s TLB_TRACK_AGAIN\n", __func__);
2533 BUG();
2534 break;
2536 #endif
2537 perfc_incr(domain_page_flush_and_put);
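/*
 * Editor's summary (not in the original) of the tlb_track outcomes handled
 * above:
 *   TLB_TRACK_NOT_TRACKED  page not subject to tracking (zap path)  -> full vTLB flush
 *   TLB_TRACK_NOT_FOUND    tracked, but no TLB entry was created    -> no flush
 *   TLB_TRACK_FOUND        exactly one virtual mapping known        -> flush that entry
 *   TLB_TRACK_MANY         too many / too varied mappings           -> full vTLB flush
 *   TLB_TRACK_AGAIN        unexpected here                          -> BUG()
 * All non-BUG cases drop the page reference with put_page().
 */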
2540 int
2541 domain_page_mapped(struct domain* d, unsigned long mpaddr)
2543 volatile pte_t * pte;
2545 pte = lookup_noalloc_domain_pte(d, mpaddr);
2546 if (pte != NULL && !pte_none(*pte))
2547 return 1;
2548 return 0;
2551 /* Flush cache of domain d. */
2552 void domain_cache_flush (struct domain *d, int sync_only)
2554 struct mm_struct *mm = &d->arch.mm;
2555 volatile pgd_t *pgd = mm->pgd;
2556 unsigned long maddr;
2557 int i, j, k, l;
2558 int nbr_page = 0;
2559 void (*flush_func)(unsigned long start, unsigned long end);
2560 extern void flush_dcache_range (unsigned long, unsigned long);
2562 if (sync_only)
2563 flush_func = &flush_icache_range;
2564 else
2565 flush_func = &flush_dcache_range;
2567 for (i = 0; i < PTRS_PER_PGD; pgd++, i++) {
2568 volatile pud_t *pud;
2569 if (!pgd_present(*pgd)) // acquire semantics
2570 continue;
2571 pud = pud_offset(pgd, 0);
2572 for (j = 0; j < PTRS_PER_PUD; pud++, j++) {
2573 volatile pmd_t *pmd;
2574 if (!pud_present(*pud)) // acquire semantics
2575 continue;
2576 pmd = pmd_offset(pud, 0);
2577 for (k = 0; k < PTRS_PER_PMD; pmd++, k++) {
2578 volatile pte_t *pte;
2579 if (!pmd_present(*pmd)) // acquire semantics
2580 continue;
2581 pte = pte_offset_map(pmd, 0);
2582 for (l = 0; l < PTRS_PER_PTE; pte++, l++) {
2583 if (!pte_present(*pte)) // acquire semantics
2584 continue;
2585 /* Convert PTE to maddr. */
2586 maddr = __va_ul (pte_val(*pte)
2587 & _PAGE_PPN_MASK);
2588 (*flush_func)(maddr, maddr + PAGE_SIZE);
2589 nbr_page++;
2594 //printk ("domain_cache_flush: %d %d pages\n", d->domain_id, nbr_page);
2597 #ifdef VERBOSE
2598 #define MEM_LOG(_f, _a...) \
2599 printk("DOM%u: (file=mm.c, line=%d) " _f "\n", \
2600 current->domain->domain_id , __LINE__ , ## _a )
2601 #else
2602 #define MEM_LOG(_f, _a...) ((void)0)
2603 #endif
2605 static void free_page_type(struct page_info *page, u32 type)
2609 static int alloc_page_type(struct page_info *page, u32 type)
2611 return 1;
2614 static int opt_p2m_xenheap;
2615 boolean_param("p2m_xenheap", opt_p2m_xenheap);
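/*
 * Editor's note (not in the original): boolean_param() registers
 * "p2m_xenheap" as a Xen boot command-line option.  Booting with
 * "p2m_xenheap" (or "p2m_xenheap=1") makes pgtable_quicklist_alloc() below
 * take the xenheap path (alloc_xenheap_pages + share_xen_page_with_guest)
 * instead of allocating domheap pages owned by dom_p2m.
 */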
2617 void *pgtable_quicklist_alloc(void)
2619 void *p;
2621 BUG_ON(dom_p2m == NULL);
2622 if (!opt_p2m_xenheap) {
2623 struct page_info *page = alloc_domheap_page(dom_p2m, 0);
2624 if (page == NULL)
2625 return NULL;
2626 p = page_to_virt(page);
2627 clear_page(p);
2628 return p;
2630 p = alloc_xenheap_pages(0);
2631 if (p) {
2632 clear_page(p);
2633 /*
2634 * This page should be read only. At the moment the third
2635 * argument isn't meaningful; it should be 1 once read-only sharing is supported.
2636 */
2637 share_xen_page_with_guest(virt_to_page(p), dom_p2m, 0);
2639 return p;
2642 void pgtable_quicklist_free(void *pgtable_entry)
2644 struct page_info* page = virt_to_page(pgtable_entry);
2646 BUG_ON(page_get_owner(page) != dom_p2m);
2647 BUG_ON(page->count_info != (1 | PGC_allocated));
2649 put_page(page);
2650 if (opt_p2m_xenheap)
2651 free_xenheap_page(pgtable_entry);
2654 void put_page_type(struct page_info *page)
2656 u64 nx, x, y = page->u.inuse.type_info;
2658 again:
2659 do {
2660 x = y;
2661 nx = x - 1;
2663 ASSERT((x & PGT_count_mask) != 0);
2665 /*
2666 * The page should always be validated while a reference is held. The
2667 * exception is during domain destruction, when we forcibly invalidate
2668 * page-table pages if we detect a referential loop.
2669 * See domain.c:relinquish_list().
2670 */
2671 ASSERT((x & PGT_validated) || page_get_owner(page)->is_dying);
2673 if ( unlikely((nx & PGT_count_mask) == 0) )
2675 /* Record TLB information for flush later. Races are harmless. */
2676 page->tlbflush_timestamp = tlbflush_current_time();
2678 if ( unlikely((nx & PGT_type_mask) <= PGT_l4_page_table) &&
2679 likely(nx & PGT_validated) )
2681 /*
2682 * Page-table pages must be unvalidated when count is zero. The
2683 * 'free' is safe because the refcnt is non-zero and validated
2684 * bit is clear => other ops will spin or fail.
2685 */
2686 if ( unlikely((y = cmpxchg(&page->u.inuse.type_info, x,
2687 x & ~PGT_validated)) != x) )
2688 goto again;
2689 /* We cleared the 'valid bit' so we do the clean up. */
2690 free_page_type(page, x);
2691 /* Carry on, but with the 'valid bit' now clear. */
2692 x &= ~PGT_validated;
2693 nx &= ~PGT_validated;
2697 while ( unlikely((y = cmpxchg_rel(&page->u.inuse.type_info, x, nx)) != x) );
2701 static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
2703 struct page_info *page = mfn_to_page(page_nr);
2705 if ( unlikely(!mfn_valid(page_nr)) || unlikely(!get_page(page, d)) )
2707 MEM_LOG("Could not get page ref for pfn %lx", page_nr);
2708 return 0;
2711 return 1;
2715 int get_page_type(struct page_info *page, u32 type)
2717 u64 nx, x, y = page->u.inuse.type_info;
2719 ASSERT(!(type & ~PGT_type_mask));
2721 again:
2722 do {
2723 x = y;
2724 nx = x + 1;
2725 if ( unlikely((nx & PGT_count_mask) == 0) )
2727 MEM_LOG("Type count overflow on pfn %lx", page_to_mfn(page));
2728 return 0;
2730 else if ( unlikely((x & PGT_count_mask) == 0) )
2732 if ( (x & PGT_type_mask) != type )
2734 /*
2735 * On type change we check to flush stale TLB entries. This
2736 * may be unnecessary (e.g., page was GDT/LDT) but those
2737 * circumstances should be very rare.
2738 */
2739 cpumask_t mask =
2740 page_get_owner(page)->domain_dirty_cpumask;
2741 tlbflush_filter(mask, page->tlbflush_timestamp);
2743 if ( unlikely(!cpus_empty(mask)) )
2745 perfc_incr(need_flush_tlb_flush);
2746 flush_tlb_mask(mask);
2749 /* We lose existing type, back pointer, and validity. */
2750 nx &= ~(PGT_type_mask | PGT_validated);
2751 nx |= type;
2753 /* No special validation needed for writable pages. */
2754 /* Page tables and GDT/LDT need to be scanned for validity. */
2755 if ( type == PGT_writable_page )
2756 nx |= PGT_validated;
2759 else if ( unlikely((x & PGT_type_mask) != type) )
2761 if ( ((x & PGT_type_mask) != PGT_l2_page_table) ||
2762 (type != PGT_l1_page_table) )
2763 MEM_LOG("Bad type (saw %08lx != exp %08x) "
2764 "for mfn %016lx (pfn %016lx)",
2765 x, type, page_to_mfn(page),
2766 get_gpfn_from_mfn(page_to_mfn(page)));
2767 return 0;
2769 else if ( unlikely(!(x & PGT_validated)) )
2771 /* Someone else is updating validation of this page. Wait... */
2772 while ( (y = page->u.inuse.type_info) == x )
2773 cpu_relax();
2774 goto again;
2777 while ( unlikely((y = cmpxchg_acq(&page->u.inuse.type_info, x, nx)) != x) );
2779 if ( unlikely(!(nx & PGT_validated)) )
2781 /* Try to validate page type; drop the new reference on failure. */
2782 if ( unlikely(!alloc_page_type(page, type)) )
2784 MEM_LOG("Error while validating mfn %lx (pfn %lx) for type %08x"
2785 ": caf=%08x taf=%" PRtype_info,
2786 page_to_mfn(page), get_gpfn_from_mfn(page_to_mfn(page)),
2787 type, page->count_info, page->u.inuse.type_info);
2788 /* No one else can get a reference. We hold the only ref. */
2789 page->u.inuse.type_info = 0;
2790 return 0;
2793 /* No one else is updating simultaneously. */
2794 __set_bit(_PGT_validated, &page->u.inuse.type_info);
2797 return 1;
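/*
 * Editor's summary (not in the original): u.inuse.type_info packs a
 * per-type reference count (PGT_count_mask), the current type
 * (PGT_type_mask) and a PGT_validated flag.  get_page_type() and
 * put_page_type() above maintain it with lock-free cmpxchg loops: the
 * first typed reference (count 0 -> 1) (re)validates the page via
 * alloc_page_type(), dropping the last reference invalidates it via
 * free_page_type(), and concurrent callers either wait for PGT_validated
 * or retry when the cmpxchg fails.
 */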
2800 int memory_is_conventional_ram(paddr_t p)
2802 return (efi_mem_type(p) == EFI_CONVENTIONAL_MEMORY);
2806 long
2807 arch_memory_op(int op, XEN_GUEST_HANDLE(void) arg)
2809 struct page_info *page = NULL;
2811 switch (op) {
2812 case XENMEM_add_to_physmap:
2814 struct xen_add_to_physmap xatp;
2815 unsigned long prev_mfn, mfn = 0, gpfn;
2816 struct domain *d;
2818 if (copy_from_guest(&xatp, arg, 1))
2819 return -EFAULT;
2821 if (xatp.domid == DOMID_SELF)
2822 d = rcu_lock_current_domain();
2823 else {
2824 if ((d = rcu_lock_domain_by_id(xatp.domid)) == NULL)
2825 return -ESRCH;
2826 if (!IS_PRIV_FOR(current->domain,d)) {
2827 rcu_unlock_domain(d);
2828 return -EPERM;
2832 /* This hypercall is used only for VT-i domains */
2833 if (!is_hvm_domain(d)) {
2834 rcu_unlock_domain(d);
2835 return -ENOSYS;
2838 switch (xatp.space) {
2839 case XENMAPSPACE_shared_info:
2840 if (xatp.idx == 0)
2841 mfn = virt_to_mfn(d->shared_info);
2842 break;
2843 case XENMAPSPACE_grant_table:
2844 spin_lock(&d->grant_table->lock);
2846 if ((xatp.idx >= nr_grant_frames(d->grant_table)) &&
2847 (xatp.idx < max_nr_grant_frames))
2848 gnttab_grow_table(d, xatp.idx + 1);
2850 if (xatp.idx < nr_grant_frames(d->grant_table))
2851 mfn = virt_to_mfn(d->grant_table->shared[xatp.idx]);
2853 spin_unlock(&d->grant_table->lock);
2854 break;
2855 case XENMAPSPACE_mfn:
2857 if ( get_page_from_pagenr(xatp.idx, d) ) {
2858 mfn = xatp.idx;
2859 page = mfn_to_page(mfn);
2861 break;
2863 default:
2864 break;
2867 if (mfn == 0) {
2868 if ( page )
2869 put_page(page);
2870 rcu_unlock_domain(d);
2871 return -EINVAL;
2874 domain_lock(d);
2876 /* Check remapping necessity */
2877 prev_mfn = gmfn_to_mfn(d, xatp.gpfn);
2878 if (mfn == prev_mfn)
2879 goto out;
2881 /* Remove previously mapped page if it was present. */
2882 if (prev_mfn && mfn_valid(prev_mfn)) {
2883 if (is_xen_heap_mfn(prev_mfn))
2884 /* Xen heap frames are simply unhooked from this phys slot. */
2885 guest_physmap_remove_page(d, xatp.gpfn, prev_mfn, 0);
2886 else
2887 /* Normal domain memory is freed, to avoid leaking memory. */
2888 guest_remove_page(d, xatp.gpfn);
2891 /* Unmap from old location, if any. */
2892 gpfn = get_gpfn_from_mfn(mfn);
2893 if (gpfn != INVALID_M2P_ENTRY)
2894 guest_physmap_remove_page(d, gpfn, mfn, 0);
2896 /* Map at new location. */
2897 guest_physmap_add_page(d, xatp.gpfn, mfn, 0);
2899 out:
2900 domain_unlock(d);
2902 if ( page )
2903 put_page(page);
2905 rcu_unlock_domain(d);
2907 break;
2910 case XENMEM_remove_from_physmap:
2912 struct xen_remove_from_physmap xrfp;
2913 unsigned long mfn;
2914 struct domain *d;
2916 if ( copy_from_guest(&xrfp, arg, 1) )
2917 return -EFAULT;
2919 if ( xrfp.domid == DOMID_SELF )
2921 d = rcu_lock_current_domain();
2923 else
2925 if ( (d = rcu_lock_domain_by_id(xrfp.domid)) == NULL )
2926 return -ESRCH;
2927 if ( !IS_PRIV_FOR(current->domain, d) )
2929 rcu_unlock_domain(d);
2930 return -EPERM;
2934 domain_lock(d);
2936 mfn = gmfn_to_mfn(d, xrfp.gpfn);
2938 if ( mfn_valid(mfn) )
2939 guest_physmap_remove_page(d, xrfp.gpfn, mfn, 0);
2941 domain_unlock(d);
2943 rcu_unlock_domain(d);
2945 break;
2949 case XENMEM_machine_memory_map:
2951 struct xen_memory_map memmap;
2952 struct xen_ia64_memmap_info memmap_info;
2953 XEN_GUEST_HANDLE(char) buffer;
2955 if (!IS_PRIV(current->domain))
2956 return -EINVAL;
2957 if (copy_from_guest(&memmap, arg, 1))
2958 return -EFAULT;
2959 if (memmap.nr_entries <
2960 sizeof(memmap_info) + ia64_boot_param->efi_memmap_size)
2961 return -EINVAL;
2963 memmap.nr_entries =
2964 sizeof(memmap_info) + ia64_boot_param->efi_memmap_size;
2965 memset(&memmap_info, 0, sizeof(memmap_info));
2966 memmap_info.efi_memmap_size = ia64_boot_param->efi_memmap_size;
2967 memmap_info.efi_memdesc_size = ia64_boot_param->efi_memdesc_size;
2968 memmap_info.efi_memdesc_version = ia64_boot_param->efi_memdesc_version;
2970 buffer = guest_handle_cast(memmap.buffer, char);
2971 if (copy_to_guest(buffer, (char*)&memmap_info, sizeof(memmap_info)) ||
2972 copy_to_guest_offset(buffer, sizeof(memmap_info),
2973 (char*)__va(ia64_boot_param->efi_memmap),
2974 ia64_boot_param->efi_memmap_size) ||
2975 copy_to_guest(arg, &memmap, 1))
2976 return -EFAULT;
2977 return 0;
2980 default:
2981 return -ENOSYS;
2984 return 0;
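#if 0
/*
 * Editor's sketch (illustration only, not part of the original source):
 * what a privileged guest-side caller of the XENMAPSPACE_mfn case above
 * might look like, e.g. a device model moving a frame of HVM domain
 * 'hvm_domid' to a new guest pseudo-physical address.  The hypercall
 * wrapper and all variable names here are assumptions for illustration;
 * only the xen_add_to_physmap fields used above are relied upon.
 */
static int example_remap_mfn(domid_t hvm_domid, unsigned long mfn,
                             unsigned long target_gpfn)
{
    struct xen_add_to_physmap xatp = {
        .domid = hvm_domid,          /* HVM domain owning the frame      */
        .space = XENMAPSPACE_mfn,    /* interpret .idx as a raw MFN      */
        .idx   = mfn,                /* frame to move                    */
        .gpfn  = target_gpfn,        /* new guest pseudo-physical frame  */
    };
    return HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
}
#endif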
2987 int is_iomem_page(unsigned long mfn)
2989 return (!mfn_valid(mfn) || (page_get_owner(mfn_to_page(mfn)) == dom_io));
2992 void xencomm_mark_dirty(unsigned long addr, unsigned int len)
2994 struct domain *d = current->domain;
2995 unsigned long gpfn;
2996 unsigned long end_addr = addr + len;
2998 if (shadow_mode_enabled(d)) {
2999 for (addr &= PAGE_MASK; addr < end_addr; addr += PAGE_SIZE) {
3000 gpfn = get_gpfn_from_mfn(virt_to_mfn(addr));
3001 shadow_mark_page_dirty(d, gpfn);
3006 int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn)
3008 /* STUB to compile */
3009 return -ENOSYS;
3012 int iommu_unmap_page(struct domain *d, unsigned long gfn)
3014 /* STUB to compile */
3015 return -ENOSYS;
3018 /*
3019 * Local variables:
3020 * mode: C
3021 * c-set-style: "BSD"
3022 * c-basic-offset: 4
3023 * tab-width: 4
3024 * indent-tabs-mode: nil
3025 * End:
3026 */