ia64/xen-unstable

view tools/misc/cpuperf/README.txt @ 6403:6e899a3840b2

Rename libxc => libxenctrl and xc.h => xen/xenctrl.h
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
author cl349@firebug.cl.cam.ac.uk
date Wed Aug 24 23:07:29 2005 +0000 (2005-08-24)
parents cfee4c4a8ed6
Usage
=====

Use either xen-cpuperf or cpuperf-perfcntr, as appropriate to the system
in use.

To write:

    cpuperf -E <escr> -C <cccr>

    optional: all numbers in base 10 unless specified

    -d              Debug mode
    -c <cpu>        CPU number
    -t <thread>     ESCR thread bits - default is 12 (Thread 0 all rings)
                      bit 0: Thread 1 in rings 1,2,3
                      bit 1: Thread 1 in ring 0
                      bit 2: Thread 0 in rings 1,2,3
                      bit 3: Thread 0 in ring 0
    -e <eventsel>   Event selection number
    -m <eventmask>  Event mask bits
    -T <value>      ESCR tag value
    -k              Sets CCCR 'compare' bit
    -n              Sets CCCR 'complement' bit
    -g              Sets CCCR 'edge' bit
    -P <bit>        Set the specified bit in MSR_P4_PEBS_ENABLE
    -V <bit>        Set the specified bit in MSR_P4_PEBS_MATRIX_VERT
    (-V and -P may be used multiple times to set multiple bits.)

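The -t value is a plain bit mask, so it can be worked out by OR-ing the four documented bits together. The sketch below is only an illustration in Python; the class and member names are hypothetical and are not part of cpuperf.

```python
from enum import IntFlag


class EscrThread(IntFlag):
    """Bit positions for -t, as listed in the option table (names are hypothetical)."""
    T1_RINGS123 = 1 << 0  # bit 0: Thread 1 in rings 1,2,3
    T1_RING0 = 1 << 1     # bit 1: Thread 1 in ring 0
    T0_RINGS123 = 1 << 2  # bit 2: Thread 0 in rings 1,2,3
    T0_RING0 = 1 << 3     # bit 3: Thread 0 in ring 0


# Thread 0 in all rings: bits 2 and 3 set, which is the documented default.
thread0_all = int(EscrThread.T0_RINGS123 | EscrThread.T0_RING0)
# Thread 1 in all rings: bits 0 and 1 set.
thread1_all = int(EscrThread.T1_RINGS123 | EscrThread.T1_RING0)

print(f"-t {thread0_all}")
print(f"-t {thread1_all}")
```

The first value reproduces the default of 12; the second is the value to pass when only Thread 1 should be counted.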
To read:

    cpuperf -r

    optional: all numbers in base 10 unless specified

    -c <cpu>        CPU number

<cccr> values:

    BPU_CCCR0
    BPU_CCCR1
    BPU_CCCR2
    BPU_CCCR3
    MS_CCCR0
    MS_CCCR1
    MS_CCCR2
    MS_CCCR3
    FLAME_CCCR0
    FLAME_CCCR1
    FLAME_CCCR2
    FLAME_CCCR3
    IQ_CCCR0
    IQ_CCCR1
    IQ_CCCR2
    IQ_CCCR3
    IQ_CCCR4
    IQ_CCCR5
    NONE - do not program any CCCR, used when setting up an ESCR for tagging

<escr> values:

    BSU_ESCR0
    BSU_ESCR1
    FSB_ESCR0
    FSB_ESCR1
    MOB_ESCR0
    MOB_ESCR1
    PMH_ESCR0
    PMH_ESCR1
    BPU_ESCR0
    BPU_ESCR1
    IS_ESCR0
    IS_ESCR1
    ITLB_ESCR0
    ITLB_ESCR1
    IX_ESCR0
    IX_ESCR1
    MS_ESCR0
    MS_ESCR1
    TBPU_ESCR0
    TBPU_ESCR1
    TC_ESCR0
    TC_ESCR1
    FIRM_ESCR0
    FIRM_ESCR1
    FLAME_ESCR0
    FLAME_ESCR1
    DAC_ESCR0
    DAC_ESCR1
    SAAT_ESCR0
    SAAT_ESCR1
    U2L_ESCR0
    U2L_ESCR1
    CRU_ESCR0
    CRU_ESCR1
    CRU_ESCR2
    CRU_ESCR3
    CRU_ESCR4
    CRU_ESCR5
    IQ_ESCR0
    IQ_ESCR1
    RAT_ESCR0
    RAT_ESCR1
    SSU_ESCR0
    SSU_ESCR1
    ALF_ESCR0
    ALF_ESCR1

Example configurations
======================

Note that in most cases there is a choice of ESCRs and CCCRs for
each metric, although not all combinations are allowed. Each ESCR and
counter/CCCR can be used only once.

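Because each ESCR and each counter/CCCR can be used only once, it may help to check a planned set of command lines before running them. The following Python sketch is an illustration only; find_conflicts is a hypothetical helper, not something shipped with cpuperf. It treats "-C NONE" as not occupying a CCCR, since NONE means no CCCR is programmed.

```python
import shlex
from collections import Counter


def find_conflicts(command_lines):
    """Return (duplicated ESCR names, duplicated CCCR names) for a plan."""
    escrs, cccrs = Counter(), Counter()
    for line in command_lines:
        args = shlex.split(line)
        # Walk adjacent (flag, value) pairs and pick out -E and -C.
        for flag, value in zip(args, args[1:]):
            if flag == "-E":
                escrs[value] += 1
            elif flag == "-C" and value != "NONE":
                cccrs[value] += 1
    dup_escrs = sorted(name for name, n in escrs.items() if n > 1)
    dup_cccrs = sorted(name for name, n in cccrs.items() if n > 1)
    return dup_escrs, dup_cccrs


plan = [
    "cpuperf -E CRU_ESCR0 -C IQ_CCCR0 -e 2 -m 3",  # instructions retired
    "cpuperf -E CRU_ESCR0 -C IQ_CCCR1 -e 3 -m 1",  # mispredicted branches
]
print(find_conflicts(plan))
```

Here both lines claim CRU_ESCR0, so the plan is reported as conflicting. The check only detects reuse; it does not know which ESCR/CCCR pairings the hardware allows.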
Mispredicted branches retired
=============================

    cpuperf -E CRU_ESCR0 -C IQ_CCCR0 -e 3 -m 1
    cpuperf -E CRU_ESCR0 -C IQ_CCCR1 -e 3 -m 1
    cpuperf -E CRU_ESCR0 -C IQ_CCCR4 -e 3 -m 1
    cpuperf -E CRU_ESCR1 -C IQ_CCCR2 -e 3 -m 1
    cpuperf -E CRU_ESCR1 -C IQ_CCCR3 -e 3 -m 1
    cpuperf -E CRU_ESCR1 -C IQ_CCCR5 -e 3 -m 1

Tracecache misses
=================

    cpuperf -E BPU_ESCR0 -C BPU_CCCR0 -e 3 -m 1
    cpuperf -E BPU_ESCR0 -C BPU_CCCR1 -e 3 -m 1
    cpuperf -E BPU_ESCR1 -C BPU_CCCR2 -e 3 -m 1
    cpuperf -E BPU_ESCR1 -C BPU_CCCR3 -e 3 -m 1

I-TLB
=====

    cpuperf -E ITLB_ESCR0 -C BPU_CCCR0 -e 24
    cpuperf -E ITLB_ESCR0 -C BPU_CCCR1 -e 24
    cpuperf -E ITLB_ESCR1 -C BPU_CCCR2 -e 24
    cpuperf -E ITLB_ESCR1 -C BPU_CCCR3 -e 24

    -m <n> : bit 0 count HITS, bit 1 MISSES, bit 2 uncacheable hit

    e.g. all ITLB misses -m 2

Load replays
============

    cpuperf -E MOB_ESCR0 -C BPU_CCCR0 -e 3
    cpuperf -E MOB_ESCR0 -C BPU_CCCR1 -e 3
    cpuperf -E MOB_ESCR1 -C BPU_CCCR2 -e 3
    cpuperf -E MOB_ESCR1 -C BPU_CCCR3 -e 3

    -m <n> : bit mask, replay due to...
        1: unknown store address
        3: unknown store data
        4: partially overlapped data access between LD/ST
        5: unaligned address between LD/ST

Page walks
==========

    cpuperf -E PMH_ESCR0 -C BPU_CCCR0 -e 1
    cpuperf -E PMH_ESCR0 -C BPU_CCCR1 -e 1
    cpuperf -E PMH_ESCR1 -C BPU_CCCR2 -e 1
    cpuperf -E PMH_ESCR1 -C BPU_CCCR3 -e 1

    -m <n> : bit 0 counts walks for a D-TLB miss, bit 1 for I-TLB miss

L2/L3 cache accesses
====================

    cpuperf -E BSU_ESCR0 -C BPU_CCCR0 -e 12
    cpuperf -E BSU_ESCR0 -C BPU_CCCR1 -e 12
    cpuperf -E BSU_ESCR1 -C BPU_CCCR2 -e 12
    cpuperf -E BSU_ESCR1 -C BPU_CCCR3 -e 12

    -m <n> : where the bit mask is:
         0: Read L2 HITS Shared
         1: Read L2 HITS Exclusive
         2: Read L2 HITS Modified
         3: Read L3 HITS Shared
         4: Read L3 HITS Exclusive
         5: Read L3 HITS Modified
         8: Read L2 MISS
         9: Read L3 MISS
        10: Write L2 MISS

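The numbers in these lists are bit positions, while -m takes the resulting mask as a base 10 value. As a rough illustration (mask_from_bits is a hypothetical helper written for this note, not part of cpuperf), the conversion looks like this in Python:

```python
def mask_from_bits(*bits):
    """OR together 1 << bit for every listed bit position."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


# Read L2 MISS (bit 8) plus Write L2 MISS (bit 10).
l2_misses = mask_from_bits(8, 10)
print(f"cpuperf -E BSU_ESCR0 -C BPU_CCCR1 -e 12 -m {l2_misses}")
```

Bits 8 and 10 give 256 + 1024 = 1280, the same mask used for "L2 misses (load and store)" in the example set of counters.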
Front side bus activity
=======================

    cpuperf -E FSB_ESCR0 -C BPU_CCCR0 -e 23 -k -g
    cpuperf -E FSB_ESCR0 -C BPU_CCCR1 -e 23 -k -g
    cpuperf -E FSB_ESCR1 -C BPU_CCCR2 -e 23 -k -g
    cpuperf -E FSB_ESCR1 -C BPU_CCCR3 -e 23 -k -g

    -m <n> : where the bit mask is for bus events:
        0: DRDY_DRV    Processor drives bus
        1: DRDY_OWN    Processor reads bus
        2: DRDY_OTHER  Data on bus not being sampled by processor
        3: DBSY_DRV    Processor reserves bus for driving
        4: DBSY_OWN    Other entity reserves bus for sending to processor
        5: DBSY_OTHER  Other entity reserves bus for sending elsewhere

    e.g. -m 3 to get cycles bus actually in use.

Pipeline clear (entire)
=======================

    cpuperf -E CRU_ESCR2 -C IQ_CCCR0 -e 2
    cpuperf -E CRU_ESCR2 -C IQ_CCCR1 -e 2
    cpuperf -E CRU_ESCR2 -C IQ_CCCR4 -e 2
    cpuperf -E CRU_ESCR3 -C IQ_CCCR2 -e 2
    cpuperf -E CRU_ESCR3 -C IQ_CCCR3 -e 2
    cpuperf -E CRU_ESCR3 -C IQ_CCCR5 -e 2

    -m <n> : bit mask:
        0: counts a portion of cycles while clear (use -g for edge trigger)
        1: counts each time machine clears for memory ordering issues
        2: counts each time machine clears for self modifying code

Instructions retired
====================

    cpuperf -E CRU_ESCR0 -C IQ_CCCR0 -e 2
    cpuperf -E CRU_ESCR0 -C IQ_CCCR1 -e 2
    cpuperf -E CRU_ESCR0 -C IQ_CCCR4 -e 2
    cpuperf -E CRU_ESCR1 -C IQ_CCCR2 -e 2
    cpuperf -E CRU_ESCR1 -C IQ_CCCR3 -e 2
    cpuperf -E CRU_ESCR1 -C IQ_CCCR5 -e 2

    -m <n> : bit mask:
        0: counts non-bogus, not tagged instructions
        1: counts non-bogus, tagged instructions
        2: counts bogus, not tagged instructions
        3: counts bogus, tagged instructions

    e.g. -m 3 to count legit retirements

Uops retired
============

    cpuperf -E CRU_ESCR0 -C IQ_CCCR0 -e 1
    cpuperf -E CRU_ESCR0 -C IQ_CCCR1 -e 1
    cpuperf -E CRU_ESCR0 -C IQ_CCCR4 -e 1
    cpuperf -E CRU_ESCR1 -C IQ_CCCR2 -e 1
    cpuperf -E CRU_ESCR1 -C IQ_CCCR3 -e 1
    cpuperf -E CRU_ESCR1 -C IQ_CCCR5 -e 1

    -m <n> : bit mask:
        0: Non-bogus
        1: Bogus

x87 FP uops
===========

    cpuperf -E FIRM_ESCR0 -C FLAME_CCCR0 -e 4 -m 32768
    cpuperf -E FIRM_ESCR0 -C FLAME_CCCR1 -e 4 -m 32768
    cpuperf -E FIRM_ESCR1 -C FLAME_CCCR2 -e 4 -m 32768
    cpuperf -E FIRM_ESCR1 -C FLAME_CCCR3 -e 4 -m 32768

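A large decimal mask such as 32768 is easier to read once it is broken back into bit positions. The Python sketch below does only that arithmetic; bits_set is a hypothetical helper, and this README does not say what the individual bits of this event mean.

```python
def bits_set(mask):
    """Return the positions of the bits that are set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(bits_set(32768))  # the x87 FP uops mask: a single bit, bit 15
print(bits_set(1280))   # a mask with two bits set: bits 8 and 10
```

So "-m 32768" sets exactly one event mask bit, bit 15.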
Replay tagging mechanism
========================

Counts retirement of uops tagged with the replay tagging mechanism

    cpuperf -E CRU_ESCR2 -C IQ_CCCR0 -e 9
    cpuperf -E CRU_ESCR2 -C IQ_CCCR1 -e 9
    cpuperf -E CRU_ESCR2 -C IQ_CCCR4 -e 9
    cpuperf -E CRU_ESCR3 -C IQ_CCCR2 -e 9
    cpuperf -E CRU_ESCR3 -C IQ_CCCR3 -e 9
    cpuperf -E CRU_ESCR3 -C IQ_CCCR5 -e 9

    -m <n> : bit mask:
        0: Non-bogus (set this bit for all events listed below)
        1: Bogus

Set replay tagging mechanism bits with -P and -V:

    L1 cache load miss retired:  -P 0 -P 24 -P 25 -V 0
    L2 cache load miss retired:  -P 1 -P 24 -P 25 -V 0  (read manual)
    DTLB load miss retired:      -P 2 -P 24 -P 25 -V 0
    DTLB store miss retired:     -P 2 -P 24 -P 25 -V 1
    DTLB all miss retired:       -P 2 -P 24 -P 25 -V 0 -V 1

e.g. to count all DTLB misses

    cpuperf -E CRU_ESCR2 -C IQ_CCCR0 -e 9 -m 1 -P 2 -P 24 -P 25 -V 0 -V 1

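Since -P and -V are repeated once per bit, the full command line for a replay-tagged metric can be assembled from the table of bits. This Python sketch is illustrative only: the dictionary keys and the replay_command helper are hypothetical names, and the bit values are copied from the list above.

```python
# (-P bits, -V bits) for each replay-tagged metric, copied from the list above.
REPLAY_TAGS = {
    "l1_load_miss": ((0, 24, 25), (0,)),
    "l2_load_miss": ((1, 24, 25), (0,)),
    "dtlb_load_miss": ((2, 24, 25), (0,)),
    "dtlb_store_miss": ((2, 24, 25), (1,)),
    "dtlb_all_miss": ((2, 24, 25), (0, 1)),
}


def replay_command(metric, escr="CRU_ESCR2", cccr="IQ_CCCR0"):
    """Build a cpuperf command line for event 9 with non-bogus (-m 1) set."""
    p_bits, v_bits = REPLAY_TAGS[metric]
    args = ["cpuperf", "-E", escr, "-C", cccr, "-e", "9", "-m", "1"]
    for bit in p_bits:
        args += ["-P", str(bit)]
    for bit in v_bits:
        args += ["-V", str(bit)]
    return " ".join(args)


print(replay_command("dtlb_all_miss"))
print(replay_command("l1_load_miss", cccr="IQ_CCCR1"))
```

The helper only prints the command; the ESCR and CCCR still have to be one of the pairings listed for this event.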
Front end event
===============

To count tagged uops:

    cpuperf -E CRU_ESCR2 -C IQ_CCCR0 -e 8
    cpuperf -E CRU_ESCR2 -C IQ_CCCR1 -e 8
    cpuperf -E CRU_ESCR2 -C IQ_CCCR4 -e 8
    cpuperf -E CRU_ESCR3 -C IQ_CCCR2 -e 8
    cpuperf -E CRU_ESCR3 -C IQ_CCCR3 -e 8
    cpuperf -E CRU_ESCR3 -C IQ_CCCR5 -e 8

    -m <n> : bit 0 for non-bogus uops, bit 1 for bogus uops

Must have another ESCR programmed to tag uops as required

    cpuperf -E RAT_ESCR0 -C NONE -e 2
    cpuperf -E RAT_ESCR1 -C NONE -e 2

    -m <n> : bit 1 for LOADs, bit 2 for STOREs

An example set of counters
==========================

# instructions retired
cpuperf -E CRU_ESCR0 -C IQ_CCCR0 -e 2 -m 3

# trace cache misses
cpuperf -E BPU_ESCR0 -C BPU_CCCR0 -e 3 -m 1

# L1 D cache misses (load misses retired)
cpuperf -E CRU_ESCR2 -C IQ_CCCR1 -e 9 -m 1 -P 0 -P 24 -P 25 -V 0

# L2 misses (load and store)
cpuperf -E BSU_ESCR0 -C BPU_CCCR1 -e 12 -m 1280

# I-TLB misses
cpuperf -E ITLB_ESCR1 -C BPU_CCCR2 -e 24 -m 2

# D-TLB misses (as PT walks)
cpuperf -E PMH_ESCR1 -C BPU_CCCR3 -e 1 -m 1

# Other 'bonus' counters would be:
# number of loads executed - need both command lines
cpuperf -E RAT_ESCR0 -C NONE -e 2 -m 2
cpuperf -E CRU_ESCR3 -C IQ_CCCR3 -e 8 -m 3

# number of mispredicted branches
cpuperf -E CRU_ESCR1 -C IQ_CCCR2 -e 3 -m 1

# x87 FP uOps
cpuperf -E FIRM_ESCR0 -C FLAME_CCCR0 -e 4 -m 32768

The above has counter assignments

     0  Trace cache misses
     1  L2 Misses
     2  I-TLB misses
     3  D-TLB misses
     4
     5
     6
     7
     8  x87 FP uOps
     9
    10
    11
    12  Instructions retired
    13  L1 D cache misses
    14  Mispredicted branches
    15  Loads executed
    16
    17

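The counter numbers in this table line up with the order of the <cccr> values listed under Usage (BPU, MS, FLAME, then IQ). That correspondence is an inference from the nine filled-in rows, not something the README states, so treat the Python sketch below as an assumption-laden illustration; CCCR_ORDER and counter_index are hypothetical names.

```python
# Assumed ordering: counter number == position in the <cccr> values list.
CCCR_ORDER = (
    [f"BPU_CCCR{i}" for i in range(4)]
    + [f"MS_CCCR{i}" for i in range(4)]
    + [f"FLAME_CCCR{i}" for i in range(4)]
    + [f"IQ_CCCR{i}" for i in range(6)]
)


def counter_index(cccr):
    """Return the counter number assumed to back the given CCCR name."""
    return CCCR_ORDER.index(cccr)


for name in ("BPU_CCCR0", "FLAME_CCCR0", "IQ_CCCR0", "IQ_CCCR3"):
    print(counter_index(name), name)
```

Under that assumption, FLAME_CCCR0 maps to counter 8 (x87 FP uOps) and IQ_CCCR3 to counter 15 (loads executed), matching the table.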
Counting instructions retired on each logical CPU
=================================================

    cpuperf -E CRU_ESCR0 -C IQ_CCCR0 -e 2 -m 3 -t 12
    cpuperf -E CRU_ESCR1 -C IQ_CCCR2 -e 2 -m 3 -t 3

With this setup, mispredicted branches cannot be counted as well,
because CRU_ESCR1 is already in use.