Measures CPI (cycles per instruction) for an application or a time interval.
Syntax:
-------
cpi <-? | -??> (** HELP MODE)
<free_options> <-s #> <-c command> (** COMMAND MODE)
<free_options> (** MANUAL START/STOP MODE)
<free_options> <-s #> <-r #> (** AUTO START/STOP MODE)
free_options: <-noai> <-t sec> <-f fn> <-ds> <-ts>
- Any combination of these options can be used in all modes.
Where:
------
********** FREE OPTIONS **********
-noai Causes cpi not to measure CPU utilization.
* idle, busy and intr values are not displayed.
* cpi_wb (CPI while busy) is not calculated.
-t #sec Specifies a sampling interval (in seconds).
cpi will calculate and display CPI for each interval.
* If you set the interval too short, the measurement itself will perturb
the very CPI you are trying to measure.
* If you DO NOT specify the -t option, CPI will be calculated
and displayed either:
- When CPI measurement is stopped (via ENTER or Ctrl-C).
- When the user command being measured completes.
- When the specified run time has completed.
-f fn Specifies a filename to write results to.
cpi will write all output to STDOUT and to the output file.
* If you DO NOT specify the -f option, all output goes to STDOUT only.
-ds Display system summary CPI only. Do not display per-processor CPI.
* The resulting output is more compact - one line per sample instead
of six (6) when displaying per-processor CPI.
* If you DO NOT specify the -ds option, the results include the
per-processor CPI as well as the system summary.
-ts Timestamp each CPI sample.
* Useful when trying to correlate CPI output to other events which
have time associated with them.
* If you DO NOT specify the -ts option, the results do not include
the time at which they were calculated.
********** COMMAND/START/STOP OPTIONS **********
-s #sec Causes cpi to automatically start measuring CPI
after approximately #sec seconds.
* You WILL NOT be prompted for when to start measuring. Instead,
measuring will start automatically after a delay of about
#sec seconds.
* If you specify 0 (zero), measuring starts immediately.
* If you use this option your application should be started and
warmed up by the time measuring is started.
* If you DO NOT specify the -s option, you WILL BE prompted
for when to start the measurement interval.
-r #sec Run (measure) for approximately #sec seconds then stop.
* You WILL NOT be prompted for when to stop measuring. Instead,
measuring will stop automatically after a delay of about
#sec seconds from the time measuring was started.
* If you specify 0 (zero), measuring is stopped immediately after
being started. Kind of useless.
* If you use this option your application should complete the
scenario you want to measure in approximately the amount of
time you specify with this option.
* NOT allowed with -c.
* If you DO NOT specify the -r option, you will be prompted for
when to stop measuring.
-c Causes cpi to execute the requested command and measure CPI, and
optionally, CPU utilization while the command executes.
* If specified, -c must be the last or only option entered.
* -c and -r are mutually exclusive options.
- The entire command execution is measured. You can't specify
a run time.
* If the command contains blanks then enclose the entire
command in double quotes. For example:
-c "java GraphSuite drawLine native=p6 speed=500"
* If the command contains double quotes then enclose the entire
command in double quotes *AND* use the \" escape sequence
(backslash-double quote) for the inner double quotes.
For example, if your command is:
java Translate "Hello World!" lang=ES
your -c option would look like:
-c "java Translate \"Hello World!\" lang=ES"
* If you want to redirect your application's output then you
*MUST* invoke it from a script file.
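The two quoting rules above are mechanical, so they can be captured in a few lines. This is a minimal sketch using a hypothetical helper name (quote_for_c_option); it only applies the "wrap in double quotes, escape inner double quotes as \"" rules from this help text. It does not handle backslashes already present in the command, and your shell may still reinterpret the result before cpi sees it.

```python
def quote_for_c_option(command: str) -> str:
    """Build the -c argument text following the quoting rules above.

    Hypothetical helper: encloses the whole command in double quotes and
    turns each inner double quote into the \\" escape sequence.
    """
    escaped = command.replace('"', '\\"')  # inner " becomes \"
    return f'"{escaped}"'                  # enclose the entire command


print("-c " + quote_for_c_option('java Translate "Hello World!" lang=ES'))
```

Running it prints the same -c option shown in the Translate example.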
Notes:
------
* At this time you cannot specify -r with -c.
* idle, busy and intr are not displayed if the -noai option is specified.
* If you specify a command then the measured CPI will include application
load and termination.
* Occasionally you may see an incomplete display line at the end of the
measurement interval, whether the interval was stopped automatically or
via Ctrl-C.
Valid Commands:
---------------
1) cpi
* Prompts user for when to start measurement interval.
* Prompts user for when to end measurement interval.
* Measures CPI and CPU utilization for the interval.
2) cpi -s 10
* Automatically starts measurement interval in about 10 seconds.
* Prompts user for when to end measurement interval.
* Measures CPI and CPU utilization for the interval.
3) cpi -s 10 -t 2
* Automatically starts measurement interval in about 10 seconds.
* Prompts user for when to end measurement interval.
* Measures and displays CPI and CPU utilization every 2 seconds.
4) cpi -s 10 -r 20
* Automatically starts the measurement interval in about 10 seconds.
* Runs measurement interval for about 20 seconds and automatically
stops it.
* Measures CPI and CPU utilization for the interval.
* There are no user prompts.
5) cpi -s 10 -r 20 -t 2
* Automatically starts the measurement interval in about 10 seconds.
* Runs measurement interval for about 20 seconds and automatically
stops it.
* Measures and displays CPI and CPU utilization every 2 seconds.
* There are no user prompts.
6) cpi -r 20
* Prompts user for when to start measurement interval.
* Runs measurement interval for about 20 seconds after being started
and automatically stops it.
* Measures CPI and CPU utilization for the interval.
7) cpi -s 0 -r 20
* Immediately starts the measurement interval (in about 0 seconds).
* Runs measurement interval for about 20 seconds and automatically
stops it.
* Measures CPI and CPU utilization for the interval.
* There are no user prompts.
8) cpi -s 0 -r 20 -t 2
* Immediately starts the measurement interval (in about 0 seconds).
* Runs measurement interval for about 20 seconds and automatically
stops it.
* Measures and displays CPI and CPU utilization every 2 seconds.
* There are no user prompts.
9) cpi -c "java java_app"
* Runs command 'java java_app' and measures CPI and CPU utilization
for the duration of the command.
10) cpi -t 2 -c "java java_app"
* Runs command 'java java_app' and measures CPI and CPU utilization
every 2 seconds for the duration of the command.
11) cpi -s 10 -c "java java_app"
* Automatically starts the command 'java java_app', and starts the
measurement interval in about 10 seconds.
* Runs command and measures CPI and CPU utilization for the duration
of the command.
12) cpi -s 10 -t 2 -c "java java_app"
* Automatically starts the command 'java java_app', and starts the
measurement interval in about 10 seconds.
* Measures and displays CPI and CPU utilization every 2 seconds for
the duration of the command.
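The examples above follow a simple pattern: -s removes the start prompt, -r removes the stop prompt, and -c removes both because the command's completion ends the measurement. The sketch below models that pattern with a hypothetical plan() function. It is a reading of the documented behavior, not cpi's code; in particular, the help text does not say outright that command mode never prompts, so that part is an assumption based on examples 9 to 12.

```python
from typing import NamedTuple, Optional


class CpiPlan(NamedTuple):
    prompt_for_start: bool
    prompt_for_stop: bool
    stops_when: str


def plan(start_delay: Optional[float] = None,
         run_time: Optional[float] = None,
         command: Optional[str] = None) -> CpiPlan:
    """Model the prompts implied by -s (start_delay), -r (run_time), -c (command).

    None means the option was not given. Hypothetical model only.
    """
    if command is not None and run_time is not None:
        raise ValueError("-c and -r are mutually exclusive")
    if command is not None:
        # Assumption: command mode never prompts; measuring ends with the command.
        return CpiPlan(False, False, "command completes")
    if run_time is not None:
        stops_when = f"about {run_time:g} s after start"
    else:
        stops_when = "user presses ENTER or Ctrl-C"
    return CpiPlan(
        prompt_for_start=start_delay is None,  # no -s: user is prompted to start
        prompt_for_stop=run_time is None,      # no -r: user is prompted to stop
        stops_when=stops_when,
    )
```

For example, plan(start_delay=10, run_time=20) matches example 4: no prompts, stopping about 20 seconds after the start.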
Sample Output:
--------------
1) System at almost 100% idle:
System CPU0 CPU1 CPU2 CPU3
cpi 943.27 909.18 728.15 1426.35 937.87
cpi_wb 7.55 8.71 5.90 8.39 7.92
idle% 99.20 99.04 99.19 99.41 99.16
busy% 0.71 0.80 0.75 0.53 0.75
intr% 0.09 0.16 0.06 0.05 0.09
2) System at almost 100% busy:
System CPU0 CPU1 CPU2 CPU3
cpi 1.26 1.32 1.22 1.31 1.21
cpi_wb 1.25 1.30 1.20 1.29 1.20
idle% 1.10 1.18 1.28 0.89 1.04
busy% 98.80 98.59 98.65 99.06 98.90
intr% 0.10 0.22 0.06 0.05 0.06
3) System at about 50% busy:
System CPU0 CPU1 CPU2 CPU3
cpi 2.96 3.47 2.67 2.77 3.07
cpi_wb 1.48 2.33 1.02 1.16 1.60
idle% 50.10 32.71 61.88 58.08 47.73
busy% 48.94 65.90 37.71 41.12 51.04
intr% 0.96 1.39 0.41 0.80 1.23
4) System at almost 100% busy, system CPI summary:
cpi: 1.23 cpi_wb: 1.21 idle: 1.26% busy: 98.66% intr: 0.08%
cpi: 1.25 cpi_wb: 1.24 idle: 1.36% busy: 98.56% intr: 0.08%
cpi: 1.21 cpi_wb: 1.20 idle: 1.22% busy: 98.71% intr: 0.07%
cpi: 1.24 cpi_wb: 1.23 idle: 1.06% busy: 98.86% intr: 0.08%
Sample Output Description:
--------------------------
* All values are measured for either:
- The entire measurement interval.
- One sampling interval.
* 'total_busy' is a computed value indicating the total % busy.
- For a CPU, 'total_busy' is the total time the CPU is not idle.
- For the system, 'total_busy' is the total time the system is not idle.
- 'total_busy' is %busy + %intr (or 100 - %idle).
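* 'System' values are aggregates over all CPUs (ratios of summed counts, as
shown under Calculations), not simple averages of the per-CPU values.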
* 'CPUx' values are per-cpu.
* 'cpi' is measured Cycles-Per-Instruction (cycles / instructions).
* 'cpi_wb' is Cycles-Per-Instruction While Busy (cpi * total_busy / 100).
* 'idle', 'busy' and 'intr' are the % of time the system was either
idle, busy or processing interrupts.
* Calculations:
- Per-CPU
cpi: interval_cycles / interval_instr
cpi_wb: cpu_cpi * total_busy / 100
idle%: interval_idle_cycles / interval_cycles * 100
busy%: interval_busy_cycles / interval_cycles * 100
intr%: interval_intr_cycles / interval_cycles * 100
- System
cpi: sum(cpu_cycles) / sum(cpu_instr)
cpi_wb: sum(cpu_cycles * total_busy / 100) / sum(cpu_instr)
idle%: sum(cpu_idle_cycles) / sum(cpu_cycles) * 100
busy%: sum(cpu_busy_cycles) / sum(cpu_cycles) * 100
intr%: sum(cpu_intr_cycles) / sum(cpu_cycles) * 100
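The Calculations above translate directly into code. This sketch assumes you already have per-CPU counter deltas for one interval (the CpuDelta record and its field names are hypothetical), and it takes total_busy as %busy + %intr, as defined above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CpuDelta:
    """Hypothetical per-CPU counter deltas for one interval."""
    cycles: int
    instr: int
    idle_cycles: int
    busy_cycles: int
    intr_cycles: int

    @property
    def total_busy(self) -> float:
        # total_busy = %busy + %intr
        return 100.0 * (self.busy_cycles + self.intr_cycles) / self.cycles


def per_cpu(d: CpuDelta) -> Dict[str, float]:
    cpi = d.cycles / d.instr
    return {
        "cpi": cpi,
        "cpi_wb": cpi * d.total_busy / 100,
        "idle%": 100.0 * d.idle_cycles / d.cycles,
        "busy%": 100.0 * d.busy_cycles / d.cycles,
        "intr%": 100.0 * d.intr_cycles / d.cycles,
    }


def system(cpus: List[CpuDelta]) -> Dict[str, float]:
    cycles = sum(c.cycles for c in cpus)
    instr = sum(c.instr for c in cpus)
    return {
        "cpi": cycles / instr,
        "cpi_wb": sum(c.cycles * c.total_busy / 100 for c in cpus) / instr,
        "idle%": 100.0 * sum(c.idle_cycles for c in cpus) / cycles,
        "busy%": 100.0 * sum(c.busy_cycles for c in cpus) / cycles,
        "intr%": 100.0 * sum(c.intr_cycles for c in cpus) / cycles,
    }


cpu0 = CpuDelta(cycles=1000, instr=500, idle_cycles=200, busy_cycles=700, intr_cycles=100)
cpu1 = CpuDelta(cycles=1000, instr=250, idle_cycles=600, busy_cycles=380, intr_cycles=20)
print(per_cpu(cpu0)["cpi"], per_cpu(cpu1)["cpi"], system([cpu0, cpu1])["cpi"])
```

With these made-up numbers the per-CPU CPIs are 2.0 and 4.0 while the system CPI is 2000/750 (about 2.67). Because the system value is a ratio of sums, it need not equal the mean of the per-CPU values.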
Read and write P4/P6/AMD64/EM64T MSRs (Model-Specific Registers).
Syntax:
msr -r [msr_num | msr_name] <-c cpu_num> (READ)
msr -w [msr_num | msr_name] value <-c cpu_num> (WRITE)
Where:
-r Read MSR.
-w Write 'value' to MSR.
-c Specifies processor(s) to read from or write to.
- Valid values are: 'all' or a processor number.
- Default is 'all' processors.
msr_num is any valid msr number, in decimal or hexadecimal.
msr_name is one of the following names:
MSR          Corresponding MSR number
name         P6      P4      EM64T       AMD64
-----------  ------  ------  ----------  ----------
ctr0         0xC1    0x300   0x300       0xC0010004
ctr1         0xC2    0x301   0x301       0xC0010005
ctr2         --      0x302   0x302       0xC0010006
ctr3         --      0x303   0x303       0xC0010007
...
ctr17        --      0x311   0x311       --
evtsel0      0x186   --      --          0xC0010000
evtsel1      0x187   --      --          0xC0010001
evtsel2      --      --      --          0xC0010002
evtsel3      --      --      --          0xC0010003
cccr0        --      0x360   0x360       --
...
cccr17       --      0x371   0x371       --
tsc          0x10    0x10    0x10        0x10
perf_status  --      0x198   0x198       --
perf_ctl     --      0x199   0x199       --
efer         --      --      0xC0000080  0xC0000080
star         --      --      0xC0000081  0xC0000081
lstar        --      --      0xC0000082  0xC0000082
cstar        --      --      0xC0000083  0xC0000083
sfmask       --      --      0xC0000084  0xC0000084
fsbase       --      --      0xC0000100  0xC0000100
gsbase       --      --      0xC0000101  0xC0000101
kgsbase      --      --      0xC0000102  0xC0000102
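Resolving an msr_num or msr_name argument amounts to a table lookup plus a decimal/hex parse. This sketch uses a hypothetical resolve_msr function and copies only a few rows of the msr_name table above, with None standing in for a "--" cell. It assumes names are matched exactly as listed (lowercase) and that hexadecimal needs the 0x prefix, per the Notes. It is an illustration, not msr's actual parser.

```python
from typing import Dict, Optional

# Hypothetical partial copy of the msr_name table; None marks a "--" cell.
MSR_NAMES: Dict[str, Dict[str, Optional[int]]] = {
    "ctr0":    {"P6": 0xC1,  "P4": 0x300, "EM64T": 0x300,      "AMD64": 0xC0010004},
    "ctr1":    {"P6": 0xC2,  "P4": 0x301, "EM64T": 0x301,      "AMD64": 0xC0010005},
    "ctr3":    {"P6": None,  "P4": 0x303, "EM64T": 0x303,      "AMD64": 0xC0010007},
    "evtsel0": {"P6": 0x186, "P4": None,  "EM64T": None,       "AMD64": 0xC0010000},
    "tsc":     {"P6": 0x10,  "P4": 0x10,  "EM64T": 0x10,       "AMD64": 0x10},
    "efer":    {"P6": None,  "P4": None,  "EM64T": 0xC0000080, "AMD64": 0xC0000080},
}


def resolve_msr(arg: str, family: str) -> int:
    """Turn an msr_num or msr_name argument into an MSR number."""
    if arg.startswith("0x"):
        return int(arg, 16)  # hexadecimal must carry the 0x prefix
    if arg.isdigit():
        return int(arg, 10)  # otherwise a plain number is decimal
    row = MSR_NAMES.get(arg)
    if row is None or family not in row:
        raise ValueError(f"unknown MSR name or family: {arg!r}, {family!r}")
    number = row[family]
    if number is None:
        raise ValueError(f"{arg} has no MSR number on {family}")
    return number
```

For example, resolve_msr("16", "P6"), resolve_msr("0x10", "P6") and resolve_msr("tsc", "P6") all return 16, just as the three TSC forms in the Examples name the same register.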
Notes:
* Hexadecimal numbers must be prefixed with '0x'.
* 'value' is a decimal or hexadecimal number.
- If larger than 64 bits it will be truncated to 64 bits.
- If smaller than 64 bits it will be zero-extended to 64 bits.
* Neither 'msr_num' nor 'value' is checked for validity.
- Be *VERY* careful with the values you write and the MSRs you
write them to. You have the ability to cause major
problems. Consider yourself warned!
- If 'msr_num' is invalid then nothing is read or written.
- I would stay away from writing to the EFER, LSTAR and similar
MSRs used by the OS. You are asking for *BIG* trouble if you
modify any of those registers.
* For performance counters only the low order 40 (Intel) or 48 (AMD)
bits are written and/or read.
* All other values displayed (on reads) are the raw 64-bit values.
Some MSRs are only 32-bits and others are 64-bits. You decide how
to interpret the values.
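The width rules in these Notes are plain bit masking. The sketch below, with hypothetical helper names, shows the arithmetic for non-negative values: truncating a value to 64 bits, keeping only the low 40 (Intel) or 48 (AMD) bits of a performance counter, and taking the low 32 bits when interpreting a 32-bit MSR. It illustrates the rules and is not code taken from msr.

```python
MASK64 = (1 << 64) - 1
COUNTER_WIDTH = {"Intel": 40, "AMD": 48}  # counter bits, per the note above


def normalize_value(value: int) -> int:
    """Fit a non-negative 'value' into 64 bits.

    Larger values lose their high bits (truncation). Smaller values are
    simply zero-extended, which for a Python int needs no extra work.
    """
    return value & MASK64


def counter_bits(value: int, vendor: str) -> int:
    """Keep only the low-order bits a performance counter actually holds."""
    width = COUNTER_WIDTH[vendor]
    return value & ((1 << width) - 1)


def low32(value: int) -> int:
    """Low 32 bits, for interpreting MSRs that are only 32 bits wide."""
    return value & 0xFFFFFFFF
```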
Examples:
1) Read TSC on all processors:
- msr -r 16
- msr -r 0x10
- msr -r tsc
- msr -r 16 -c all
- msr -r 0x10 -c all
- msr -r tsc -c all
2) Read TSC on processor 1:
- msr -r 0x10 -c 1
3) Reset TSC on all processors:
- msr -w 0x10 0
- msr -w tsc 0
4) Read P6 ctr0 on all processors:
- msr -r 0xc1
- msr -r 193
- msr -r ctr0
5) Set P6 EvtSel0 (ctr0) to count INSTR_RETIRED:
- msr -w 0x186 0x004300c0
- msr -w evtsel0 0x004300c0
6) Read P4/EM64T ctr12 on all processors:
- msr -r 0x30c
- msr -r 780
- msr -r ctr12
7) Read Opteron/Athlon64 ctr3 on all processors:
- msr -r 0xc0010007
- msr -r 3221291015
- msr -r ctr3
Manipulate hardware performance counter events on P4/P6/AMD64/EM64T.
mpevt -s [-n name | -i id] (START)
-r [-n name_list | -i id_list] (READ)
-e [-n name_list | -i id_list] (END)
-eall (END ALL)
-l<d><d> (LIST SUPPORTED)
-la<d><d> (LIST ACTIVE)
-lc (LIST EVENT/CTRS)
-q [-n name_list | -i id_list] <-t sec> (QUERY)
-qf [-n name_list | -i id_list] <-f fn> <-t sec> (QUERY TO FILE)
-qso [-n name_list | -i id_list] <-t sec> (QUERY TO STDOUT)
Where:
------
***** Command verbs *****
-s
Starts the requested performance counter event on all processors.
* Events must be started one at a time - you can't specify a name list.
* Once you're done using the event you should 'end' it (-e).
* Different events are valid for different processor families.
* To see a list of valid events enter: "mpevt -l"
-r
Read counters associated with event. Output to STDOUT.
-e
Ends (terminates) event and releases resources associated with it.
-eall
Ends (terminates) all active events.
-l
Lists supported performance counter events for processor family.
Display includes event ids and event names.
-ld
Lists supported performance counter events for processor family.
Display includes event ids, event names and event descriptions.
-ldd
Same as "-ld" but with additional detail, including the hardware
event name and unit/event mask (as listed in the Intel/AMD docs),
and the counters on which the event can be counted.
-la
Lists active (i.e., started) performance counter events.
Display includes event ids and event names.
-lad
Lists active (i.e., started) performance counter events.
Display includes event ids, event names and event descriptions.
-ladd
Same as "-lad" but with additional detail, including the counter
on which the event is being counted, and the counter control MSRs.
-lc
Lists active performance counter events, and their associated
performance counters, by processor.
-q
Read counter associated with event.
* Read time interval is 1 second unless -t is specified.
-qf
Read counter associated with event continually.
* Output to 'evt.out' file, unless -f specified.
* Read time interval is 1 second unless -t is specified.
-qso
Continually read counter associated with event.
* Output to STDOUT.
* Read time interval is 1 second unless -t is specified.
***** Command modifiers *****
-n name | name_list
Specifies events by name.
* 'name' is a single event name.
- Required with -s.
- Allowed with -r, -e, -q, -qf and -qso.
* 'name_list' is a comma-separated list of event names.
- Allowed with -r, -e, -q, -qf and -qso.
- Not allowed with -s.
* The pseudo-name "active" is a synonym for all active events.
- It can be used with any command that allows a 'name_list'.
* To see a list of valid event names enter: "mpevt -l"
* Event names *DO NOT* have to be specified in uppercase.
- INSTR, iNsTr, Instr, instr, InStR, etc., all specify the same event.
-i id | id_list
Specifies events by ID.
* 'id' is a single event id.
- Required with -s.
- Allowed with -r, -e, -q, -qf and -qso.
* 'id_list' is a comma-separated list of event ids.
- Allowed with -r, -e, -q, -qf and -qso.
- Not allowed with -s.
* The pseudo-id "active" is a synonym for all active events.
- It can be used with any command that allows an 'id_list'.
* To see a list of valid event ids enter: "mpevt -l"
-f fn
Specifies the file to which read values are written.
-t sec
Specifies interval time, in seconds, between event reads.
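The -n/-i modifiers describe a small parsing job: split on commas, expand the pseudo-name "active", match names regardless of case, and (per Note 3 below) ignore events that are not active. This sketch models that with a hypothetical select_events function and a hypothetical two-entry event table. Only INSTR=102 and CALLS=108 come from the Examples; the real list comes from "mpevt -l".

```python
from typing import Dict, List, Set

# Hypothetical event table: only INSTR=102 and CALLS=108 appear in the Examples.
EVENTS_BY_NAME: Dict[str, int] = {"INSTR": 102, "CALLS": 108}


def select_events(arg: str, by: str, active: Set[int]) -> List[int]:
    """Resolve a -n name_list (by="name") or -i id_list (by="id") to event ids.

    "active" expands to every active event; names are case-insensitive;
    events that are not active are dropped; duplicates are removed.
    Unknown names raise KeyError.
    """
    ids: List[int] = []
    for item in (part.strip() for part in arg.split(",")):
        if item.lower() == "active":
            ids.extend(sorted(active))
        elif by == "name":
            ids.append(EVENTS_BY_NAME[item.upper()])
        else:
            ids.append(int(item))
    result: List[int] = []
    for event_id in ids:
        if event_id in active and event_id not in result:
            result.append(event_id)
    return result
```

With INSTR started and CALLS not started, select_events("instr,calls", "name", {102}) keeps only 102.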
Examples:
---------
1) mpevt -s -n instr
mpevt -s -n INSTR
mpevt -s -n InStR
mpevt -s -i 102
* Sets up the correct performance counter to count instructions retired.
* It does not require you to know how to set up the counters.
2) mpevt -r -n instr
mpevt -r -i 102
* Reads current counter value(s) for the INSTR event (if it was started).
3) mpevt -r -n instr,calls
mpevt -r -i 102,108
* Reads current counter value(s) for the INSTR and CALLS events (if they
were started and active).
4) mpevt -q -n instr,calls -t 10
mpevt -q -i 102,108 -t 10
* Reads counter value(s) for the INSTR and CALLS events (if they were
started and active) every 10 seconds, until stopped via Ctrl-C.
5) mpevt -q -n active -t 5 -f results.out
mpevt -q -i active -t 5 -f results.out
* Reads counter value(s) for all active events every 5 seconds, until
stopped via Ctrl-C. Writes results to file 'results.out'.
6) mpevt -ld
* Lists supported events, if any, and displays event descriptions.
7) mpevt -lad
* Lists currently active events, if any, and displays event descriptions.
8) mpevt -e -n INSTR
mpevt -e -i 102
* Stops and ends counting the INSTR event (if it was started).
9) mpevt -eall
* Stops and ends all currently active events.
Notes:
------
1) mpevt **DOES NOT** work on P5 (Pentium) machines.
2) The list of supported events will be different on different architectures.
3) If you specify events that are not active they will be ignored.
4) If you specify options that are not applicable to the command they
will be ignored. For example, if you specify the -f and/or -t option with
-s, both will be ignored.
5) If you specify conflicting options then the right-most option is used.
For example, if you specify the -n and -i options with -s, then the value
given with -i will be used.
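The "right-most option wins" rule in Note 5 has a convenient counterpart in Python's argparse: when several options store into the same destination, the last one on the command line overwrites the earlier ones. The parser below is a hypothetical model of a few mpevt options built to show that behavior. It is not mpevt's own option parser, and it skips the multi-letter verbs such as -eall and -qso.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mpevt-sketch")
    parser.add_argument("-s", action="store_true", help="start an event")
    # -n and -i share one destination, so whichever appears last wins.
    parser.add_argument("-n", dest="selector", type=lambda v: ("name", v))
    parser.add_argument("-i", dest="selector", type=lambda v: ("id", v))
    # Repeating -f or -t also keeps only the right-most value.
    parser.add_argument("-f", dest="filename")
    parser.add_argument("-t", dest="interval", type=float)
    return parser


args = build_parser().parse_args(["-s", "-n", "instr", "-i", "102"])
print(args.selector)
```

This prints ('id', '102'), which matches the -n/-i example in Note 5.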
Manipulate P6/P4/AMD64/EM64T hardware performance counters.
mpcnt -s ctr cccr escr <msr value <msr value>> (START P4/EM64T)
-s ctr evtsel (START P6/AMD64)
-p ctr <ctr ...> (PAUSE (STOP))
-u ctr <ctr ...> (RESUME)
-z ctr <ctr ...> (ZERO (RESET))
-r ctr <ctr ...> (READ)
-q ctr <ctr ...> (READ INTERACTIVE)
-qf ctr <ctr ...> <-f fn> <-t sec> (READ TO FILE)
-qso ctr <ctr ...> <-t sec> (READ TO STDOUT)
<-c target_cpu> (Target CPU)
Where:
------
***** Command verbs *****
-s
-start
Starts the requested performance counter on the requested processors.
- ctr specifies a counter number or name
- P6/AMD64:
* evtsel specifies the 32-bit value of the Event Select
register associated with the counter. The actual Event
Select register number is implied by the counter number.
- P4/EM64T:
* cccr specifies the 32-bit value of the CCCR register
associated with the counter.
* escr specifies the 32-bit value of the ESCR register
associated with the counter/CCCR value.
* msr/value is an MSR number/value pair required to set up the
counter. Neither one is checked for validity.
-p
-pause
Pauses (stops) the requested counters. The counters stop counting.
The counter contents are left unchanged.
-u
-resume
Resumes (unpauses) the requested counters. The counters resume
counting from where they left off.
-z
-reset
Resets (clears) the requested counters. The counters are set to zero.
If counting is not stopped, the counters continue counting from zero.
-r
-read
Read and display contents of requested counters.
-q
-query
Read and display contents of requested counters and go into input mode.
- From input mode you can Query (again) or End the current event.
- Output to STDOUT.
-qf
-queryf
Read contents of requested counters continually.
- Output to 'ctr.out' file, unless -f option specified.
- Read time interval is 1 second unless -t is specified.
-qso
-queryso
Read contents of requested counters continually.
- Output to STDOUT.
- Read time interval is 1 second unless -t is specified.
***** Command modifiers *****
-c target_cpu
-cpu target_cpu
Specifies the processor(s) to which the requested command is sent.
Valid values for target_cpu are:
- The word 'all'. This is the default.
- A cpu number
- A cpu name in the form CPU#
- The word 'LP0'. This means the Logical 0 (even) CPUs in a
HyperThreaded physical CPU.
- The word 'LP1'. This means the Logical 1 (odd) CPUs in a
HyperThreaded physical CPU.
-f fn
-file fn
Specifies the file to which read values are written.
-t sec
-time sec
Specifies the time interval, in seconds, between counter reads.
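The target_cpu values listed above map onto sets of CPU numbers. This sketch uses a hypothetical resolve_target_cpu function and follows the text's description: 'LP0' means the even-numbered CPUs and 'LP1' the odd-numbered ones. It also assumes the keywords are case-insensitive, which the help text does not state.

```python
from typing import List


def resolve_target_cpu(spec: str, num_cpus: int) -> List[int]:
    """Map a -c target_cpu value to CPU numbers (hypothetical model).

    'all' -> every CPU; 'LP0' -> even CPUs; 'LP1' -> odd CPUs;
    'CPU#' or a bare number -> that single CPU.
    """
    s = spec.strip().upper()  # assumption: keywords are case-insensitive
    if s == "ALL":
        return list(range(num_cpus))
    if s == "LP0":
        return list(range(0, num_cpus, 2))
    if s == "LP1":
        return list(range(1, num_cpus, 2))
    if s.startswith("CPU"):
        s = s[3:]
    cpu = int(s)
    if not 0 <= cpu < num_cpus:
        raise ValueError(f"no such CPU: {spec}")
    return [cpu]
```

On a 4-CPU HyperThreaded system, 'LP0' would select CPUs 0 and 2, and 'LP1' would select CPUs 1 and 3.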
Notes:
------
1) It is *YOUR RESPONSIBILITY* to know and understand what you are doing.
If you're not sure, don't do it and take a look at the processor documentation.
2) mpcnt **DOES NOT** work on P5 (Pentium) machines.
3) On P6 systems you must remember that both counters are started/stopped
using PerfEvtSel0. That means if you want the counters to count you
must start some event on counter 0.
4) Because of note 3 above, you can't stop a single counter on a P6 system.
When you say "stop/pause a counter" you are really saying "stop/pause
*BOTH* counters."
5) mpcnt expects the form of the start command that matches the processor
family of the machine you are running on.
6) If you specify options that are not applicable to the command they
will be ignored. For example, if you specify the -f and/or -t option with
-s, both will be ignored.
7) If you specify conflicting/duplicate options then the right-most option
is used. For example, if you specify the -f option twice, then the value
given with the rightmost -f will be used.
Examples:
---------
1) mpcnt -s ctr0 0x00430079
- On a P6 processor, set ctr0 to count CPU_CYCLES_UNHALTED.
2) mpcnt -s ctr1 0x004300c0
- On a P6 processor, set ctr1 to count INSTR_RETIRED.
3) mpcnt -s ctr1 0x01234567
- On a P6 processor, set ctr1 to count whatever 0x01234567 means.
4) mpcnt -s ctr12 0x00039000 0x0400060c
- On a P4/EM64T processor, set ctr12 to count non-bogus Instr_retired.
5) mpcnt -s 3 0x00037000 0x3000040c
- On a P4/EM64T processor, set ctr3 to count ITLB_misses.
6) mpcnt -s 1 0x004300c0
- On an AMD64 processor, set ctr1 to count RETIRED_X86_INSTR.
7) mpcnt -s 3 0x004300c1
- On an AMD64 processor, set ctr3 to count RETIRED_UOPS.
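The P6 and AMD64 evtsel values in these examples are packed bit fields. As laid out in Intel's P6 and AMD's K8-era documentation (check the manual for your part), bits 0-7 hold the event select, bits 8-15 the unit mask, bit 16 USR, bit 17 OS, bit 18 edge detect, bit 19 pin control, bit 20 APIC interrupt, bit 22 enable, bit 23 invert, and bits 24-31 the counter mask. The hypothetical helpers below encode and decode those fields. On P6, note 3 still applies: the enable bit in evtsel0 gates both counters.

```python
from typing import Dict

# PerfEvtSel field positions (bit 21 is reserved and left at 0).
UMASK_SHIFT, CMASK_SHIFT = 8, 24
USR_BIT, OS_BIT, EDGE_BIT, PC_BIT, INT_BIT, EN_BIT, INV_BIT = 16, 17, 18, 19, 20, 22, 23


def encode_evtsel(event: int, umask: int = 0, *, usr: bool = True,
                  os_mode: bool = True, edge: bool = False, pc: bool = False,
                  apic_int: bool = False, enable: bool = True,
                  inv: bool = False, cmask: int = 0) -> int:
    """Build a 32-bit evtsel value from its fields (hypothetical helper)."""
    value = (event & 0xFF) | ((umask & 0xFF) << UMASK_SHIFT)
    value |= (cmask & 0xFF) << CMASK_SHIFT
    for flag, bit in ((usr, USR_BIT), (os_mode, OS_BIT), (edge, EDGE_BIT),
                      (pc, PC_BIT), (apic_int, INT_BIT), (enable, EN_BIT),
                      (inv, INV_BIT)):
        if flag:
            value |= 1 << bit
    return value


def decode_evtsel(value: int) -> Dict[str, int]:
    """Split an evtsel value back into its fields."""
    return {
        "event": value & 0xFF,
        "umask": (value >> UMASK_SHIFT) & 0xFF,
        "usr": (value >> USR_BIT) & 1,
        "os": (value >> OS_BIT) & 1,
        "edge": (value >> EDGE_BIT) & 1,
        "pc": (value >> PC_BIT) & 1,
        "int": (value >> INT_BIT) & 1,
        "enable": (value >> EN_BIT) & 1,
        "inv": (value >> INV_BIT) & 1,
        "cmask": (value >> CMASK_SHIFT) & 0xFF,
    }
```

With the defaults (count in user and OS mode, enabled), encode_evtsel(0x79), encode_evtsel(0xC0) and encode_evtsel(0xC1) give 0x00430079, 0x004300c0 and 0x004300c1, the values used in the examples above.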
Skew processor clocks (TSC register) on supported SMP systems.
Syntax:
skew [-s [#sec]] [-q]
Where:
-s Skew the clocks.
* If #sec is specified then clocks are skewed #sec seconds from
each other.
* If #sec is not specified then clocks are skewed 60 secs
(1 min) from each other.
* The highest numbered processor will have the largest skew
(relative to processor 0). Processor 0 will have the smallest
skew.
* Clock skew is displayed at completion of the command.
-q Query clock skew.
* Displays the current TSC value for all processors, along with
each processor's difference from processor 0's TSC and from the
TSC of the next numerically lower processor.
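The -s description implies a simple schedule. Processor k ends up about k * #sec ahead of processor 0, and the command runs roughly as long as the largest skew, which is what the Notes below work out for 8 processors at 2 minutes. This sketch of that arithmetic uses hypothetical function names and the documented 60-second default. It assumes the skew grows linearly with the processor number, which matches sample output 4 (0, 25, 50, 75 seconds).

```python
from typing import List


def target_skews(num_cpus: int, skew_sec: float = 60.0) -> List[float]:
    """Approximate skew of each CPU relative to CPU0: k * skew_sec for CPU k."""
    return [k * skew_sec for k in range(num_cpus)]


def expected_run_time(num_cpus: int, skew_sec: float = 60.0) -> float:
    """Approximate run time of 'skew -s': the largest skew, (n - 1) * skew_sec."""
    return (num_cpus - 1) * skew_sec
```

For example, expected_run_time(8, 120) gives 840 seconds (14 minutes), and target_skews(4, 25) gives [0, 25, 50, 75].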
Notes:
* Skewing is accomplished by resetting (to 0) the TSC register.
Any code that relies on the TSC not going backwards may have
problems.
* The command won't complete until all clocks are skewed. So, for
example, if you request clocks to be skewed by 2 minutes on an
8-way SMP machine, the command will run for 14 minutes.
* Skew display data fields:
- TSC:
Current value of the TSC
- Delta (CPU0):
Difference in values between this processor and processor 0.
The difference is expressed as an absolute value.
- Delta (previous CPU):
Difference in values between this processor and previous
(numerically lower) processor. The difference is expressed
as an absolute value.
- Time:
Approximate time, in seconds, represented by the cycles value.
May display as 0 (zero) for very small skews.
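The display fields above can be reproduced from raw TSC readings: take absolute differences, format them as high:low hex, and divide by the TSC frequency to get the Time column. This is a sketch with hypothetical names. The tsc_hz parameter is an input you must supply, because this help text does not say how skew determines the TSC frequency.

```python
from typing import Dict, List


def fmt_tsc(value: int) -> str:
    """Format like the 'TSC Value' column: 0xHHHHHHHH:LLLLLLLL."""
    return f"0x{value >> 32:08X}:{value & 0xFFFFFFFF:08X}"


def fmt_delta(value: int) -> str:
    """Format like the 'Cycles' columns: 0xHHHH:LLLLLLLL."""
    return f"0x{value >> 32:04X}:{value & 0xFFFFFFFF:08X}"


def skew_report(tscs: List[int], tsc_hz: float) -> List[Dict[str, object]]:
    """Compute Delta (CPU0) and Delta (prev CPU) for each CPU's TSC reading."""
    rows: List[Dict[str, object]] = []
    for cpu, tsc in enumerate(tscs):
        d_cpu0 = abs(tsc - tscs[0])                          # absolute value, as documented
        d_prev = abs(tsc - tscs[cpu - 1]) if cpu > 0 else 0  # vs. next lower CPU
        rows.append({
            "cpu": cpu,
            "tsc": fmt_tsc(tsc),
            "delta_cpu0": fmt_delta(d_cpu0),
            "time_cpu0": round(d_cpu0 / tsc_hz, 1),
            "delta_prev": fmt_delta(d_prev),
            "time_prev": round(d_prev / tsc_hz, 1),
        })
    return rows


sample = [0x00000000000198E8, 0x0000000DF0650204, 0x0000001BE0C43A30, 0x00000029D09D62AC]
for row in skew_report(sample, tsc_hz=2.4e9):
    print(row["tsc"], row["delta_cpu0"], row["delta_prev"])
```

Feeding in the TSC values from sample output 4 reproduces its Cycles columns. The sample's 25.0-second steps imply a TSC running at roughly 2.4 GHz.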
Examples:
1) The following both skew the clocks by 60 seconds:
- skew -s
- skew -s 60
2) Display current clock skew:
- skew -q
3) Sample output on system where clocks are not skewed:
                           Delta (CPU0)           Delta (prev CPU)
      TSC Value            Cycles           Time  Cycles           Time
      -------------------  ---------------  ----  ---------------  ----
CPU0: 0x00000007:8BF43FD8  0x0000:00000000   0.0  0x0000:00000000   0.0
CPU1: 0x00000007:8BF53540  0x0000:0000F568   0.0  0x0000:0000F568   0.0
CPU2: 0x00000007:8BF45118  0x0000:00001140   0.0  0x0000:0000E428   0.0
CPU3: 0x00000007:8BF554F8  0x0000:00011520   0.0  0x0000:000103E0   0.0
4) Sample output on system where clocks are skewed:
                           Delta (CPU0)           Delta (prev CPU)
      TSC Value            Cycles           Time  Cycles           Time
      -------------------  ---------------  ----  ---------------  ----
CPU0: 0x00000000:000198E8  0x0000:00000000   0.0  0x0000:00000000   0.0
CPU1: 0x0000000D:F0650204  0x000D:F063691C  25.0  0x000D:F063691C  25.0
CPU2: 0x0000001B:E0C43A30  0x001B:E0C2A148  50.0  0x000D:F05F382C  25.0
CPU3: 0x00000029:D09D62AC  0x0029:D09BC9C4  75.0  0x000D:EFD9287C  25.0