The CPI measurement facility makes it easy to measure CPI (cycles per instruction) for user workloads, either an entire workload or portions of one. There are three types of CPI:
Raw CPI

Scaled CPI

OnThread CPI

The CPI measurement facility must be initialized with CpiInit() before it is used. The first CpiInit() call system-wide initializes the necessary hardware and PI tools support.
CPI is measured for a CPI instance. A CPI instance is the interval of time between the last CpiStart() and the current CpiGet(). Each CpiInit() creates a new instance, and any number of instances can exist concurrently. Measurements are associated with a particular instance through its instance handle, so CPI can be measured for several, possibly nested, portions of an application. For example:
    void * interval1, * interval2, * interval3;
    CPI_DATA data1, data2, data3;
    ...
    CpiInit(flags, &interval1);
    CpiInit(flags, &interval2);
    CpiInit(flags, &interval3);
    ...
    CpiStart(interval1);              <-----------+
    // Application code                           1
    ...                                           1
    CpiStart(interval2);              <------+    1
    // More application code                 2    1
    ...                                      2    1
    CpiStart(interval3);              <-+    2    1
    // Even more application code       3    2    1
    ...                                 3    2    1
    CpiGet(interval3, &data3);        <-+    2    1
    // More application code                 2    1
    ...                                      2    1
    CpiGet(interval2, &data2);        <------+    1
    // Application code                           1
    ...                                           1
    CpiGet(interval1, &data1);        <-----------+
    ...
    CpiTerminate(interval1);
    CpiTerminate(interval2);
    CpiTerminate(interval3);
    ...
In the sample code above:
Instance results are returned in a CPI_DATA structure.
Trace Hook Formats
In the descriptions that follow:
If a label is present, a label hook will be the first hook written.
    LABEL HOOK (start)
    ******************
    MMMMMMMM  mmmmmmmm  data
    C0        000000mm  elapsed_cycles (high)
                        elapsed_cycles (low)
                        "label"
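The elapsed cycle count appears in the hooks as a (high) and a (low) word. The following is a minimal sketch of splitting a 64-bit count into two words and rebuilding it, assuming each hook data word is 32 bits wide (the layout does not state the width). The helper names are hypothetical and are not part of perfutil.h.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of perfutil.h.
 * Assumption: each hook data word is 32 bits wide. */

/* Upper 32 bits, as would be written to "elapsed_cycles (high)". */
static uint32_t cycles_high(uint64_t cycles)
{
    return (uint32_t)(cycles >> 32);
}

/* Lower 32 bits, as would be written to "elapsed_cycles (low)". */
static uint32_t cycles_low(uint64_t cycles)
{
    return (uint32_t)(cycles & 0xFFFFFFFFu);
}

/* Rebuild the 64-bit count from the two traced words. */
static uint64_t cycles_join(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}
```

A trace post-processor would call cycles_join() on the two words read from a hook to recover the full count.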
If trace hook type CPI_TYPE_TRACE_SUMMARY was set in the CpiInit() call or in a subsequent CpiSetTraceHookId() call, all CPI data is traced in a single hook, which can be very long. The hook is variable in length, depending on the number of processors for which there is data.
    SUMMARY HOOK (CPI_DATA contents)
    ************
    MMMMMMMM  mmmmmmmm  data
    C0        100000mm  (00000001 | 00000002 | 00000004)          // 1: raw, 2: scaled, 4: thread
                        num_cpus
                        elapsed_cycles (high)
                        elapsed_cycles (low)
                    +-> raw_cpi (integral)
                    +-> raw_cpi (fractional, to 4 places)
                        raw_sys_cpi (integral)
                        raw_sys_cpi (fractional, to 4 places)
                    +-> scaled_cpi (integral)                     // if CPI_TYPE_SCALED
                    |   scaled_cpi (fractional, to 4 places)      // if CPI_TYPE_SCALED
                    |   cpu_busy (integral)                       // if CPI_TYPE_SCALED
                    +-> cpu_busy (fractional, to 4 places)        // if CPI_TYPE_SCALED
                        scaled_sys_cpi (integral)                 // if CPI_TYPE_SCALED
                        scaled_sys_cpi (fractional, to 4 places)  // if CPI_TYPE_SCALED
                        sys_busy (integral)                       // if CPI_TYPE_SCALED
                        sys_busy (fractional, to 4 places)        // if CPI_TYPE_SCALED
                        thread_cpi (integral)                     // if CPI_TYPE_ONTHREAD
                        thread_cpi (fractional, to 4 places)      // if CPI_TYPE_ONTHREAD
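Each CPI and busy value is traced as an integral word followed by a fractional word holding four decimal places. The sketch below shows one way such a pair could be produced from a double and turned back into one. It assumes the fractional word is an integer in the range 0..9999 and that values are rounded to the nearest 1/10000; the text does not say whether the facility rounds or truncates. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation of one traced value.
 * Assumption: "fractional, to 4 places" is an integer 0..9999. */
typedef struct {
    uint32_t integral;    /* whole part */
    uint32_t fractional;  /* first four decimal places */
} fixed4_t;

/* Split a non-negative value into integral and 4-place fractional parts.
 * Rounds to the nearest 1/10000 (an assumption) and carries into the
 * integral part when the fraction rounds up to 1.0. */
static fixed4_t fixed4_split(double value)
{
    fixed4_t out;
    uint64_t scaled;

    assert(value >= 0.0);  /* CPI and busy percentages are never negative */
    scaled = (uint64_t)(value * 10000.0 + 0.5);
    out.integral = (uint32_t)(scaled / 10000u);
    out.fractional = (uint32_t)(scaled % 10000u);
    return out;
}

/* Recombine the two traced words into a double. */
static double fixed4_join(fixed4_t parts)
{
    return (double)parts.integral + (double)parts.fractional / 10000.0;
}
```

When decoding a trace, fixed4_join() would be applied to each (integral, fractional) pair in the order shown in the hook layout.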
If trace hook type CPI_TYPE_TRACE_DETAILED was set in the CpiInit() call or in a subsequent CpiSetTraceHookId() call, the CPI data and the raw data for the interval are traced in multiple hooks, as follows:
    FULL-CONTENT System-wide CPI data
    *********************************
    MMMMMMMM  mmmmmmmm  data
    C0        800000mm  (00000001 | 00000002)                     // 1: raw, 2: scaled
                        elapsed_cycles (high)
                        elapsed_cycles (low)
                        sys_instr (high)
                        sys_instr (low)
                        sys_cycles (high)
                        sys_cycles (low)
                        sys_cpi (integral)
                        sys_cpi (fractional, to 4 places)
                        sys_cycles_scaled (high)                  // if CPI_TYPE_SCALED
                        sys_cycles_scaled (low)                   // if CPI_TYPE_SCALED
                        sys_cpi_scaled (integral)                 // if CPI_TYPE_SCALED
                        sys_cpi_scaled (fractional, to 4 places)  // if CPI_TYPE_SCALED
                        sys_busy_pc (integral)                    // if CPI_TYPE_SCALED
                        sys_busy_pc (fractional, to 4 places)     // if CPI_TYPE_SCALED

    FULL-CONTENT Per-processor CPI data
    ***********************************
    MMMMMMMM  mmmmmmmm  data
    C0        400000mm  (00000001 | 00000002)                     // 1: raw, 2: scaled
                        num_cpus
                        elapsed_cycles (high)
                        elapsed_cycles (low)
                  +-+-> cpu_instr (high)
                  | |   cpu_instr (low)
                  | |   cpu_cycles (high)
                  | |   cpu_cycles (low)
                  | |   cpu_cpi (integral)
                  | +-> cpu_cpi (fractional, to 4 places)
                  |     cpu_cycles_scaled (high)                  // if CPI_TYPE_SCALED
                  |     cpu_cycles_scaled (low)                   // if CPI_TYPE_SCALED
                  |     cpu_cpi_scaled (integral)                 // if CPI_TYPE_SCALED
                  |     cpu_cpi_scaled (fractional, to 4 places)  // if CPI_TYPE_SCALED
                  |     cpu_busy_pc (integral)                    // if CPI_TYPE_SCALED
                  +---> cpu_busy_pc (fractional, to 4 places)     // if CPI_TYPE_SCALED

    FULL-CONTENT Thread CPI data
    ****************************
    MMMMMMMM  mmmmmmmm  data
    C0        200000mm  tid
                        elapsed_cycles (high)
                        elapsed_cycles (low)
                        thread_instr (high)
                        thread_instr (low)
                        thread_cycles (high)
                        thread_cycles (low)
                        thread_cpi (integral)
                        thread_cpi (fractional, to 4 places)
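The detailed hooks carry the raw instruction and cycle counters next to the CPI derived from them. Since CPI is cycles per instruction, a raw CPI value can be recomputed from a counter pair as sketched below. The helper name is hypothetical, and returning 0.0 when no instructions were counted is an assumption; how the facility itself reports that case is not stated here.

```c
#include <stdint.h>

/* Hypothetical helper, not part of perfutil.h.
 * Computes cycles per instruction from one pair of raw counters,
 * for example (sys_cycles, sys_instr) or (cpu_cycles, cpu_instr).
 * Assumption: an interval with zero instructions yields 0.0. */
static double cpi_from_counters(uint64_t cycles, uint64_t instructions)
{
    if (instructions == 0) {
        return 0.0;  /* avoid dividing by zero */
    }
    return (double)cycles / (double)instructions;
}
```

This can serve as a cross-check when post-processing a trace: the recomputed value should agree with the traced CPI to within the four traced decimal places.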
If a label is present, a label hook will be the last hook written.
    LABEL HOOK (end)
    ****************
    MMMMMMMM  mmmmmmmm  data
    C0        F00000mm  elapsed_cycles (high)
                        elapsed_cycles (low)
                        "label"
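In the layouts above, the leading digit of the minor code differs per hook (0, 1, 2, 4, 8, F) while the trailing mm is the caller-supplied part. The sketch below decodes a minor code on that basis. It assumes the minor code is a 32-bit hexadecimal value, that the hook kind occupies the top four bits, and that mm occupies the low eight bits. The enum and function names are hypothetical and are not part of perfutil.h.

```c
#include <stdint.h>

/* Hypothetical names for the leading digit of each minor code pattern. */
typedef enum {
    EX_HOOK_LABEL_START = 0x0,  /* 000000mm */
    EX_HOOK_SUMMARY     = 0x1,  /* 100000mm */
    EX_HOOK_THREAD      = 0x2,  /* 200000mm */
    EX_HOOK_PER_CPU     = 0x4,  /* 400000mm */
    EX_HOOK_SYSTEM      = 0x8,  /* 800000mm */
    EX_HOOK_LABEL_END   = 0xF   /* F00000mm */
} ex_hook_kind_t;

/* Extract the hook kind (assumed to be the top four bits). */
static unsigned ex_hook_kind(uint32_t minor)
{
    return (unsigned)((minor >> 28) & 0xFu);
}

/* Extract the caller-supplied mm value (assumed to be the low byte). */
static unsigned ex_hook_user_minor(uint32_t minor)
{
    return (unsigned)(minor & 0xFFu);
}
```

A trace reader could switch on ex_hook_kind() to pick the matching layout and use ex_hook_user_minor() to group hooks that belong to the same measurement.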
Creates and initializes a CPI measurement instance.

Terminates a CPI measurement instance.

Marks the beginning of a CPI measurement interval.

Calculates CPI since the start of the measurement interval or since CpiInit() was invoked.

Calculates CPI since the start of the measurement interval, or since CpiInit() was invoked, and writes the contents of the CPI_DATA structure to the SWTRACE buffer.

Sets hook minor code and hook group label for the next set of trace hooks.

Structure used to return CPI measurement results.
    typedef struct _cpi_data CPI_DATA;
    typedef struct _cpi_data cpi_data_t;

    struct _cpi_data {
       int     num_cpus;
       UINT32  tid;
       UINT64  elapsed_cycles;

       // Data returned if CPI_TYPE_RAW - always
       double  raw_cpi[MAX_CPUS];
       double  raw_sys_cpi;

       // Data returned if CPI_TYPE_SCALED
       double  scaled_cpi[MAX_CPUS];
       double  scaled_sys_cpi;
       double  cpu_busy[MAX_CPUS];
       double  sys_busy;

       // Data returned if CPI_TYPE_ONTHREAD
       double  thread_cpi;
    };
This is a very simple example of using the CPI measurement facility.
    //
    // Sample code to obtain CPI and CPU utilization
    //
    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include "perfutil.h"

    int main (int argc, char * argv[])
    {
       int i;
       int rc;
       void * cpi_handle = NULL;
       const CPI_DATA * cd;
       int num_cpus;

       ///////////////// For measurement scenario. You wouldn't need this
       int j, k;
       DWORD dw;
       char EnvVarBuffer[1024], DirBuffer[256];
       ///////////////// For measurement scenario. You wouldn't need this

       // Initialize CPI facility
       rc = CpiInit((CPI_TYPE_RAW | CPI_TYPE_SCALED), &cpi_handle);
       if (rc != 0) {
          printf("CpiInit() failed. rc = %d (%s). Quitting.\n", rc, RcToString(rc));
          return (1);
       }

       printf("***** Starting measurement interval *****\n\n");

       // Prime CPI and latch initial set of counters
       CpiStart(cpi_handle);

       //
       // *********************************************************************
       // *********************************************************************
       //
       //                     SCENARIO TO BE MEASURED
       //              (Just some busy loop in this example)
       //
       // *********************************************************************
       // *********************************************************************
       for (j = 0; j < 500; j++) {
          for (k = 0; k < 10000; k++) {
             dw = GetEnvironmentVariable("PATH", EnvVarBuffer, sizeof(EnvVarBuffer));
             dw = GetCurrentDirectory(sizeof(DirBuffer), DirBuffer);
          }
          if (j % 10 == 0)
             printf("*");

          Sleep(10);   // 10 milliseconds to get some idle time
       }
       printf("\n");
       //
       // *********************************************************************
       // *********************************************************************
       //
       //                     SCENARIO TO BE MEASURED
       //
       // *********************************************************************
       // *********************************************************************

       // Read CPI counters for the interval since the CpiStart()
       CpiGet(cpi_handle, &cd);

       printf("\n***** Ending measurement interval *****\n\n");

       num_cpus = GetActiveProcessorCount();

       // Display system-wide CPI
       printf("cpi: %5.2f  cpi_wb: %5.2f  idle: %5.2f%%  busy: %5.2f%%\n\n",
              cd->raw_sys_cpi, cd->scaled_sys_cpi,
              (100.0 - cd->sys_busy), cd->sys_busy);

       // Display system-wide and per-processor CPI
       printf("       System");
       for (i = 0; i < num_cpus; i++) {
          printf("   CPU%d", i);
       }

       printf("\ncpi    %6.2f", cd->raw_sys_cpi);
       for (i = 0; i < num_cpus; i++) {
          printf(" %6.2f", cd->raw_cpi[i]);
       }

       printf("\ncpi_wb %6.2f", cd->scaled_sys_cpi);
       for (i = 0; i < num_cpus; i++) {
          printf(" %6.2f", cd->scaled_cpi[i]);
       }

       printf("\nidle%%  %6.2f", (100.0 - cd->sys_busy));
       for (i = 0; i < num_cpus; i++) {
          printf(" %6.2f", (100.0 - cd->cpu_busy[i]));
       }

       printf("\nbusy%%  %6.2f", cd->sys_busy);
       for (i = 0; i < num_cpus; i++) {
          printf(" %6.2f", cd->cpu_busy[i]);
       }

       // Done.
       fflush(stdout);
       CpiTerminate(cpi_handle);
       return (0);
    }