Documentation/arm/XScale/pmu.txt

Intel's XScale Microarchitecture processors provide a Performance
Monitoring Unit (PMU) that reports information useful for fine tuning
code.  This text file describes the API that's been developed for use
by Linux kernel programmers.  When I have some extra time on my hands,
I will extend the code to provide support for user mode performance
monitoring (which is probably much more useful).  Note that to get the
most out of the PMU, I highly recommend getting the XScale reference
manual from Intel and looking at chapter 12.

To use the PMU, you must #include <asm/xscale-pmu.h> in your source file.

Since there's only one PMU, only one user can currently use the PMU
at a given time.  To claim the PMU for usage, call pmu_claim(), which
returns an identifier.  When you are done using the PMU, call
pmu_release() with the identifier that you were given by pmu_claim().
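
As a rough illustration of the claim/release pairing, the sketch below
wraps a measurement between pmu_claim() and pmu_release().  The
prototypes used here (int pmu_claim(void), int pmu_release(int)) and the
"negative return means busy" convention are assumptions, so check
<asm/xscale-pmu.h> for the real declarations.  Both calls are replaced
by user-space stand-ins so the fragment builds outside the kernel, and
measure_once() is a hypothetical helper, not part of the API.

```c
#include <stdio.h>

/*
 * User-space stand-ins for the kernel API so this sketch builds on its
 * own.  The signatures and the "negative means busy" convention are
 * assumptions, not taken from <asm/xscale-pmu.h>.
 */
static int pmu_owner;                   /* 0 = free, otherwise the claim id */

static int pmu_claim(void)
{
        if (pmu_owner)
                return -1;              /* someone else holds the PMU */
        pmu_owner = 1;
        return pmu_owner;
}

static int pmu_release(int id)
{
        if (!pmu_owner || id != pmu_owner)
                return -1;              /* caller is not the current owner */
        pmu_owner = 0;
        return 0;
}

/* Hypothetical helper: claim the PMU, measure, and always release it. */
static int measure_once(void)
{
        int id = pmu_claim();

        if (id < 0) {
                printf("PMU busy, try again later\n");
                return -1;
        }

        /* ... start the PMU, run the code of interest, stop the PMU ... */

        return pmu_release(id);
}
```

Keeping the claim and the release in the same function makes it harder
to leave the PMU claimed on an early return.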

In addition, the PMU can only be used on XScale based systems that
provide an external timer.  Systems that the PMU is currently supported
on are:

        - Cyclone IQ80310

Before delving into how to use the PMU code, let's do a quick overview
of the PMU itself.  The PMU consists of three registers that can be
used for performance measurements.  The first is the CCNT register, which
provides the number of clock cycles elapsed since the PMU was started.
The next two registers, PMN0 and PMN1, are each user programmable to
provide 1 of 20 different performance statistics.  By combining different
statistics, you can derive complex performance metrics.

To start the PMU, just call pmu_start(pmn0, pmn1).  pmn0 and pmn1 tell
the PMU what statistics to capture and can each be one of:

EVT_ICACHE_MISS
        Instruction fetches requiring access to external memory

EVT_ICACHE_NO_DELIVER
        Instruction cache could not deliver an instruction.  Either an
        ICACHE miss or an instruction TLB miss.

EVT_ICACHE_DATA_STALL
        Stall in execution due to a data dependency.  This counter is
        incremented each cycle in which the condition is present.

EVT_ITLB_MISS
        Instruction TLB miss

EVT_DTLB_MISS
        Data TLB miss

EVT_BRANCH
        A branch instruction was executed and it may or may not have
        changed program flow

EVT_BRANCH_MISS
        A branch (B or BL instructions only) was mispredicted

EVT_INSTRUCTION
        An instruction was executed

EVT_DCACHE_FULL_STALL
        Stall because data cache buffers are full.  Incremented on every
        cycle in which the condition is present.

EVT_DCACHE_FULL_STALL_CONTIG
        Stall because data cache buffers are full.  Incremented on every
        cycle in which the condition is contiguous.

EVT_DCACHE_ACCESS
        Data cache access (data fetch)

EVT_DCACHE_MISS
        Data cache miss

EVT_DCACHE_WRITE_BACK
        Data cache write back.  This counter is incremented for every
        1/2 line (four words) that is written back.

EVT_PC_CHANGED
        Software changed the PC.  This is incremented only when the
        software changes the PC and there is no mode change.  For example,
        a MOV instruction that targets the PC would increment the counter.
        An SWI would not, as it triggers a mode change.

EVT_BCU_REQUEST
        The Bus Control Unit (BCU) received a request from the core

EVT_BCU_FULL
        The BCU request queue is full.  A high value for this event means
        that the BCU is often waiting for transactions to complete on the
        external bus.

EVT_BCU_DRAIN
        The BCU queues were drained due to either a Drain Write Buffer
        command or an I/O transaction for a page that was marked as
        uncacheable and unbufferable.

EVT_BCU_ECC_NO_ELOG
        The BCU detected an ECC error on the memory bus but no ELOG
        register was available to log the errors.

EVT_BCU_1_BIT_ERR
        The BCU detected a 1-bit error while reading from the bus.

EVT_RMW
        An RMW cycle occurred due to a narrow write on ECC protected memory.

To get the results back, call pmu_stop(&results) where results is defined
as a struct pmu_results:

        struct pmu_results
        {
                u32     ccnt;   /* Clock Counter Register */
                u32     ccnt_of;
                u32     pmn0;   /* Performance Counter Register 0 */
                u32     pmn0_of;
                u32     pmn1;   /* Performance Counter Register 1 */
                u32     pmn1_of;
        };
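
This file does not say what the *_of fields hold.  If they count how
many times the matching 32-bit counter wrapped around (an assumption
based only on the field names), a full 64-bit total could be rebuilt
roughly as sketched below.  pmu_total() is a hypothetical helper, and
the typedefs stand in for the kernel's u32/u64 so the fragment builds
in user space.

```c
#include <stdint.h>

typedef uint32_t u32;   /* kernel types, redefined for a user-space build */
typedef uint64_t u64;

struct pmu_results
{
        u32     ccnt;   /* Clock Counter Register */
        u32     ccnt_of;
        u32     pmn0;   /* Performance Counter Register 0 */
        u32     pmn0_of;
        u32     pmn1;   /* Performance Counter Register 1 */
        u32     pmn1_of;
};

/*
 * Assumption: 'overflows' is the number of times the 32-bit counter
 * wrapped, so each overflow is worth 2^32 counts.
 */
static u64 pmu_total(u32 count, u32 overflows)
{
        return ((u64)overflows << 32) | count;
}

/* Hypothetical convenience wrapper for the clock counter. */
static u64 pmu_total_cycles(const struct pmu_results *r)
{
        return pmu_total(r->ccnt, r->ccnt_of);
}
```

If the overflow fields turn out to mean something else, only
pmu_total() needs to change.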

Pretty simple, huh?  Following are some examples of how to get some
commonly wanted numbers out of the PMU data.  Note that since you will be
dividing things, these calculations aren't very practical inside the
kernel, so you need to printk the raw data out to syslog and do the math
from there.  See [1] for more examples.

Instruction Cache Efficiency

        pmu_start(EVT_INSTRUCTION, EVT_ICACHE_MISS);
        ...
        pmu_stop(&results);

        icache_miss_rate = results.pmn1 / results.pmn0;
        cycles_per_instruction = results.ccnt / results.pmn0;

Data Cache Efficiency

        pmu_start(EVT_DCACHE_ACCESS, EVT_DCACHE_MISS);
        ...
        pmu_stop(&results);

        dcache_miss_rate = results.pmn1 / results.pmn0;

Instruction Fetch Latency

        pmu_start(EVT_ICACHE_NO_DELIVER, EVT_ICACHE_MISS);
        ...
        pmu_stop(&results);

        average_stall_waiting_for_instruction_fetch =
                results.pmn0 / results.pmn1;

        percent_stall_cycles_due_to_instruction_fetch =
                results.pmn0 / results.ccnt;
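
With plain integer division, ratios smaller than one (miss rates, stall
percentages) truncate to zero.  One way around that, shown here only as
a sketch, is to scale the numerator first and report the result in
parts per thousand.  per_mille() is a hypothetical helper, not part of
the PMU API, and the typedefs stand in for the kernel's u32/u64 so the
fragment builds in user space.

```c
#include <stdint.h>

typedef uint32_t u32;   /* kernel types, redefined for a user-space build */
typedef uint64_t u64;

/*
 * Return (num / den) scaled by 1000, i.e. in parts per thousand, using
 * only integer arithmetic.  The multiplication is done in 64 bits so it
 * cannot overflow.  A zero denominator yields 0 instead of dividing.
 */
static u64 per_mille(u32 num, u32 den)
{
        if (den == 0)
                return 0;
        return ((u64)num * 1000) / den;
}
```

For the instruction cache example, per_mille(results.pmn1, results.pmn0)
would give misses per thousand instructions.  Inside the kernel on
32-bit ARM, a 64-bit division like this one generally has to go through
do_div() rather than the / operator.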


ToDo:

- Add support for usermode PMU usage.  This might require hooking into
  the scheduler so that we pause the PMU when the task that requested
  statistics is scheduled out.

--
This code is still under development, so please feel free to send patches,
questions, comments, etc to me.

Deepak Saxena <dsaxena@mvista.com>
 167 Deepak Saxena <dsaxena@mvista.com>
 168