		The MSI Driver Guide HOWTO
	Tom L Nguyen tom.l.nguyen@intel.com
	Revised Feb 12, 2004 by Martine Silbermann
		email: Martine.Silbermann@hp.com
1. About this guide

This guide describes the basics of Message Signaled Interrupts
(MSI), the advantages of using MSI over traditional interrupt
mechanisms, and how to enable your driver to use MSI or MSI-X.
Also included is a Frequently Asked Questions (FAQ) section.

2. Copyright 2003 Intel Corporation
3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature for PCI
devices and a required feature for PCI Express devices. MSI enables
a device function to request service by sending an Inbound Memory
Write on its PCI bus to the FSB as a Message Signaled Interrupt
transaction. Because MSI is generated in the form of a Memory Write,
all transaction termination conditions, such as a Retry,
Master-Abort, Target-Abort or normal completion, are supported.
A PCI device that supports MSI must also support the pin IRQ
assertion interrupt mechanism to provide backward compatibility for
systems that do not support MSI. On systems that support MSI, the
bus driver is responsible for initializing the message address and
message data of the device function's MSI/MSI-X capability
structure during initial device configuration.
An MSI capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. The
device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both, but instead enable only the MSI-X capability structure.
The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver control over MSI. The Message
Control register indicates the MSI capability supported by the
device. The Message Address register specifies the target address
and the Message Data register specifies the characteristics of the
message. To request service, the device function writes the content
of the Message Data register to the target address. The device and
its software driver are prohibited from writing to these registers.
The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are
some key advantages to implementing the MSI-X capability structure
over the MSI capability structure, as described below.
- Support for a larger maximum number of vectors per function.

- The ability for system software to configure each vector with an
  independent message address and message data, specified by a
  table that resides in Memory Space.

- MSI and MSI-X both support per-vector masking. Per-vector
  masking is an optional extension of MSI but a required feature
  for MSI-X. Per-vector masking provides the kernel the ability
  to mask/unmask an MSI when servicing its software interrupt
  service routine. If per-vector masking is not supported, then
  the device driver should provide the hardware/software
  synchronization to ensure that the device generates MSI only
  when the driver wants it to do so.
4. Why use MSI?

As a benefit to the simplification of board design, MSI allows
board designers to remove out-of-band interrupt routing. MSI is
another step towards a legacy-free environment.
Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time. Devices, due to pin constraints, may implement
messages to increase performance.
PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion. Using INTx emulation requires interrupt
sharing among devices connected to the same node (PCI bridge),
while MSI is unique (non-shared) and does not require BIOS
configuration support. As a result, PCI Express technology requires
MSI support for better interrupt performance.
Using MSI enables the device functions to support two or more
vectors, which can be configured to target different CPUs to
increase scalability.
5. Configuring a driver to use MSI/MSI-X

By default, the kernel will not enable MSI/MSI-X on all devices
that support this capability. The CONFIG_PCI_USE_VECTOR kernel
option must be selected to enable MSI/MSI-X support.
5.1 Including MSI support into the kernel

To allow MSI-capable device drivers to selectively enable MSI
(using pci_enable_msi as described below), the VECTOR based scheme
needs to be enabled by setting CONFIG_PCI_USE_VECTOR.

Since the target of the inbound message is the local APIC, providing
CONFIG_PCI_USE_VECTOR is dependent on whether CONFIG_X86_LOCAL_APIC
is enabled.
int pci_enable_msi(struct pci_dev *)
With this new API, any existing device driver that would like to
have MSI enabled on its device function must call it explicitly. A
successful call will initialize the MSI/MSI-X capability structure
with ONE vector, regardless of whether the device function is
capable of supporting multiple messages. This vector replaces the
pre-assigned dev->irq with a new MSI vector. To avoid a conflict
between the newly assigned vector and the existing pre-assigned
vector, the device driver must call this API before calling
request_irq(...).
The diagram below shows the events that switch the interrupt mode
of an MSI-capable device function between MSI mode and PIN-IRQ
assertion mode.

	 ------------   pci_enable_msi   ------------------------
	|            | <===============  |                        |
	|  MSI MODE  |                   | PIN-IRQ ASSERTION MODE |
	|            |  ===============> |                        |
	 ------------      free_irq       ------------------------
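These transitions can be sketched in a driver's probe path. The
fragment below is a hypothetical illustration against the kernel of
that era: foo_probe and foo_interrupt are made-up names, and only
pci_enable_msi(), request_irq() and free_irq() come from this
document.

```c
/* Hypothetical probe fragment: try to switch to MSI mode, and
 * fall back to PIN-IRQ assertion mode if that fails. */
static int foo_probe(struct pci_dev *dev)
{
	/* Call before request_irq() so that dev->irq already holds
	 * the new MSI vector rather than the pre-assigned IRQ. */
	if (pci_enable_msi(dev) != 0)
		printk(KERN_INFO "foo: MSI unavailable, "
				 "staying in pin assertion mode\n");

	/* dev->irq is now either the MSI vector or the pin IRQ;
	 * SA_SHIRQ covers the case where the pin IRQ is shared. */
	return request_irq(dev->irq, foo_interrupt, SA_SHIRQ,
			   "foo", dev);
}
```

Calling free_irq() later returns the device function to PIN-IRQ
assertion mode, as the diagram shows.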
5.2 Configuring for MSI support

Due to the non-contiguous fashion of vector assignment in the
existing Linux kernel, this version does not support multiple
messages, regardless of whether the device function is capable of
supporting more than one vector. The bus driver initializes only
entry 0 of this capability if pci_enable_msi(...) is called
successfully by the device driver.
5.3 Configuring for MSI-X support

Both the MSI capability structure and the MSI-X capability structure
share the same semantics described above; however, due to the
ability of the system software to configure each vector of the
MSI-X capability structure with an independent message address and
message data, the non-contiguous fashion of vector assignment in
the existing Linux kernel has no impact on supporting multiple
messages on an MSI-X capable device function. By default, as
mentioned above, ONE vector should always be allocated to the MSI-X
capability structure at entry 0. The bus driver does not initialize
the other entries of the MSI-X table.
Note that the PCI subsystem should have full control of the MSI-X
table that resides in Memory Space. The software device driver
should not access this table.
To request additional vectors, the device's software driver should
call the function msi_alloc_vectors(). It is recommended that the
software driver call this function once during the initialization
phase of the device driver.
The function msi_alloc_vectors(), once invoked, enables either all
or nothing, depending on the current availability of vector
resources. If no vector resources are available, the device function
still works with ONE vector. If the vector resources are available
for the number of vectors requested by the driver, this function
will reconfigure the MSI-X capability structure of the device with
additional messages, starting from entry 1. For example, the device
may be capable of supporting a maximum of 32 vectors while its
software driver may request fewer.
After this call succeeds, the device driver is responsible for
calling other functions, such as request_irq() and enable_irq(),
for each vector, to enable it with its corresponding interrupt
service handler. It is the device driver's choice to have all
vectors share the same interrupt service handler or to give each
vector a unique interrupt service handler.
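As a hypothetical sketch of this sequence: the foo_* names below
are made up, msi_alloc_vectors() and request_irq() are the calls
this document describes, and the vector array is assumed to be
filled in by the PCI subsystem on success.

```c
#define FOO_NVEC 3	/* number of additional vectors we want */

static int foo_setup_vectors(struct pci_dev *dev)
{
	int vector[FOO_NVEC];
	int i, ret;

	/* All-or-nothing: on failure the device function still
	 * works with the single vector in dev->irq. */
	if (msi_alloc_vectors(dev, vector, FOO_NVEC) != 0)
		return 0;

	/* Register one handler per additional vector; the driver
	 * could equally let all vectors share one handler. */
	for (i = 0; i < FOO_NVEC; i++) {
		ret = request_irq(vector[i], foo_interrupt, 0,
				  "foo", dev);
		if (ret)
			return ret;
	}
	return 0;
}
```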
In addition to the function msi_alloc_vectors(), another function,
msi_free_vectors(), is provided to allow the software driver to
release a number of vectors back to the vector resource pool. Once
invoked, the PCI subsystem disables (masks) each vector released.
These vectors are no longer valid for the hardware device and its
software driver to use. As with free_irq, it is recommended that
the device driver also call msi_free_vectors to release all
additional vectors previously requested.
int msi_alloc_vectors(struct pci_dev *dev, int *vector, int nvec)

This API enables the software driver to request additional messages
from the PCI subsystem. Depending on the number of vectors
available, the PCI subsystem enables either all or none of them.

Argument dev points to the device (pci_dev) structure.
Argument vector is a pointer of integer type. The number of
elements is indicated in argument nvec.
Argument nvec is an integer indicating the number of messages
requested.

A return of zero indicates that the requested number of vectors
was successfully allocated. Otherwise, it indicates that the
resources are not available.
int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec)

This API enables the software driver to inform the PCI subsystem
that it is willing to release a number of vectors back to the MSI
resource pool. Once invoked, the PCI subsystem disables each MSI-X
entry associated with each vector stored in the vector argument.
These vectors are no longer valid for the hardware device and its
software driver to use.

Argument dev points to the device (pci_dev) structure.
Argument vector is a pointer of integer type. The number of
elements is indicated in argument nvec.
Argument nvec is an integer indicating the number of messages to
be released.

A return of zero indicates that the vectors were successfully
released. Otherwise, it indicates a failure.
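A matching hypothetical teardown fragment (the foo_* name is made
up; free_irq() and msi_free_vectors() are the calls this document
describes): each handler is freed first, then the vectors are
returned to the pool.

```c
static void foo_release_vectors(struct pci_dev *dev,
				int *vector, int nvec)
{
	int i;

	/* Release each handler before giving the vectors back;
	 * the PCI subsystem masks each released MSI-X entry. */
	for (i = 0; i < nvec; i++)
		free_irq(vector[i], dev);

	msi_free_vectors(dev, vector, nvec);
}
```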
5.4 Hardware requirements for MSI support

MSI support requires support from both system hardware and
individual hardware device functions.

5.4.1 System hardware support

Since the target of an MSI write is the CPU's local APIC, enabling
MSI support in the Linux kernel is dependent on whether the
existing system hardware supports a local APIC. Users should verify
that their system runs when CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_USE_VECTOR enables the VECTOR based scheme and the
option for MSI-capable device drivers to selectively enable MSI
(using pci_enable_msi as described above).
Note that the CONFIG_X86_IO_APIC setting is irrelevant because the
MSI vector is allocated anew at runtime and MSI support does not
depend on BIOS support. This key independence enables MSI support
on future IOxAPIC-free platforms.
5.4.2 Device hardware support

A hardware device function supports MSI by implementing the
MSI/MSI-X capability structure in its PCI capability list. By
default, this capability structure will not be initialized by the
kernel to enable MSI during system boot. In other words, the
device function runs in its default pin assertion mode. Note that
in many cases hardware supporting MSI has bugs, which may result
in a system hang. The software driver of specific MSI-capable
hardware is responsible for deciding whether to call
pci_enable_msi or not. A return of zero indicates that the kernel
successfully initialized the MSI/MSI-X capability structure of the
device function. The device function is now running in MSI mode.
5.5 How to tell whether MSI is enabled on a device function

At the driver level, a return of zero from pci_enable_msi(...)
indicates to the device driver that its device function is
initialized successfully and ready to run in MSI mode.

At the user level, users can use the command 'cat /proc/interrupts'
to display the vector allocated for the device and its interrupt
mode, as shown below.
           CPU0       CPU1
  0:     324639          0    IO-APIC-edge   timer
  1:       1186          0    IO-APIC-edge   i8042
  2:          0          0    XT-PIC         cascade
 12:       2797          0    IO-APIC-edge   i8042
 14:       6543          0    IO-APIC-edge   ide0
 15:          1          0    IO-APIC-edge   ide1
169:          0          0    IO-APIC-level  uhci-hcd
185:          0          0    IO-APIC-level  uhci-hcd
193:        138         10    PCI MSI        aic79xx
201:         30          0    PCI MSI        aic79xx
225:         30          0    IO-APIC-level  aic7xxx
233:         30          0    IO-APIC-level  aic7xxx
6. FAQ

Q1. Are there any limitations on using MSI?

A1. If the PCI device supports MSI and conforms to the
specification, and the platform supports the APIC local bus, then
using MSI should work.
Q2. Will it work on all Pentium processors (P3, P4, Xeon, AMD
processors)? On the P3, IPIs are transmitted on the APIC local
bus, while on the P4 and Xeon they are transmitted on the system
bus. Are there any implications of this?
A2. MSI support enables a PCI device to send an inbound memory
write (with 0xfeexxxxx as the target address) on its PCI bus
directly to the FSB. Since the message address has the redirection
hint bit cleared, it should work.
Q3. The target address 0xfeexxxxx will be translated by the Host
Bridge into an interrupt message. Are there any limitations on
chipsets such as the Intel 8xx, Intel e7xxx, etc.?

A3. If these chipsets support an inbound memory write with the
target address set to 0xfeexxxxx, conforming to the PCI
specification Revision 2.3 or later, then it should work.
Q4. From the driver's point of view, if the MSI is lost because
errors occur during the inbound memory write, then it may wait
forever. Is there a mechanism for it to recover?

A4. Since the transaction is an inbound memory write, all
transaction termination conditions (Retry, Master-Abort,
Target-Abort, or normal completion) are supported. A device
sending an MSI must abide by all the PCI rules and conditions
regarding that inbound memory write. So, if a retry is signaled
it must retry, etc. We believe that the recommendation for Abort
is also a retry (refer to the PCI specification Revision 2.3 or
later).