X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=Documentation%2FMSI-HOWTO.txt;h=d5032eb480aa364593d73bd6f211ec23bc2b990e;hb=9bf4aaab3e101692164d49b7ca357651eb691cb6;hp=18526692de49ccf0dca827c8e4f59ac8cfef1333;hpb=db216c3d5e4c040e557a50f8f5d35d5c415e8c1c;p=linux-2.6.git diff --git a/Documentation/MSI-HOWTO.txt b/Documentation/MSI-HOWTO.txt index 18526692d..d5032eb48 100644 --- a/Documentation/MSI-HOWTO.txt +++ b/Documentation/MSI-HOWTO.txt @@ -3,13 +3,14 @@ 10/03/2003 Revised Feb 12, 2004 by Martine Silbermann email: Martine.Silbermann@hp.com + Revised Jun 25, 2004 by Tom L Nguyen 1. About this guide -This guide describes the basics of Message Signaled Interrupts(MSI), the -advantages of using MSI over traditional interrupt mechanisms, and how -to enable your driver to use MSI or MSI-X. Also included is a Frequently -Asked Questions. +This guide describes the basics of Message Signaled Interrupts (MSI), +the advantages of using MSI over traditional interrupt mechanisms, +and how to enable your driver to use MSI or MSI-X. Also included is +a Frequently Asked Questions. 2. Copyright 2003 Intel Corporation @@ -35,7 +36,7 @@ An MSI capable device function indicates MSI support by implementing the MSI/MSI-X capability structure in its PCI capability list. The device function may implement both the MSI capability structure and the MSI-X capability structure; however, the bus driver should not -enable both, but instead enable only the MSI-X capability structure. +enable both. The MSI capability structure contains Message Control register, Message Address register and Message Data register. These registers @@ -86,35 +87,62 @@ support. As a result, the PCI Express technology requires MSI support for better interrupt performance. Using MSI enables the device functions to support two or more -vectors, which can be configure to target different CPU's to +vectors, which can be configured to target different CPU's to increase scalability. 5. Configuring a driver to use MSI/MSI-X By default, the kernel will not enable MSI/MSI-X on all devices that -support this capability. The CONFIG_PCI_USE_VECTOR kernel option +support this capability. The CONFIG_PCI_MSI kernel option must be selected to enable MSI/MSI-X support. -5.1 Including MSI support into the kernel +5.1 Including MSI/MSI-X support into the kernel -To allow MSI-Capable device drivers to selectively enable MSI (using -pci_enable_msi as described below), the VECTOR based scheme needs to -be enabled by setting CONFIG_PCI_USE_VECTOR. +To allow MSI/MSI-X capable device drivers to selectively enable +MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described +below), the VECTOR based scheme needs to be enabled by setting +CONFIG_PCI_MSI during kernel config. Since the target of the inbound message is the local APIC, providing -CONFIG_PCI_USE_VECTOR is dependent on whether CONFIG_X86_LOCAL_APIC -is enabled or not. +CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI. -int pci_enable_msi(struct pci_dev *) +5.2 Configuring for MSI support + +Due to the non-contiguous fashion in vector assignment of the +existing Linux kernel, this version does not support multiple +messages regardless of a device function is capable of supporting +more than one vector. To enable MSI on a device function's MSI +capability structure requires a device driver to call the function +pci_enable_msi() explicitly. + +5.2.1 API pci_enable_msi + +int pci_enable_msi(struct pci_dev *dev) With this new API, any existing device driver, which like to have -MSI enabled on its device function, must call this explicitly. A -successful call will initialize the MSI/MSI-X capability structure -with ONE vector, regardless of whether the device function is +MSI enabled on its device function, must call this API to enable MSI +A successful call will initialize the MSI capability structure +with ONE vector, regardless of whether a device function is capable of supporting multiple messages. This vector replaces the pre-assigned dev->irq with a new MSI vector. To avoid the conflict of new assigned vector with existing pre-assigned vector requires -the device driver to call this API before calling request_irq(...). +a device driver to call this API before calling request_irq(). + +5.2.2 API pci_disable_msi + +void pci_disable_msi(struct pci_dev *dev) + +This API should always be used to undo the effect of pci_enable_msi() +when a device driver is unloading. This API restores dev->irq with +the pre-assigned IOAPIC vector and switches a device's interrupt +mode to PCI pin-irq assertion/INTx emulation mode. + +Note that a device driver should always call free_irq() on MSI vector +it has done request_irq() on before calling this API. Failure to do +so results a BUG_ON() and a device will be left with MSI enabled and +leaks its vector. + +5.2.3 MSI mode vs. legacy mode diagram The below diagram shows the events, which switches the interrupt mode on the MSI-capable device function between MSI mode and @@ -124,121 +152,274 @@ PIN-IRQ assertion mode. | | <=============== | | | MSI MODE | | PIN-IRQ ASSERTION MODE | | | ===============> | | - ------------ free_irq ------------------------ + ------------ pci_disable_msi ------------------------ -5.2 Configuring for MSI support -Due to the non-contiguous fashion in vector assignment of the -existing Linux kernel, this version does not support multiple -messages regardless of the device function is capable of supporting -more than one vector. The bus driver initializes only entry 0 of -this capability if pci_enable_msi(...) is called successfully by -the device driver. +Figure 1.0 MSI Mode vs. Legacy Mode + +In Figure 1.0, a device operates by default in legacy mode. Legacy +in this context means PCI pin-irq assertion or PCI-Express INTx +emulation. A successful MSI request (using pci_enable_msi()) switches +a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector +stored in dev->irq will be saved by the PCI subsystem and a new +assigned MSI vector will replace dev->irq. + +To return back to its default mode, a device driver should always call +pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a +device driver should always call free_irq() on MSI vector it has done +request_irq() on before calling pci_disable_msi(). Failure to do so +results a BUG_ON() and a device will be left with MSI enabled and +leaks its vector. Otherwise, the PCI subsystem restores a device's +dev->irq with a pre-assigned IOAPIC vector and marks released +MSI vector as unused. + +Once being marked as unused, there is no guarantee that the PCI +subsystem will reserve this MSI vector for a device. Depending on +the availability of current PCI vector resources and the number of +MSI/MSI-X requests from other drivers, this MSI may be re-assigned. + +For the case where the PCI subsystem re-assigned this MSI vector +another driver, a request to switching back to MSI mode may result +in being assigned a different MSI vector or a failure if no more +vectors are available. 5.3 Configuring for MSI-X support -Both the MSI capability structure and the MSI-X capability structure -share the same above semantics; however, due to the ability of the -system software to configure each vector of the MSI-X capability -structure with an independent message address and message data, the -non-contiguous fashion in vector assignment of the existing Linux -kernel has no impact on supporting multiple messages on an MSI-X -capable device functions. By default, as mentioned above, ONE vector -should be always allocated to the MSI-X capability structure at -entry 0. The bus driver does not initialize other entries of the -MSI-X table. - -Note that the PCI subsystem should have full control of a MSI-X -table that resides in Memory Space. The software device driver -should not access this table. - -To request for additional vectors, the device software driver should -call function msi_alloc_vectors(). It is recommended that the -software driver should call this function once during the +Due to the ability of the system software to configure each vector of +the MSI-X capability structure with an independent message address +and message data, the non-contiguous fashion in vector assignment of +the existing Linux kernel has no impact on supporting multiple +messages on an MSI-X capable device functions. To enable MSI-X on +a device function's MSI-X capability structure requires its device +driver to call the function pci_enable_msix() explicitly. + +The function pci_enable_msix(), once invoked, enables either +all or nothing, depending on the current availability of PCI vector +resources. If the PCI vector resources are available for the number +of vectors requested by a device driver, this function will configure +the MSI-X table of the MSI-X capability structure of a device with +requested messages. To emphasize this reason, for example, a device +may be capable for supporting the maximum of 32 vectors while its +software driver usually may request 4 vectors. It is recommended +that the device driver should call this function once during the initialization phase of the device driver. -The function msi_alloc_vectors(), once invoked, enables either -all or nothing, depending on the current availability of vector -resources. If no vector resources are available, the device function -still works with ONE vector. If the vector resources are available -for the number of vectors requested by the driver, this function -will reconfigure the MSI-X capability structure of the device with -additional messages, starting from entry 1. To emphasize this -reason, for example, the device may be capable for supporting the -maximum of 32 vectors while its software driver usually may request -4 vectors. - -For each vector, after this successful call, the device driver is -responsible to call other functions like request_irq(), enable_irq(), -etc. to enable this vector with its corresponding interrupt service -handler. It is the device driver's choice to have all vectors shared -the same interrupt service handler or each vector with a unique -interrupt service handler. - -In addition to the function msi_alloc_vectors(), another function -msi_free_vectors() is provided to allow the software driver to -release a number of vectors back to the vector resources. Once -invoked, the PCI subsystem disables (masks) each vector released. -These vectors are no longer valid for the hardware device and its -software driver to use. Like free_irq, it recommends that the -device driver should also call msi_free_vectors to release all -additional vectors previously requested. - -int msi_alloc_vectors(struct pci_dev *dev, int *vector, int nvec) - -This API enables the software driver to request the PCI subsystem -for additional messages. Depending on the number of vectors -available, the PCI subsystem enables either all or nothing. +Unlike the function pci_enable_msi(), the function pci_enable_msix() +does not replace the pre-assigned IOAPIC dev->irq with a new MSI +vector because the PCI subsystem writes the 1:1 vector-to-entry mapping +into the field vector of each element contained in a second argument. +Note that the pre-assigned IO-APIC dev->irq is valid only if the device +operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt of +using dev->irq by the device driver to request for interrupt service +may result unpredictabe behavior. + +For each MSI-X vector granted, a device driver is responsible to call +other functions like request_irq(), enable_irq(), etc. to enable +this vector with its corresponding interrupt service handler. It is +a device driver's choice to assign all vectors with the same +interrupt service handler or each vector with a unique interrupt +service handler. + +5.3.1 Handling MMIO address space of MSI-X Table + +The PCI 3.0 specification has implementation notes that MMIO address +space for a device's MSI-X structure should be isolated so that the +software system can set different page for controlling accesses to +the MSI-X structure. The implementation of MSI patch requires the PCI +subsystem, not a device driver, to maintain full control of the MSI-X +table/MSI-X PBA and MMIO address space of the MSI-X table/MSI-X PBA. +A device driver is prohibited from requesting the MMIO address space +of the MSI-X table/MSI-X PBA. Otherwise, the PCI subsystem will fail +enabling MSI-X on its hardware device when it calls the function +pci_enable_msix(). + +5.3.2 Handling MSI-X allocation + +Determining the number of MSI-X vectors allocated to a function is +dependent on the number of MSI capable devices and MSI-X capable +devices populated in the system. The policy of allocating MSI-X +vectors to a function is defined as the following: + +#of MSI-X vectors allocated to a function = (x - y)/z where + +x = The number of available PCI vector resources by the time + the device driver calls pci_enable_msix(). The PCI vector + resources is the sum of the number of unassigned vectors + (new) and the number of released vectors when any MSI/MSI-X + device driver switches its hardware device back to a legacy + mode or is hot-removed. The number of unassigned vectors + may exclude some vectors reserved, as defined in parameter + NR_HP_RESERVED_VECTORS, for the case where the system is + capable of supporting hot-add/hot-remove operations. Users + may change the value defined in NR_HR_RESERVED_VECTORS to + meet their specific needs. + +y = The number of MSI capable devices populated in the system. + This policy ensures that each MSI capable device has its + vector reserved to avoid the case where some MSI-X capable + drivers may attempt to claim all available vector resources. + +z = The number of MSI-X capable devices pupulated in the system. + This policy ensures that maximum (x - y) is distributed + evenly among MSI-X capable devices. + +Note that the PCI subsystem scans y and z during a bus enumeration. +When the PCI subsystem completes configuring MSI/MSI-X capability +structure of a device as requested by its device driver, y/z is +decremented accordingly. + +5.3.3 Handling MSI-X shortages + +For the case where fewer MSI-X vectors are allocated to a function +than requested, the function pci_enable_msix() will return the +maximum number of MSI-X vectors available to the caller. A device +driver may re-send its request with fewer or equal vectors indicated +in a return. For example, if a device driver requests 5 vectors, but +the number of available vectors is 3 vectors, a value of 3 will be a +return as a result of pci_enable_msix() call. A function could be +designed for its driver to use only 3 MSI-X table entries as +different combinations as ABC--, A-B-C, A--CB, etc. Note that this +patch does not support multiple entries with the same vector. Such +attempt by a device driver to use 5 MSI-X table entries with 3 vectors +as ABBCC, AABCC, BCCBA, etc will result as a failure by the function +pci_enable_msix(). Below are the reasons why supporting multiple +entries with the same vector is an undesirable solution. + + - The PCI subsystem can not determine which entry, which + generated the message, to mask/unmask MSI while handling + software driver ISR. Attempting to walk through all MSI-X + table entries (2048 max) to mask/unmask any match vector + is an undesirable solution. + + - Walk through all MSI-X table entries (2048 max) to handle + SMP affinity of any match vector is an undesirable solution. + +5.3.4 API pci_enable_msix + +int pci_enable_msix(struct pci_dev *dev, u32 *entries, int nvec) + +This API enables a device driver to request the PCI subsystem +for enabling MSI-X messages on its hardware device. Depending on +the availability of PCI vectors resources, the PCI subsystem enables +either all or nothing. Argument dev points to the device (pci_dev) structure. -Argument vector is a pointer of integer type. The number of -elements is indicated in argument nvec. + +Argument entries is a pointer of unsigned integer type. The number of +elements is indicated in argument nvec. The content of each element +will be mapped to the following struct defined in /driver/pci/msi.h. + +struct msix_entry { + u16 vector; /* kernel uses to write alloc vector */ + u16 entry; /* driver uses to specify entry */ +}; + +A device driver is responsible for initializing the field entry of +each element with unique entry supported by MSI-X table. Otherwise, +-EINVAL will be returned as a result. A successful return of zero +indicates the PCI subsystem completes initializing each of requested +entries of the MSI-X table with message address and message data. +Last but not least, the PCI subsystem will write the 1:1 +vector-to-entry mapping into the field vector of each element. A +device driver is responsible of keeping track of allocated MSI-X +vectors in its internal data structure. + Argument nvec is an integer indicating the number of messages requested. -A return of zero indicates that the number of allocated vector is -successfully allocated. Otherwise, indicate resources not -available. -int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec) +A return of zero indicates that the number of MSI-X vectors is +successfully allocated. A return of greater than zero indicates +MSI-X vector shortage. Or a return of less than zero indicates +a failure. This failure may be a result of duplicate entries +specified in second argument, or a result of no available vector, +or a result of failing to initialize MSI-X table entries. -This API enables the software driver to inform the PCI subsystem -that it is willing to release a number of vectors back to the -MSI resource pool. Once invoked, the PCI subsystem disables each -MSI-X entry associated with each vector stored in the argument 2. -These vectors are no longer valid for the hardware device and -its software driver to use. +5.3.5 API pci_disable_msix -Argument dev points to the device (pci_dev) structure. -Argument vector is a pointer of integer type. The number of -elements is indicated in argument nvec. -Argument nvec is an integer indicating the number of messages -released. -A return of zero indicates that the number of allocated vectors -is successfully released. Otherwise, indicates a failure. +void pci_disable_msix(struct pci_dev *dev) -5.4 Hardware requirements for MSI support -MSI support requires support from both system hardware and +This API should always be used to undo the effect of pci_enable_msix() +when a device driver is unloading. Note that a device driver should +always call free_irq() on all MSI-X vectors it has done request_irq() +on before calling this API. Failure to do so results a BUG_ON() and +a device will be left with MSI-X enabled and leaks its vectors. + +5.3.6 MSI-X mode vs. legacy mode diagram + +The below diagram shows the events, which switches the interrupt +mode on the MSI-X capable device function between MSI-X mode and +PIN-IRQ assertion mode (legacy). + + ------------ pci_enable_msix(,,n) ------------------------ + | | <=============== | | + | MSI-X MODE | | PIN-IRQ ASSERTION MODE | + | | ===============> | | + ------------ pci_disable_msix ------------------------ + +Figure 2.0 MSI-X Mode vs. Legacy Mode + +In Figure 2.0, a device operates by default in legacy mode. A +successful MSI-X request (using pci_enable_msix()) switches a +device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector +stored in dev->irq will be saved by the PCI subsystem; however, +unlike MSI mode, the PCI subsystem will not replace dev->irq with +assigned MSI-X vector because the PCI subsystem already writes the 1:1 +vector-to-entry mapping into the field vector of each element +specified in second argument. + +To return back to its default mode, a device driver should always call +pci_disable_msix() to undo the effect of pci_enable_msix(). Note that +a device driver should always call free_irq() on all MSI-X vectors it +has done request_irq() on before calling pci_disable_msix(). Failure +to do so results a BUG_ON() and a device will be left with MSI-X +enabled and leaks its vectors. Otherwise, the PCI subsystem switches a +device function's interrupt mode from MSI-X mode to legacy mode and +marks all allocated MSI-X vectors as unused. + +Once being marked as unused, there is no guarantee that the PCI +subsystem will reserve these MSI-X vectors for a device. Depending on +the availability of current PCI vector resources and the number of +MSI/MSI-X requests from other drivers, these MSI-X vectors may be +re-assigned. + +For the case where the PCI subsystem re-assigned these MSI-X vectors +to other driver, a request to switching back to MSI-X mode may result +being assigned with another set of MSI-X vectors or a failure if no +more vectors are available. + +5.4 Handling function implementng both MSI and MSI-X capabilities + +For the case where a function implements both MSI and MSI-X +capabilities, the PCI subsystem enables a device to run either in MSI +mode or MSI-X mode but not both. A device driver determines whether it +wants MSI or MSI-X enabled on its hardware device. Once a device +driver requests for MSI, for example, it is prohibited to request for +MSI-X; in other words, a device driver is not permitted to ping-pong +between MSI mod MSI-X mode during a run-time. + +5.5 Hardware requirements for MSI/MSI-X support +MSI/MSI-X support requires support from both system hardware and individual hardware device functions. -5.4.1 System hardware support +5.5.1 System hardware support Since the target of MSI address is the local APIC CPU, enabling -MSI support in Linux kernel is dependent on whether existing +MSI/MSI-X support in Linux kernel is dependent on whether existing system hardware supports local APIC. Users should verify their system whether it runs when CONFIG_X86_LOCAL_APIC=y. In SMP environment, CONFIG_X86_LOCAL_APIC is automatically set; however, in UP environment, users must manually set CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting -CONFIG_PCI_USE_VECTOR enables the VECTOR based scheme and +CONFIG_PCI_MSI enables the VECTOR based scheme and the option for MSI-capable device drivers to selectively enable -MSI (using pci_enable_msi as described below). +MSI/MSI-X. -Note that CONFIG_X86_IO_APIC setting is irrelevant because MSI -vector is allocated new during runtime and MSI support does not -depend on BIOS support. This key independency enables MSI support -on future IOxAPIC free platform. +Note that CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X +vector is allocated new during runtime and MSI/MSI-X support does not +depend on BIOS support. This key independency enables MSI/MSI-X +support on future IOxAPIC free platform. -5.4.2 Device hardware support +5.5.2 Device hardware support The hardware device function supports MSI by indicating the MSI/MSI-X capability structure on its PCI capability list. By default, this capability structure will not be initialized by @@ -249,17 +430,19 @@ which may result in system hang. The software driver of specific MSI-capable hardware is responsible for whether calling pci_enable_msi or not. A return of zero indicates the kernel successfully initializes the MSI/MSI-X capability structure of the -device funtion. The device function is now running on MSI mode. +device funtion. The device function is now running on MSI/MSI-X mode. -5.5 How to tell whether MSI is enabled on device function +5.6 How to tell whether MSI/MSI-X is enabled on device function -At the driver level, a return of zero from pci_enable_msi(...) -indicates to the device driver that its device function is -initialized successfully and ready to run in MSI mode. +At the driver level, a return of zero from the function call of +pci_enable_msi()/pci_enable_msix() indicates to a device driver that +its device function is initialized successfully and ready to run in +MSI/MSI-X mode. At the user level, users can use command 'cat /proc/interrupts' -to display the vector allocated for the device and its interrupt -mode, as shown below. +to display the vector allocated for a device and its interrupt +MSI/MSI-X mode ("PCI MSI"/"PCI MSIX"). Below shows below MSI mode is +enabled on a SCSI Adaptec 39320D Ultra320. CPU0 CPU1 0: 324639 0 IO-APIC-edge timer