Device Power Management Device power management encompasses two areas - the ability to save state and transition a device to a low-power state when the system is entering a low-power state; and the ability to transition a device to a low-power state while the system is running (and independently of any other power management activity). Methods The methods to suspend and resume devices reside in struct bus_type: struct bus_type { ... int (*suspend)(struct device * dev, u32 state); int (*resume)(struct device * dev); }; Each bus driver is responsible implementing these methods, translating the call into a bus-specific request and forwarding the call to the bus-specific drivers. For example, PCI drivers implement suspend() and resume() methods in struct pci_driver. The PCI core is simply responsible for translating the pointers to PCI-specific ones and calling the low-level driver. This is done to a) ease transition to the new power management methods and leverage the existing PM code in various bus drivers; b) allow buses to implement generic and default PM routines for devices, and c) make the flow of execution obvious to the reader. System Power Management When the system enters a low-power state, the device tree is walked in a depth-first fashion to transition each device into a low-power state. The ordering of the device tree is guaranteed by the order in which devices get registered - children are never registered before their ancestors, and devices are placed at the back of the list when registered. By walking the list in reverse order, we are guaranteed to suspend devices in the proper order. Devices are suspended once with interrupts enabled. Drivers are expected to stop I/O transactions, save device state, and place the device into a low-power state. Drivers may sleep, allocate memory, etc. at will. Some devices are broken and will inevitably have problems powering down or disabling themselves with interrupts enabled. For these special cases, they may return -EAGAIN. This will put the device on a list to be taken care of later. When interrupts are disabled, before we enter the low-power state, their drivers are called again to put their device to sleep. On resume, the devices that returned -EAGAIN will be called to power themselves back on with interrupts disabled. Once interrupts have been re-enabled, the rest of the drivers will be called to resume their devices. On resume, a driver is responsible for powering back on each device, restoring state, and re-enabling I/O transactions for that device. System devices follow a slightly different API, which can be found in include/linux/sysdev.h drivers/base/sys.c System devices will only be suspended with interrupts disabled, and after all other devices have been suspended. On resume, they will be resumed before any other devices, and also with interrupts disabled. Runtime Power Management Many devices are able to dynamically power down while the system is still running. This feature is useful for devices that are not being used, and can offer significant power savings on a running system. In each device's directory, there is a 'power' directory, which contains at least a 'state' file. Reading from this file displays what power state the device is currently in. Writing to this file initiates a transition to the specified power state, which must be a decimal in the range 1-3, inclusive; or 0 for 'On'. The PM core will call the ->suspend() method in the bus_type object that the device belongs to if the specified state is not 0, or ->resume() if it is. Nothing will happen if the specified state is the same state the device is currently in. If the device is already in a low-power state, and the specified state is another, but different, low-power state, the ->resume() method will first be called to power the device back on, then ->suspend() will be called again with the new state. The driver is responsible for saving the working state of the device and putting it into the low-power state specified. If this was successful, it returns 0, and the device's power_state field is updated. The driver must take care to know whether or not it is able to properly resume the device, including all step of reinitialization necessary. (This is the hardest part, and the one most protected by NDA'd documents). The driver must also take care not to suspend a device that is currently in use. It is their responsibility to provide their own exclusion mechanisms. The runtime power transition happens with interrupts enabled. If a device cannot support being powered down with interrupts, it may return -EAGAIN (as it would during a system power management transition), but it will _not_ be called again, and the transaction will fail. There is currently no way to know what states a device or driver supports a priori. This will change in the future. Driver Detach Power Management The kernel now supports the ability to place a device in a low-power state when it is detached from its driver, which happens when its module is removed. Each device contains a 'detach_state' file in its sysfs directory which can be used to control this state. Reading from this file displays what the current detach state is set to. This is 0 (On) by default. A user may write a positive integer value to this file in the range of 1-4 inclusive. A value of 1-3 will indicate the device should be placed in that low-power state, which will cause ->suspend() to be called for that device. A value of 4 indicates that the device should be shutdown, so ->shutdown() will be called for that device. The driver is responsible for reinitializing the device when the module is re-inserted during it's ->probe() (or equivalent) method. The driver core will not call any extra functions when binding the device to the driver.