X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=Documentation%2Fpowerpc%2Feeh-pci-error-recovery.txt;h=3764dd4b12cbac8f1d5a224faca4f2f5894ed889;hb=43bc926fffd92024b46cafaf7350d669ba9ca884;hp=2bfe71beec5b1cacad0464298d8acebacc4c7f27;hpb=cee37fe97739d85991964371c1f3a745c00dd236;p=linux-2.6.git diff --git a/Documentation/powerpc/eeh-pci-error-recovery.txt b/Documentation/powerpc/eeh-pci-error-recovery.txt index 2bfe71bee..3764dd4b1 100644 --- a/Documentation/powerpc/eeh-pci-error-recovery.txt +++ b/Documentation/powerpc/eeh-pci-error-recovery.txt @@ -115,13 +115,13 @@ Current PPC64 Linux EEH Implementation At this time, a generic EEH recovery mechanism has been implemented, so that individual device drivers do not need to be modified to support EEH recovery. This generic mechanism piggy-backs on the PCI hotplug -infrastructure, and percolates events up through the hotplug/udev +infrastructure, and percolates events up through the userspace/udev infrastructure. Followiing is a detailed description of how this is accomplished. EEH must be enabled in the PHB's very early during the boot process, and if a PCI slot is hot-plugged. The former is performed by -eeh_init() in arch/ppc64/kernel/eeh.c, and the later by +eeh_init() in arch/powerpc/platforms/pseries/eeh.c, and the later by drivers/pci/hotplug/pSeries_pci.c calling in to the eeh.c code. EEH must be enabled before a PCI scan of the device can proceed. Current Power5 hardware will not work unless EEH is enabled; @@ -133,8 +133,8 @@ error. Given an arbitrary address, the routine pci_get_device_by_addr() will find the pci device associated with that address (if any). -The default include/asm-ppc64/io.h macros readb(), inb(), insb(), -etc. include a check to see if the the i/o read returned all-0xff's. +The default include/asm-powerpc/io.h macros readb(), inb(), insb(), +etc. include a check to see if the i/o read returned all-0xff's. If so, these make a call to eeh_dn_check_failure(), which in turn asks the firmware if the all-ff's value is the sign of a true EEH error. If it is not, processing continues as normal. The grand @@ -143,11 +143,12 @@ seen in /proc/ppc64/eeh (subject to change). Normally, almost all of these occur during boot, when the PCI bus is scanned, where a large number of 0xff reads are part of the bus scan procedure. -If a frozen slot is detected, code in arch/ppc64/kernel/eeh.c will -print a stack trace to syslog (/var/log/messages). This stack trace -has proven to be very useful to device-driver authors for finding -out at what point the EEH error was detected, as the error itself -usually occurs slightly beforehand. +If a frozen slot is detected, code in +arch/powerpc/platforms/pseries/eeh.c will print a stack trace to +syslog (/var/log/messages). This stack trace has proven to be very +useful to device-driver authors for finding out at what point the EEH +error was detected, as the error itself usually occurs slightly +beforehand. Next, it uses the Linux kernel notifier chain/work queue mechanism to allow any interested parties to find out about the failure. Device @@ -172,7 +173,7 @@ A handler for the EEH notifier_block events is implemented in drivers/pci/hotplug/pSeries_pci.c, called handle_eeh_events(). It saves the device BAR's and then calls rpaphp_unconfig_pci_adapter(). This last call causes the device driver for the card to be stopped, -which causes hotplug events to go out to user space. This triggers +which causes uevents to go out to user space. This triggers user-space scripts that might issue commands such as "ifdown eth0" for ethernet cards, and so on. This handler then sleeps for 5 seconds, hoping to give the user-space scripts enough time to complete. @@ -258,29 +259,30 @@ rpa_php_unconfig_pci_adapter() { // in rpaphp_pci.c calls pci_destroy_dev (struct pci_dev *) { calls - device_unregister (&dev->dev) { // in /drivers/base/core.c + device_unregister (&dev->dev) { // in /drivers/base/core.c calls - device_del(struct device * dev) { // in /drivers/base/core.c + device_del(struct device * dev) { // in /drivers/base/core.c calls - kobject_del() { //in /libs/kobject.c + kobject_del() { //in /libs/kobject.c calls - kobject_hotplug() { // in /libs/kobject.c + kobject_uevent() { // in /libs/kobject.c calls - kset_hotplug() { // in /lib/kobject.c + kset_uevent() { // in /lib/kobject.c calls - kset->hotplug_ops->hotplug() which is really just + kset->uevent_ops->uevent() // which is really just a call to - dev_hotplug() { // in /drivers/base/core.c + dev_uevent() { // in /drivers/base/core.c calls - dev->bus->hotplug() which is really just a call to - pci_hotplug () { // in drivers/pci/hotplug.c + dev->bus->uevent() which is really just a call to + pci_uevent () { // in drivers/pci/hotplug.c which prints device name, etc.... } } - then kset_hotplug() calls - call_usermodehelper () with - argv[0]=hotplug_path[] which is "/sbin/hotplug" - --> event to userspace, + then kobject_uevent() sends a netlink uevent to userspace + --> userspace uevent + (during early boot, nobody listens to netlink events and + kobject_uevent() executes uevent_helper[], which runs the + event process /sbin/hotplug) } } kobject_del() then calls sysfs_remove_dir(), which would