Merge to Fedora kernel-2.6.18-1.2224_FC5 patched with stable patch-2.6.18.1-vs2.0...

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 4710845..46b9b38 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -262,9 +262,14 @@ What is required is some way of intervening to instruct the compiler and the
  CPU to restrict the order.
  
  Memory barriers are such interventions.  They impose a perceived partial
-ordering between the memory operations specified on either side of the barrier.
-They request that the sequence of memory events generated appears to other
-parts of the system as if the barrier is effective on that CPU.
+ordering over the memory operations on either side of the barrier.
+
+Such enforcement is important because the CPUs and other devices in a system
+can use a variety of tricks to improve performance - including reordering,
+deferral and combination of memory operations; speculative loads; speculative
+branch prediction and various types of caching.  Memory barriers are used to
+override or suppress these tricks, allowing the code to sanely control the
+interaction of multiple CPUs and/or devices.
  
  
  VARIETIES OF MEMORY BARRIER
@@ -282,7 +287,7 @@ Memory barriers come in four basic varieties:
       A write barrier is a partial ordering on stores only; it is not required
       to have any effect on loads.
  
-     A CPU can be viewed as as commiting a sequence of store operations to the
+     A CPU can be viewed as committing a sequence of store operations to the
       memory system as time progresses.  All stores before a write barrier will
       occur in the sequence _before_ all the stores after the write barrier.
  
@@ -413,7 +418,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
       indirect effect will be the order in which the second CPU sees the effects
       of the first CPU's accesses occur, but see the next point:
  
- (*) There is no guarantee that the a CPU will see the correct order of effects
+ (*) There is no guarantee that a CPU will see the correct order of effects
       from a second CPU's accesses, even _if_ the second CPU uses a memory
       barrier, unless the first CPU _also_ uses a matching memory barrier (see
       the subsection on "SMP Barrier Pairing").
@@ -461,8 +466,8 @@ Whilst this may seem like a failure of coherency or causality maintenance, it
  isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
  Alpha).
  
-To deal with this, a data dependency barrier must be inserted between the
-address load and the data load:
+To deal with this, a data dependency barrier or better must be inserted
+between the address load and the data load:
  
         CPU 1           CPU 2
         =============== ===============
@@ -484,7 +489,7 @@ lines.  The pointer P might be stored in an odd-numbered cache line, and the
  variable B might be stored in an even-numbered cache line.  Then, if the
  even-numbered bank of the reading CPU's cache is extremely busy while the
  odd-numbered bank is idle, one can see the new value of the pointer P (&B),
-but the old value of the variable B (1).
+but the old value of the variable B (2).
  
  
  Another example of where data dependency barriers might by required is where a
@@ -597,7 +602,7 @@ Consider the following sequence of events:
  
  This sequence of events is committed to the memory coherence system in an order
  that the rest of the system might perceive as the unordered set of { STORE A,
-STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E
+STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
  }:
  
         +-------+       :      :
@@ -744,7 +749,7 @@ some effectively random order, despite the write barrier issued by CPU 1:
                                                 :       :
  
  
-If, however, a read barrier were to be placed between the load of E and the
+If, however, a read barrier were to be placed between the load of B and the
  load of A on CPU 2:
  
         CPU 1                   CPU 2
@@ -1010,10 +1015,9 @@ CPU from reordering them.
  There are some more advanced barrier functions:
  
   (*) set_mb(var, value)
- (*) set_wmb(var, value)
  
-     These assign the value to the variable and then insert at least a write
-     barrier after it, depending on the function.  They aren't guaranteed to
+     This assigns the value to the variable and then inserts at least a write
+     barrier after it, depending on the function.  It isn't guaranteed to
       insert anything more than a compiler barrier in a UP compilation.
  
  
@@ -1461,9 +1465,8 @@ instruction itself is complete.
  
  On a UP system - where this wouldn't be a problem - the smp_mb() is just a
  compiler barrier, thus making sure the compiler emits the instructions in the
-right order without actually intervening in the CPU.  Since there there's only
-one CPU, that CPU's dependency ordering logic will take care of everything
-else.
+right order without actually intervening in the CPU.  Since there's only one
+CPU, that CPU's dependency ordering logic will take care of everything else.
  
  
  ATOMIC OPERATIONS
@@ -1640,9 +1643,9 @@ functions:
  
       The PCI bus, amongst others, defines an I/O space concept - which on such
       CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O
-     space.  However, it may also mapped as a virtual I/O space in the CPU's
-     memory map, particularly on those CPUs that don't support alternate
-     I/O spaces.
+     space.  However, it may also be mapped as a virtual I/O space in the CPU's
+     memory map, particularly on those CPUs that don't support alternate I/O
+     spaces.
  
       Accesses to this space may be fully synchronous (as on i386), but
       intermediary bridges (such as the PCI host bridge) may not fully honour