X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=Documentation%2Fmemory-barriers.txt;h=58408dd023c77e0e0712d02811fc0238c5ee1742;hb=97bf2856c6014879bd04983a3e9dfcdac1e7fe85;hp=4710845dbac4fb663e35fcf0056499614a06e56b;hpb=16cf0ec7408f389279d413869e94c1a351392f97;p=linux-2.6.git diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 4710845db..58408dd02 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -212,7 +212,7 @@ There are some minimal guarantees that may be expected of a CPU: STORE *X = c, d = LOAD *X - (Loads and stores overlap if they are targetted at overlapping pieces of + (Loads and stores overlap if they are targeted at overlapping pieces of memory). And there are a number of things that _must_ or _must_not_ be assumed: @@ -262,9 +262,14 @@ What is required is some way of intervening to instruct the compiler and the CPU to restrict the order. Memory barriers are such interventions. They impose a perceived partial -ordering between the memory operations specified on either side of the barrier. -They request that the sequence of memory events generated appears to other -parts of the system as if the barrier is effective on that CPU. +ordering over the memory operations on either side of the barrier. + +Such enforcement is important because the CPUs and other devices in a system +can use a variety of tricks to improve performance - including reordering, +deferral and combination of memory operations; speculative loads; speculative +branch prediction and various types of caching. Memory barriers are used to +override or suppress these tricks, allowing the code to sanely control the +interaction of multiple CPUs and/or devices. VARIETIES OF MEMORY BARRIER @@ -282,7 +287,7 @@ Memory barriers come in four basic varieties: A write barrier is a partial ordering on stores only; it is not required to have any effect on loads. - A CPU can be viewed as as commiting a sequence of store operations to the + A CPU can be viewed as committing a sequence of store operations to the memory system as time progresses. All stores before a write barrier will occur in the sequence _before_ all the stores after the write barrier. @@ -413,7 +418,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee: indirect effect will be the order in which the second CPU sees the effects of the first CPU's accesses occur, but see the next point: - (*) There is no guarantee that the a CPU will see the correct order of effects + (*) There is no guarantee that a CPU will see the correct order of effects from a second CPU's accesses, even _if_ the second CPU uses a memory barrier, unless the first CPU _also_ uses a matching memory barrier (see the subsection on "SMP Barrier Pairing"). @@ -461,8 +466,8 @@ Whilst this may seem like a failure of coherency or causality maintenance, it isn't, and this behaviour can be observed on certain real CPUs (such as the DEC Alpha). -To deal with this, a data dependency barrier must be inserted between the -address load and the data load: +To deal with this, a data dependency barrier or better must be inserted +between the address load and the data load: CPU 1 CPU 2 =============== =============== @@ -484,7 +489,7 @@ lines. The pointer P might be stored in an odd-numbered cache line, and the variable B might be stored in an even-numbered cache line. Then, if the even-numbered bank of the reading CPU's cache is extremely busy while the odd-numbered bank is idle, one can see the new value of the pointer P (&B), -but the old value of the variable B (1). +but the old value of the variable B (2). Another example of where data dependency barriers might by required is where a @@ -597,7 +602,7 @@ Consider the following sequence of events: This sequence of events is committed to the memory coherence system in an order that the rest of the system might perceive as the unordered set of { STORE A, -STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E +STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E }: +-------+ : : @@ -665,7 +670,7 @@ effectively random order, despite the write barrier issued by CPU 1: In the above example, CPU 2 perceives that B is 7, despite the load of *C -(which would be B) coming after the the LOAD of C. +(which would be B) coming after the LOAD of C. If, however, a data dependency barrier were to be placed between the load of C and the load of *C (ie: B) on CPU 2: @@ -744,7 +749,7 @@ some effectively random order, despite the write barrier issued by CPU 1: : : -If, however, a read barrier were to be placed between the load of E and the +If, however, a read barrier were to be placed between the load of B and the load of A on CPU 2: CPU 1 CPU 2 @@ -1010,10 +1015,9 @@ CPU from reordering them. There are some more advanced barrier functions: (*) set_mb(var, value) - (*) set_wmb(var, value) - These assign the value to the variable and then insert at least a write - barrier after it, depending on the function. They aren't guaranteed to + This assigns the value to the variable and then inserts a full memory + barrier after it, depending on the function. It isn't guaranteed to insert anything more than a compiler barrier in a UP compilation. @@ -1461,9 +1465,8 @@ instruction itself is complete. On a UP system - where this wouldn't be a problem - the smp_mb() is just a compiler barrier, thus making sure the compiler emits the instructions in the -right order without actually intervening in the CPU. Since there there's only -one CPU, that CPU's dependency ordering logic will take care of everything -else. +right order without actually intervening in the CPU. Since there's only one +CPU, that CPU's dependency ordering logic will take care of everything else. ATOMIC OPERATIONS @@ -1640,9 +1643,9 @@ functions: The PCI bus, amongst others, defines an I/O space concept - which on such CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O - space. However, it may also mapped as a virtual I/O space in the CPU's - memory map, particularly on those CPUs that don't support alternate - I/O spaces. + space. However, it may also be mapped as a virtual I/O space in the CPU's + memory map, particularly on those CPUs that don't support alternate I/O + spaces. Accesses to this space may be fully synchronous (as on i386), but intermediary bridges (such as the PCI host bridge) may not fully honour @@ -1895,7 +1898,7 @@ queue before processing any further requests: smp_wmb(); - p = &b; q = p; + p = &v; q = p; @@ -1912,7 +1915,7 @@ Whilst most CPUs do imply a data dependency barrier on the read when a memory access depends on a read, not all do, so it may not be relied on. Other CPUs may also have split caches, but must coordinate between the various -cachelets for normal memory accesss. The semantics of the Alpha removes the +cachelets for normal memory accesses. The semantics of the Alpha removes the need for coordination in absence of memory barriers.