X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=Documentation%2Fcpusets.txt;fp=Documentation%2Fcpusets.txt;h=30c41459953c3ca04e6dd93ddf8702ced7cb89c4;hb=64ba3f394c830ec48a1c31b53dcae312c56f1604;hp=76b44290c1546c50a4e15527c18a19e626249f35;hpb=be1e6109ac94a859551f8e1774eb9a8469fe055c;p=linux-2.6.git diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt index 76b44290c..30c414599 100644 --- a/Documentation/cpusets.txt +++ b/Documentation/cpusets.txt @@ -18,8 +18,7 @@ CONTENTS: 1.4 What are exclusive cpusets ? 1.5 What does notify_on_release do ? 1.6 What is memory_pressure ? - 1.7 What is memory spread ? - 1.8 How do I use cpusets ? + 1.7 How do I use cpusets ? 2. Usage Examples and Syntax 2.1 Basic Usage 2.2 Adding/removing cpus @@ -217,12 +216,6 @@ exclusive cpuset. Also, the use of a Linux virtual file system (vfs) to represent the cpuset hierarchy provides for a familiar permission and name space for cpusets, with a minimum of additional kernel code. -The cpus file in the root (top_cpuset) cpuset is read-only. -It automatically tracks the value of cpu_online_map, using a CPU -hotplug notifier. If and when memory nodes can be hotplugged, -we expect to make the mems file in the root cpuset read-only -as well, and have it track the value of node_online_map. - 1.4 What are exclusive cpusets ? -------------------------------- @@ -324,78 +317,7 @@ the tasks in the cpuset, in units of reclaims attempted per second, times 1000. -1.7 What is memory spread ? ---------------------------- -There are two boolean flag files per cpuset that control where the -kernel allocates pages for the file system buffers and related in -kernel data structures. They are called 'memory_spread_page' and -'memory_spread_slab'. - -If the per-cpuset boolean flag file 'memory_spread_page' is set, then -the kernel will spread the file system buffers (page cache) evenly -over all the nodes that the faulting task is allowed to use, instead -of preferring to put those pages on the node where the task is running. - -If the per-cpuset boolean flag file 'memory_spread_slab' is set, -then the kernel will spread some file system related slab caches, -such as for inodes and dentries evenly over all the nodes that the -faulting task is allowed to use, instead of preferring to put those -pages on the node where the task is running. - -The setting of these flags does not affect anonymous data segment or -stack segment pages of a task. - -By default, both kinds of memory spreading are off, and memory -pages are allocated on the node local to where the task is running, -except perhaps as modified by the tasks NUMA mempolicy or cpuset -configuration, so long as sufficient free memory pages are available. - -When new cpusets are created, they inherit the memory spread settings -of their parent. - -Setting memory spreading causes allocations for the affected page -or slab caches to ignore the tasks NUMA mempolicy and be spread -instead. Tasks using mbind() or set_mempolicy() calls to set NUMA -mempolicies will not notice any change in these calls as a result of -their containing tasks memory spread settings. If memory spreading -is turned off, then the currently specified NUMA mempolicy once again -applies to memory page allocations. - -Both 'memory_spread_page' and 'memory_spread_slab' are boolean flag -files. By default they contain "0", meaning that the feature is off -for that cpuset. If a "1" is written to that file, then that turns -the named feature on. - -The implementation is simple. - -Setting the flag 'memory_spread_page' turns on a per-process flag -PF_SPREAD_PAGE for each task that is in that cpuset or subsequently -joins that cpuset. The page allocation calls for the page cache -is modified to perform an inline check for this PF_SPREAD_PAGE task -flag, and if set, a call to a new routine cpuset_mem_spread_node() -returns the node to prefer for the allocation. - -Similarly, setting 'memory_spread_cache' turns on the flag -PF_SPREAD_SLAB, and appropriately marked slab caches will allocate -pages from the node returned by cpuset_mem_spread_node(). - -The cpuset_mem_spread_node() routine is also simple. It uses the -value of a per-task rotor cpuset_mem_spread_rotor to select the next -node in the current tasks mems_allowed to prefer for the allocation. - -This memory placement policy is also known (in other contexts) as -round-robin or interleave. - -This policy can provide substantial improvements for jobs that need -to place thread local data on the corresponding node, but that need -to access large file system data sets that need to be spread across -the several nodes in the jobs cpuset in order to fit. Without this -policy, especially for jobs that might have one thread reading in the -data set, the memory allocation across the nodes in the jobs cpuset -can become very uneven. - - -1.8 How do I use cpusets ? +1.7 How do I use cpusets ? -------------------------- In order to minimize the impact of cpusets on critical kernel