X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=Documentation%2Fsysctl%2Fvm.txt;fp=Documentation%2Fsysctl%2Fvm.txt;h=a46c10fcddfcf17969d42fb472af306d9e9eab6f;hb=43bc926fffd92024b46cafaf7350d669ba9ca884;hp=2f1aae32a5d9dfc2c0f71bb0400ce4a714d49c07;hpb=cee37fe97739d85991964371c1f3a745c00dd236;p=linux-2.6.git diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 2f1aae32a..a46c10fcd 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -26,12 +26,15 @@ Currently, these files are in /proc/sys/vm: - min_free_kbytes - laptop_mode - block_dump +- drop-caches +- zone_reclaim_mode +- zone_reclaim_interval ============================================================== dirty_ratio, dirty_background_ratio, dirty_expire_centisecs, dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode, -block_dump, swap_token_timeout: +block_dump, swap_token_timeout, drop-caches: See Documentation/filesystems/proc.txt @@ -102,3 +105,76 @@ This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a pages_min value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based proportionally on its size. + +============================================================== + +percpu_pagelist_fraction + +This is the fraction of pages at most (high mark pcp->high) in each zone that +are allocated for each per cpu page list. The min value for this is 8. It +means that we don't allow more than 1/8th of pages in each zone to be +allocated in any single per_cpu_pagelist. This entry only changes the value +of hot per cpu pagelists. User can specify a number like 100 to allocate +1/100th of each zone to each per cpu page list. + +The batch value of each per cpu pagelist is also updated as a result. It is +set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8) + +The initial value is zero. Kernel does not use this value at boot time to set +the high water marks for each per cpu page list. + +=============================================================== + +zone_reclaim_mode: + +Zone_reclaim_mode allows to set more or less agressive approaches to +reclaim memory when a zone runs out of memory. If it is set to zero then no +zone reclaim occurs. Allocations will be satisfied from other zones / nodes +in the system. + +This is value ORed together of + +1 = Zone reclaim on +2 = Zone reclaim writes dirty pages out +4 = Zone reclaim swaps pages +8 = Also do a global slab reclaim pass + +zone_reclaim_mode is set during bootup to 1 if it is determined that pages +from remote zones will cause a measurable performance reduction. The +page allocator will then reclaim easily reusable pages (those page +cache pages that are currently not used) before allocating off node pages. + +It may be beneficial to switch off zone reclaim if the system is +used for a file server and all of memory should be used for caching files +from disk. In that case the caching effect is more important than +data locality. + +Allowing zone reclaim to write out pages stops processes that are +writing large amounts of data from dirtying pages on other nodes. Zone +reclaim will write out dirty pages if a zone fills up and so effectively +throttle the process. This may decrease the performance of a single process +since it cannot use all of system memory to buffer the outgoing writes +anymore but it preserve the memory on other nodes so that the performance +of other processes running on other nodes will not be affected. + +Allowing regular swap effectively restricts allocations to the local +node unless explicitly overridden by memory policies or cpuset +configurations. + +It may be advisable to allow slab reclaim if the system makes heavy +use of files and builds up large slab caches. However, the slab +shrink operation is global, may take a long time and free slabs +in all nodes of the system. + +================================================================ + +zone_reclaim_interval: + +The time allowed for off node allocations after zone reclaim +has failed to reclaim enough pages to allow a local allocation. + +Time is set in seconds and set by default to 30 seconds. + +Reduce the interval if undesired off node allocations occur. However, too +frequent scans will have a negative impact onoff node allocation performance. +