X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=Documentation%2Ffilesystems%2Fproc.txt;h=72af5de1effb44a93d907ed4e0117ec403a8257e;hb=97bf2856c6014879bd04983a3e9dfcdac1e7fe85;hp=bc552015b66b047be5247eb203d17ed69aebe143;hpb=9213980e6a70d8473e0ffd4b39ab5b6caaba9ff5;p=linux-2.6.git diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index bc552015b..72af5de1e 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -39,6 +39,8 @@ Table of Contents 2.9 Appletalk 2.10 IPX 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem + 2.12 /proc//oom_adj - Adjust the oom-killer score + 2.13 /proc//oom_score - Display current oom-killer score ------------------------------------------------------------------------------ Preface @@ -55,7 +57,7 @@ This work is based on the 2.2.* kernel version and the upcoming 2.4.*. I'm afraid it's still far from complete, but we hope it will be useful. As far as we know, it is the first 'all-in-one' document about the /proc file system. It is focused on the Intel x86 hardware, so if you are looking for PPC, ARM, -SPARC, APX, etc., features, you probably won't find what you are looking for. +SPARC, AXP, etc., features, you probably won't find what you are looking for. It also only covers IPv4 networking, not IPv6 nor other protocols - sorry. But additions and patches are welcome and will be added to this document if you mail them to Bodo. @@ -121,7 +123,7 @@ Table 1-1: Process specific entries in /proc .............................................................................. File Content cmdline Command line arguments - cpu Current and last cpu in wich it was executed (2.4)(smp) + cpu Current and last cpu in which it was executed (2.4)(smp) cwd Link to the current working directory environ Values of environment variables exe Link to the executable of this process @@ -133,6 +135,7 @@ Table 1-1: Process specific entries in /proc statm Process memory status information status Process status in human readable form wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan + smaps Extension based on maps, presenting the rss size for each mapped file .............................................................................. For example, to get the status information of a process, all you have to do is @@ -169,16 +172,18 @@ information. The statm file contains more detailed information about the process memory usage. Its seven fields are explained in Table 1-2. -Table 1-2: Contents of the statm files +Table 1-2: Contents of the statm files (as of 2.6.8-rc3) .............................................................................. - File Content - size total program size - resident size of memory portions - shared number of pages that are shared - trs number of pages that are 'code' - drs number of pages of data/stack - lrs number of pages of library - dt number of dirty pages + Field Content + size total program size (pages) (same as VmSize in status) + resident size of memory portions (pages) (same as VmRSS in status) + shared number of pages that are shared (i.e. backed by a file) + trs number of pages that are 'code' (not including libs; broken, + includes data segment) + lrs number of pages of library (always 0 on 2.6) + drs number of pages of data/stack (including libs; broken, + includes library text) + dt number of dirty pages (always 0 on 2.6) .............................................................................. 1.2 Kernel data @@ -201,7 +206,7 @@ Table 1-3: Kernel info in /proc devices Available devices (block and character) dma Used DMS channels filesystems Supported filesystems - driver Various drivers grouped here, currently rtc (2.4) + driver Various drivers grouped here, currently rtc (2.4) execdomains Execdomains, related to security (2.4) fb Frame Buffer devices (2.4) fs File system parameters, currently nfs/exports (2.4) @@ -306,13 +311,13 @@ is the same by default: > cat /proc/irq/0/smp_affinity ffffffff -It's a bitmask, in wich you can specify wich CPUs can handle the IRQ, you can +It's a bitmask, in which you can specify which CPUs can handle the IRQ, you can set it by doing: > echo 1 > /proc/irq/prof_cpu_mask This means that only the first CPU will handle the IRQ, but you can also echo 5 -wich means that only the first and fourth CPU can handle the IRQ. +which means that only the first and fourth CPU can handle the IRQ. The way IRQs are routed is handled by the IO-APIC, and it's Round Robin between all the CPUs which are allowed to handle it. As usual the kernel has @@ -348,22 +353,6 @@ available. In this case, there are 0 chunks of 2^0*PAGE_SIZE available in ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE available in ZONE_NORMAL, etc... - -1.3 IDE devices in /proc/ide ----------------------------- - -The subdirectory /proc/ide contains information about all IDE devices of which -the kernel is aware. There is one subdirectory for each IDE controller, the -file drivers and a link for each IDE device, pointing to the device directory -in the controller specific subtree. - -The file drivers contains general information about the drivers used for the -IDE devices: - - > cat /proc/ide/drivers - ide-cdrom version 4.53 - ide-disk version 1.08 - .............................................................................. meminfo: @@ -392,9 +381,9 @@ Dirty: 968 kB Writeback: 0 kB Mapped: 280372 kB Slab: 684068 kB -Committed_AS: 1576424 kB +CommitLimit: 7669796 kB +Committed_AS: 100056 kB PageTables: 24448 kB -ReverseMaps: 1080904 VmallocTotal: 112216 kB VmallocUsed: 428 kB VmallocChunk: 111088 kB @@ -421,7 +410,7 @@ VmallocChunk: 111088 kB this memory, making it slower to access than lowmem. LowTotal: LowFree: Lowmem is memory which can be used for everything that - highmem can be used for, but it is also availble for the + highmem can be used for, but it is also available for the kernel's use for its own data structures. Among many other things, it is where everything from the Slab is allocated. Bad things happen when you're out of lowmem. @@ -431,27 +420,55 @@ VmallocChunk: 111088 kB Dirty: Memory which is waiting to get written back to the disk Writeback: Memory which is actively being written back to the disk Mapped: files which have been mmaped, such as libraries - Slab: in-kernel data structures cache -Committed_AS: An estimate of how much RAM you would need to make a - 99.99% guarantee that there never is OOM (out of memory) - for this workload. Normally the kernel will overcommit - memory. That means, say you do a 1GB malloc, nothing - happens, really. Only when you start USING that malloc - memory you will get real memory on demand, and just as - much as you use. So you sort of take a mortgage and hope - the bank doesn't go bust. Other cases might include when - you mmap a file that's shared only when you write to it - and you get a private copy of that data. While it normally - is shared between processes. The Committed_AS is a - guesstimate of how much RAM/swap you would need - worst-case. + Slab: in-kernel data structures cache + CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), + this is the total amount of memory currently available to + be allocated on the system. This limit is only adhered to + if strict overcommit accounting is enabled (mode 2 in + 'vm.overcommit_memory'). + The CommitLimit is calculated with the following formula: + CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap + For example, on a system with 1G of physical RAM and 7G + of swap with a `vm.overcommit_ratio` of 30 it would + yield a CommitLimit of 7.3G. + For more details, see the memory overcommit documentation + in vm/overcommit-accounting. +Committed_AS: The amount of memory presently allocated on the system. + The committed memory is a sum of all of the memory which + has been allocated by processes, even if it has not been + "used" by them as of yet. A process which malloc()'s 1G + of memory, but only touches 300M of it will only show up + as using 300M of memory even if it has the address space + allocated for the entire 1G. This 1G is memory which has + been "committed" to by the VM and can be used at any time + by the allocating application. With strict overcommit + enabled on the system (mode 2 in 'vm.overcommit_memory'), + allocations which would exceed the CommitLimit (detailed + above) will not be permitted. This is useful if one needs + to guarantee that processes will not fail due to lack of + memory once that memory has been successfully allocated. PageTables: amount of memory dedicated to the lowest level of page tables. - ReverseMaps: number of reverse mappings performed VmallocTotal: total size of vmalloc memory area VmallocUsed: amount of vmalloc area which is used VmallocChunk: largest contigious block of vmalloc area which is free + +1.3 IDE devices in /proc/ide +---------------------------- + +The subdirectory /proc/ide contains information about all IDE devices of which +the kernel is aware. There is one subdirectory for each IDE controller, the +file drivers and a link for each IDE device, pointing to the device directory +in the controller specific subtree. + +The file drivers contains general information about the drivers used for the +IDE devices: + + > cat /proc/ide/drivers + ide-cdrom version 4.53 + ide-disk version 1.08 + More detailed information can be found in the controller specific subdirectories. These are named ide0, ide1 and so on. Each of these directories contains the files shown in table 1-4. @@ -895,16 +912,6 @@ nr_free_inodes Represents the number of free inodes. Ie. The number of inuse inodes is (nr_inodes - nr_free_inodes). -super-nr and super-max ----------------------- - -Again, super block structures are allocated by the kernel, but not freed. The -file super-max contains the maximum number of super block handlers, where -super-nr shows the number of currently allocated ones. - -Every mounted file system needs a super block, so if you plan to mount lots of -file systems, you may want to increase these numbers. - aio-nr and aio-max-nr --------------------- @@ -1109,12 +1116,45 @@ modprobe The location where the modprobe binary is located. The kernel uses this program to load modules on demand. +unknown_nmi_panic +----------------- + +The value in this file affects behavior of handling NMI. When the value is +non-zero, unknown NMI is trapped and then panic occurs. At that time, kernel +debugging information is displayed on console. + +NMI switch that most IA32 servers have fires unknown NMI up, for example. +If a system hangs up, try pressing the NMI switch. + +nmi_watchdog +------------ + +Enables/Disables the NMI watchdog on x86 systems. When the value is non-zero +the NMI watchdog is enabled and will continuously test all online cpus to +determine whether or not they are still functioning properly. + +Because the NMI watchdog shares registers with oprofile, by disabling the NMI +watchdog, oprofile may have more registers to utilize. + + 2.4 /proc/sys/vm - The virtual memory subsystem ----------------------------------------------- The files in this directory can be used to tune the operation of the virtual memory (VM) subsystem of the Linux kernel. +vfs_cache_pressure +------------------ + +Controls the tendency of the kernel to reclaim the memory which is used for +caching of directory and inode objects. + +At the default value of vfs_cache_pressure=100 the kernel will attempt to +reclaim dentries and inodes at a "fair" rate with respect to pagecache and +swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer +to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100 +causes the kernel to prefer to reclaim dentries and inodes. + dirty_background_ratio ---------------------- @@ -1145,6 +1185,12 @@ for writeout by the pdflush daemons. It is expressed in 100'ths of a second. Data which has been dirty in-memory for longer than this interval will be written out next time a pdflush daemon wakes up. +legacy_va_layout +---------------- + +If non-zero, this sysctl disables the new 32-bit mmap mmap layout - the kernel +will use the legacy (2.4) layout for all processes. + lower_zone_protection --------------------- @@ -1174,9 +1220,9 @@ applications are using mlock(), or if you are running with no swap then you probably should increase the lower_zone_protection setting. The units of this tunable are fairly vague. It is approximately equal -to "megabytes". So setting lower_zone_protection=100 will protect around 100 +to "megabytes," so setting lower_zone_protection=100 will protect around 100 megabytes of the lowmem zone from user allocations. It will also make -those 100 megabytes unavaliable for use by applications and by +those 100 megabytes unavailable for use by applications and by pagecache, so there is a cost. The effects of this tunable may be observed by monitoring @@ -1201,16 +1247,38 @@ swap-intensive. overcommit_memory ----------------- -This file contains one value. The following algorithm is used to decide if -there's enough memory: if the value of overcommit_memory is positive, then -there's always enough memory. This is a useful feature, since programs often -malloc() huge amounts of memory 'just in case', while they only use a small -part of it. Leaving this value at 0 will lead to the failure of such a huge -malloc(), when in fact the system has enough memory for the program to run. +Controls overcommit of system memory, possibly allowing processes +to allocate (but not use) more memory than is actually available. + + +0 - Heuristic overcommit handling. Obvious overcommits of + address space are refused. Used for a typical system. It + ensures a seriously wild allocation fails while allowing + overcommit to reduce swap usage. root is allowed to + allocate slightly more memory in this mode. This is the + default. + +1 - Always overcommit. Appropriate for some scientific + applications. + +2 - Don't overcommit. The total address space commit + for the system is not permitted to exceed swap plus a + configurable percentage (default is 50) of physical RAM. + Depending on the percentage you use, in most situations + this means a process will not be killed while attempting + to use already-allocated memory but will receive errors + on memory allocation as appropriate. -On the other hand, enabling this feature can cause you to run out of memory -and thrash the system to death, so large and/or important servers will want to -set this value to 0. +overcommit_ratio +---------------- + +Percentage of physical memory size to include in overcommit calculations +(see above.) + +Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100) + + swapspace = total size of all swap areas + physmem = size of physical memory in system nr_hugepages and hugetlb_shm_group ---------------------------------- @@ -1220,6 +1288,43 @@ nr_hugepages configures number of hugetlb page reserved for the system. hugetlb_shm_group contains group id that is allowed to create SysV shared memory segment using hugetlb page. +laptop_mode +----------- + +laptop_mode is a knob that controls "laptop mode". All the things that are +controlled by this knob are discussed in Documentation/laptop-mode.txt. + +block_dump +---------- + +block_dump enables block I/O debugging when set to a nonzero value. More +information on block I/O debugging is in Documentation/laptop-mode.txt. + +swap_token_timeout +------------------ + +This file contains valid hold time of swap out protection token. The Linux +VM has token based thrashing control mechanism and uses the token to prevent +unnecessary page faults in thrashing situation. The unit of the value is +second. The value would be useful to tune thrashing behavior. + +drop_caches +----------- + +Writing to this will cause the kernel to drop clean caches, dentries and +inodes from memory, causing that memory to become free. + +To free pagecache: + echo 1 > /proc/sys/vm/drop_caches +To free dentries and inodes: + echo 2 > /proc/sys/vm/drop_caches +To free pagecache, dentries and inodes: + echo 3 > /proc/sys/vm/drop_caches + +As this is a non-destructive operation and dirty objects are not freeable, the +user should run `sync' first. + + 2.5 /proc/sys/dev - Device specific parameters ---------------------------------------------- @@ -1433,10 +1538,10 @@ TCP settings tcp_ecn ------- -This file controls the use of the ECN bit in the IPv4 headers, this is a new +This file controls the use of the ECN bit in the IPv4 headers. This is a new feature about Explicit Congestion Notification, but some routers and firewalls -block trafic that has this bit set, so it could be necessary to echo 0 to -/proc/sys/net/ipv4/tcp_ecn, if you want to talk to this sites. For more info +block traffic that has this bit set, so it could be necessary to echo 0 to +/proc/sys/net/ipv4/tcp_ecn if you want to talk to these sites. For more info you could read RFC2481. tcp_retrans_collapse @@ -1483,7 +1588,7 @@ Enable the strict RFC793 interpretation of the TCP urgent pointer field. The default is to use the BSD compatible interpretation of the urgent pointer pointing to the first byte after the urgent data. The RFC793 interpretation is to have it point to the last byte of urgent data. Enabling this option may -lead to interoperatibility problems. Disabled by default. +lead to interoperability problems. Disabled by default. tcp_syncookies -------------- @@ -1628,7 +1733,7 @@ error_burst and error_cost These parameters are used to limit how many ICMP destination unreachable to send from the host in question. ICMP destination unreachable messages are -sent when we can not reach the next hop, while trying to transmit a packet. +sent when we cannot reach the next hop while trying to transmit a packet. It will also print some error messages to kernel logs if someone is ignoring our ICMP redirects. The higher the error_cost factor is, the fewer destination unreachable and error messages will be let through. Error_burst @@ -1640,11 +1745,13 @@ flush Writing to this file results in a flush of the routing cache. -gc_elastic, gc_interval, gc_min_interval, gc_tresh, gc_timeout --------------------------------------------------------------- +gc_elasticity, gc_interval, gc_min_interval_ms, gc_timeout, gc_thresh +--------------------------------------------------------------------- Values to control the frequency and behavior of the garbage collection -algorithm for the routing cache. +algorithm for the routing cache. gc_min_interval is deprecated and replaced +by gc_min_interval_ms. + max_size -------- @@ -1683,18 +1790,25 @@ settings contain additional options to set garbage collection parameters. In the interface directories you'll find the following entries: -base_reachable_time -------------------- +base_reachable_time, base_reachable_time_ms +------------------------------------------- A base value used for computing the random reachable time value as specified in RFC2461. -retrans_time ------------- +Expression of base_reachable_time, which is deprecated, is in seconds. +Expression of base_reachable_time_ms is in milliseconds. + +retrans_time, retrans_time_ms +----------------------------- + +The time between retransmitted Neighbor Solicitation messages. +Used for address resolution and to determine if a neighbor is +unreachable. -The time, expressed in jiffies (1/100 sec), between retransmitted Neighbor -Solicitation messages. Used for address resolution and to determine if a -neighbor is unreachable. +Expression of retrans_time, which is deprecated, is in 1/100 seconds (for +IPv4) or in jiffies (for IPv6). +Expression of retrans_time_ms is in milliseconds. unres_qlen ---------- @@ -1743,7 +1857,7 @@ proxy_qlen Maximum queue length of the delayed proxy arp timer. (see proxy_delay). -app_solcit +app_solicit ---------- Determines the number of requests to send to the user level ARP daemon. Use 0 @@ -1850,6 +1964,22 @@ a queue must be less or equal then msg_max. maximum message size value (it is every message queue's attribute set during its creation). +2.12 /proc//oom_adj - Adjust the oom-killer score +------------------------------------------------------ + +This file can be used to adjust the score used to select which processes +should be killed in an out-of-memory situation. Giving it a high score will +increase the likelihood of this process being killed by the oom-killer. Valid +values are in the range -16 to +15, plus the special value -17, which disables +oom-killing altogether for this process. + +2.13 /proc//oom_score - Display current oom-killer score +------------------------------------------------------------- + +------------------------------------------------------------------------------ +This file can be used to check the current score used by the oom-killer is for +any given . Use it together with /proc//oom_adj to tune which +process should be killed in an out-of-memory situation. ------------------------------------------------------------------------------ Summary