X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=Documentation%2Ffilesystems%2Fvfs.txt;fp=Documentation%2Ffilesystems%2Fvfs.txt;h=e56e842847d3ee3e48c79b4099bc0d6040d730a1;hb=64ba3f394c830ec48a1c31b53dcae312c56f1604;hp=1cb7e8be927ad55acdd739bd31439096d9c55d5d;hpb=be1e6109ac94a859551f8e1774eb9a8469fe055c;p=linux-2.6.git

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 1cb7e8be9..e56e84284 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -113,8 +113,8 @@ members are defined:
 struct file_system_type {
 	const char *name;
 	int fs_flags;
-	int (*get_sb) (struct file_system_type *, int,
-		       const char *, void *, struct vfsmount *);
+	struct super_block *(*get_sb) (struct file_system_type *, int,
+		       const char *, void *);
 	void (*kill_sb) (struct super_block *);
 	struct module *owner;
 	struct file_system_type * next;
@@ -211,7 +211,7 @@ struct super_operations {
 	int (*sync_fs)(struct super_block *sb, int wait);
 	void (*write_super_lockfs) (struct super_block *);
 	void (*unlockfs) (struct super_block *);
-	int (*statfs) (struct dentry *, struct kstatfs *);
+	int (*statfs) (struct super_block *, struct kstatfs *);
 	int (*remount_fs) (struct super_block *, int *, char *);
 	void (*clear_inode) (struct inode *);
 	void (*umount_begin) (struct super_block *);
@@ -230,15 +230,10 @@ only called from a process context (i.e. not from an interrupt handler or
 bottom half).
 
   alloc_inode: this method is called by inode_alloc() to allocate memory
-	for struct inode and initialize it. If this function is not
-	defined, a simple 'struct inode' is allocated. Normally
-	alloc_inode will be used to allocate a larger structure which
-	contains a 'struct inode' embedded within it.
+	for struct inode and initialize it.
 
   destroy_inode: this method is called by destroy_inode() to release
-	resources allocated for struct inode. It is only required if
-	->alloc_inode was defined and simply undoes anything done by
-	->alloc_inode.
+	resources allocated for struct inode.
 
   read_inode: this method is called to read a specific inode from the
 	mounted filesystem. The i_ino member in the struct inode is
@@ -448,81 +443,14 @@ otherwise noted.
 The Address Space Object
 ========================
 
-The address space object is used to group and manage pages in the page
-cache. It can be used to keep track of the pages in a file (or
-anything else) and also track the mapping of sections of the file into
-process address spaces.
-
-There are a number of distinct yet related services that an
-address-space can provide. These include communicating memory
-pressure, page lookup by address, and keeping track of pages tagged as
-Dirty or Writeback.
-
-The first can be used independently to the others. The VM can try to
-either write dirty pages in order to clean them, or release clean
-pages in order to reuse them. To do this it can call the ->writepage
-method on dirty pages, and ->releasepage on clean pages with
-PagePrivate set. Clean pages without PagePrivate and with no external
-references will be released without notice being given to the
-address_space.
-
-To achieve this functionality, pages need to be placed on an LRU with
-lru_cache_add and mark_page_active needs to be called whenever the
-page is used.
-
-Pages are normally kept in a radix tree index by ->index. This tree
-maintains information about the PG_Dirty and PG_Writeback status of
-each page, so that pages with either of these flags can be found
-quickly.
-
-The Dirty tag is primarily used by mpage_writepages - the default
-->writepages method. It uses the tag to find dirty pages to call
-->writepage on. If mpage_writepages is not used (i.e. the address
-provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is
-almost unused. write_inode_now and sync_inode do use it (through
-__sync_single_inode) to check if ->writepages has been successful in
-writing out the whole address_space.
-
-The Writeback tag is used by filemap*wait* and sync_page* functions,
-via wait_on_page_writeback_range, to wait for all writeback to
-complete. While waiting ->sync_page (if defined) will be called on
-each page that is found to require writeback.
-
-An address_space handler may attach extra information to a page,
-typically using the 'private' field in the 'struct page'. If such
-information is attached, the PG_Private flag should be set. This will
-cause various VM routines to make extra calls into the address_space
-handler to deal with that data.
-
-An address space acts as an intermediate between storage and
-application. Data is read into the address space a whole page at a
-time, and provided to the application either by copying of the page,
-or by memory-mapping the page.
-Data is written into the address space by the application, and then
-written-back to storage typically in whole pages, however the
-address_space has finer control of write sizes.
-
-The read process essentially only requires 'readpage'. The write
-process is more complicated and uses prepare_write/commit_write or
-set_page_dirty to write data into the address_space, and writepage,
-sync_page, and writepages to writeback data to storage.
-
-Adding and removing pages to/from an address_space is protected by the
-inode's i_mutex.
-
-When data is written to a page, the PG_Dirty flag should be set. It
-typically remains set until writepage asks for it to be written. This
-should clear PG_Dirty and set PG_Writeback. It can be actually
-written at any point after PG_Dirty is clear. Once it is known to be
-safe, PG_Writeback is cleared.
-
-Writeback makes use of a writeback_control structure...
+The address space object is used to identify pages in the page cache.
+
 
 struct address_space_operations
 -------------------------------
 
 This describes how the VFS can manipulate mapping of a file to page cache in
-your filesystem. As of kernel 2.6.16, the following members are defined:
+your filesystem. As of kernel 2.6.13, the following members are defined:
 
 struct address_space_operations {
 	int (*writepage)(struct page *page, struct writeback_control *wbc);
@@ -541,148 +469,47 @@ struct address_space_operations {
 			loff_t offset, unsigned long nr_segs);
 	struct page* (*get_xip_page)(struct address_space *, sector_t,
 			int);
-	/* migrate the contents of a page to the specified target */
-	int (*migratepage) (struct page *, struct page *);
 };
 
-  writepage: called by the VM to write a dirty page to backing store.
-	This may happen for data integrity reasons (i.e. 'sync'), or
-	to free up memory (flush). The difference can be seen in
-	wbc->sync_mode.
-	The PG_Dirty flag has been cleared and PageLocked is true.
-	writepage should start writeout, should set PG_Writeback,
-	and should make sure the page is unlocked, either synchronously
-	or asynchronously when the write operation completes.
-
-	If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to
-	try too hard if there are problems, and may choose to write out
-	other pages from the mapping if that is easier (e.g. due to
-	internal dependencies). If it chooses not to start writeout, it
-	should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep
-	calling ->writepage on that page.
-
-	See the file "Locking" for more details.
+  writepage: called by the VM write a dirty page to backing store.
 
   readpage: called by the VM to read a page from backing store.
-	The page will be Locked when readpage is called, and should be
-	unlocked and marked uptodate once the read completes.
-	If ->readpage discovers that it needs to unlock the page for
-	some reason, it can do so, and then return AOP_TRUNCATED_PAGE.
-	In this case, the page will be relocated, relocked and if
-	that all succeeds, ->readpage will be called again.
 
   sync_page: called by the VM to notify the backing store to perform all
 	queued I/O operations for a page. I/O operations for other pages
 	associated with this address_space object may also be performed.
 
-	This function is optional and is called only for pages with
-	PG_Writeback set while waiting for the writeback to complete.
-
   writepages: called by the VM to write out pages associated with the
-	address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then
-	the writeback_control will specify a range of pages that must be
-	written out. If it is WBC_SYNC_NONE, then a nr_to_write is given
-	and that many pages should be written if possible.
-	If no ->writepages is given, then mpage_writepages is used
-	instead. This will choose pages from the address space that are
-	tagged as DIRTY and will pass them to ->writepage.
+	address_space object.
 
   set_page_dirty: called by the VM to set a page dirty.
-	This is particularly needed if an address space attaches
-	private data to a page, and that data needs to be updated when
-	a page is dirtied. This is called, for example, when a memory
-	mapped page gets modified.
-	If defined, it should set the PageDirty flag, and the
-	PAGECACHE_TAG_DIRTY tag in the radix tree.
 
   readpages: called by the VM to read pages associated with the address_space
-	object. This is essentially just a vector version of
-	readpage. Instead of just one page, several pages are
-	requested.
-	readpages is only used for read-ahead, so read errors are
-	ignored. If anything goes wrong, feel free to give up.
+	object.
 
   prepare_write: called by the generic write path in VM to set up a write
-	request for a page. This indicates to the address space that
-	the given range of bytes is about to be written. The
-	address_space should check that the write will be able to
-	complete, by allocating space if necessary and doing any other
-	internal housekeeping. If the write will update parts of
-	any basic-blocks on storage, then those blocks should be
-	pre-read (if they haven't been read already) so that the
-	updated blocks can be written out properly.
-	The page will be locked. If prepare_write wants to unlock the
-	page it, like readpage, may do so and return
-	AOP_TRUNCATED_PAGE.
-	In this case the prepare_write will be retried one the lock is
-	regained.
-
-  commit_write: If prepare_write succeeds, new data will be copied
-	into the page and then commit_write will be called. It will
-	typically update the size of the file (if appropriate) and
-	mark the inode as dirty, and do any other related housekeeping
-	operations. It should avoid returning an error if possible -
-	errors should have been handled by prepare_write.
+	request for a page.
+
+  commit_write: called by the generic write path in VM to write page to
+	its backing store.
 
   bmap: called by the VFS to map a logical block offset within object to
-	physical block number. This method is used by the FIBMAP
-	ioctl and for working with swap-files. To be able to swap to
-	a file, the file must have a stable mapping to a block
-	device. The swap system does not go through the filesystem
-	but instead uses bmap to find out where the blocks in the file
-	are and uses those addresses directly.
-
-
-  invalidatepage: If a page has PagePrivate set, then invalidatepage
-	will be called when part or all of the page is to be removed
-	from the address space. This generally corresponds to either a
-	truncation or a complete invalidation of the address space
-	(in the latter case 'offset' will always be 0).
-	Any private data associated with the page should be updated
-	to reflect this truncation. If offset is 0, then
-	the private data should be released, because the page
-	must be able to be completely discarded. This may be done by
-	calling the ->releasepage function, but in this case the
-	release MUST succeed.
-
-  releasepage: releasepage is called on PagePrivate pages to indicate
-	that the page should be freed if possible. ->releasepage
-	should remove any private data from the page and clear the
-	PagePrivate flag. It may also remove the page from the
-	address_space. If this fails for some reason, it may indicate
-	failure with a 0 return value.
-	This is used in two distinct though related cases. The first
-	is when the VM finds a clean page with no active users and
-	wants to make it a free page. If ->releasepage succeeds, the
-	page will be removed from the address_space and become free.
-
-	The second case if when a request has been made to invalidate
-	some or all pages in an address_space. This can happen
-	through the fadvice(POSIX_FADV_DONTNEED) system call or by the
-	filesystem explicitly requesting it as nfs and 9fs do (when
-	they believe the cache may be out of date with storage) by
-	calling invalidate_inode_pages2().
-	If the filesystem makes such a call, and needs to be certain
-	that all pages are invalidated, then its releasepage will
-	need to ensure this. Possibly it can clear the PageUptodate
-	bit if it cannot free private data yet.
-
-  direct_IO: called by the generic read/write routines to perform
-	direct_IO - that is IO requests which bypass the page cache
-	and transfer data directly between the storage and the
-	application's address space.
+	physical block number. This method is use by for the legacy FIBMAP
+	ioctl. Other uses are discouraged.
+
+  invalidatepage: called by the VM on truncate to disassociate a page from its
+	address_space mapping.
+
+  releasepage: called by the VFS to release filesystem specific metadata from
+	a page.
+
+  direct_IO: called by the VM for direct I/O writes and reads.
 
   get_xip_page: called by the VM to translate a block number to a page.
 	The page is valid until the corresponding filesystem is unmounted.
 	Filesystems that want to use execute-in-place (XIP) need to implement
 	it. An example implementation can be found in fs/ext2/xip.c.
 
-  migrate_page: This is used to compact the physical memory usage.
-	If the VM wants to relocate a page (maybe off a memory card
-	that is signalling imminent failure) it will pass a new page
-	and an old page to this function. migrate_page should
-	transfer any private data across and update any references
-	that it has to the page.
 
 The File Object
 ===============
@@ -694,7 +521,7 @@ struct file_operations
 ----------------------
 
 This describes how the VFS can manipulate an open file. As of kernel
-2.6.17, the following members are defined:
+2.6.13, the following members are defined:
 
 struct file_operations {
 	loff_t (*llseek) (struct file *, loff_t, int);
@@ -723,10 +550,6 @@ struct file_operations {
 	int (*check_flags)(int);
 	int (*dir_notify)(struct file *filp, unsigned long arg);
 	int (*flock) (struct file *, int, struct file_lock *);
-	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, size_t, unsigned
-int);
-	ssize_t (*splice_read)(struct file *, struct pipe_inode_info *, size_t, unsigned
-int);
 };
 
 Again, all methods are called without any locks being held, unless
@@ -794,12 +617,6 @@ otherwise noted.
 
   flock: called by the flock(2) system call
 
-  splice_write: called by the VFS to splice data from a pipe to a file. This
-	method is used by the splice(2) system call
-
-  splice_read: called by the VFS to splice data from file to a pipe. This
-	method is used by the splice(2) system call
-
 Note that the file operations are implemented by the specific
 filesystem in which the inode resides. When opening a device node
 (character or block special) most filesystems will call special