X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=Documentation%2Ffilesystems%2Frelayfs.txt;h=5832377b7340ed4b811b1a5ad12cad4147a2c9bf;hb=43bc926fffd92024b46cafaf7350d669ba9ca884;hp=7397bdb2358932e16c147f8572bf981691482e08;hpb=cee37fe97739d85991964371c1f3a745c00dd236;p=linux-2.6.git diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt index 7397bdb23..5832377b7 100644 --- a/Documentation/filesystems/relayfs.txt +++ b/Documentation/filesystems/relayfs.txt @@ -3,797 +3,427 @@ relayfs - a high-speed data relay filesystem ============================================ relayfs is a filesystem designed to provide an efficient mechanism for -tools and facilities to relay large amounts of data from kernel space -to user space. - -The main idea behind relayfs is that every data flow is put into a -separate "channel" and each channel is a file. In practice, each -channel is a separate memory buffer allocated from within kernel space -upon channel instantiation. Software needing to relay data to user -space would open a channel or a number of channels, depending on its -needs, and would log data to that channel. All the buffering and -locking mechanics are taken care of by relayfs. The actual format and -protocol used for each channel is up to relayfs' clients. - -relayfs makes no provisions for copying the same data to more than a -single channel. This is for the clients of the relay to take care of, -and so is any form of data filtering. The purpose is to keep relayfs -as simple as possible. - - -Usage -===== - -In addition to the relayfs kernel API described below, relayfs -implements basic file operations. Here are the file operations that -are available and some comments regarding their behavior: - -open() enables user to open an _existing_ channel. A channel can be - opened in blocking or non-blocking mode, and can be opened - for reading as well as for writing. Readers will by default - be auto-consuming. 
- -mmap() results in channel's memory buffer being mmapped into the - caller's memory space. - -read() since we are dealing with circular buffers, the user is only - allowed to read forward. Some apps may want to loop around - read() waiting for incoming data - if there is no data - available, read will put the reader on a wait queue until - data is available (blocking mode). Non-blocking reads return - -EAGAIN if data is not available. - - -write() writing from user space operates exactly as relay_write() does - (described below). - -poll() POLLIN/POLLRDNORM/POLLOUT/POLLWRNORM/POLLERR supported. - -close() decrements the channel's refcount. When the refcount reaches - 0 i.e. when no process or kernel client has the file open - (see relay_close() below), the channel buffer is freed. +tools and facilities to relay large and potentially sustained streams +of data from kernel space to user space. + +The main abstraction of relayfs is the 'channel'. A channel consists +of a set of per-cpu kernel buffers each represented by a file in the +relayfs filesystem. Kernel clients write into a channel using +efficient write functions which automatically log to the current cpu's +channel buffer. User space applications mmap() the per-cpu files and +retrieve the data as it becomes available. + +The format of the data logged into the channel buffers is completely +up to the relayfs client; relayfs does however provide hooks which +allow clients to impose some structure on the buffer data. Nor does +relayfs implement any form of data filtering - this also is left to +the client. The purpose is to keep relayfs as simple as possible. + +This document provides an overview of the relayfs API. The details of +the function parameters are documented along with the functions in the +filesystem code - please see that for details. + +Semantics +========= + +Each relayfs channel has one buffer per CPU, each buffer has one or +more sub-buffers. 
Messages are written to the first sub-buffer until +it is too full to contain a new message, in which case it is +written to the next (if available). Messages are never split across +sub-buffers. At this point, userspace can be notified so it empties +the first sub-buffer, while the kernel continues writing to the next. + +When notified that a sub-buffer is full, the kernel knows how many +bytes of it are padding i.e. unused. Userspace can use this knowledge +to copy only valid data. + +After copying it, userspace can notify the kernel that a sub-buffer +has been consumed. + +relayfs can operate in a mode where it will overwrite data not yet +collected by userspace, and not wait for it to consume it. + +relayfs itself does not provide for communication of such data between +userspace and kernel, allowing the kernel side to remain simple and +not impose a single interface on userspace. It does provide a set of +examples and a separate helper though, described below. + +klog and relay-apps example code +================================ + +relayfs itself is ready to use, but to make things easier, a couple of +simple utility functions and a set of examples are provided. + +The relay-apps example tarball, available on the relayfs sourceforge +site, contains a set of self-contained examples, each consisting of a +pair of .c files containing boilerplate code for each of the user and +kernel sides of a relayfs application; combined, these two sets of +boilerplate code provide glue to easily stream data to disk, without +having to bother with mundane housekeeping chores. + +The 'klog debugging functions' patch (klog.patch in the relay-apps +tarball) provides a couple of high-level logging functions to the +kernel which allow writing formatted text or raw data to a channel, +regardless of whether a channel to write into exists or not, or +whether relayfs is compiled into the kernel or is configured as a +module. 
These functions allow you to put unconditional 'trace' +statements anywhere in the kernel or kernel modules; only when there +is a 'klog handler' registered will data actually be logged (see the +klog and kleak examples for details). + +It is of course possible to use relayfs from scratch i.e. without +using any of the relay-apps example code or klog, but you'll have to +implement communication between userspace and kernel, allowing both to +convey the state of buffers (full, empty, amount of padding). + +klog and the relay-apps examples can be found in the relay-apps +tarball on http://relayfs.sourceforge.net + + +The relayfs user space API +========================== + +relayfs implements basic file operations for user space access to +relayfs channel buffer data. Here are the file operations that are +available and some comments regarding their behavior: + +open() enables user to open an _existing_ buffer. + +mmap() results in channel buffer being mapped into the caller's + memory space. Note that you can't do a partial mmap - you must + map the entire file, which is NRBUF * SUBBUFSIZE. + +read() read the contents of a channel buffer. The bytes read are + 'consumed' by the reader i.e. they won't be available again + to subsequent reads. If the channel is being used in + no-overwrite mode (the default), it can be read at any time + even if there's an active kernel writer. If the channel is + being used in overwrite mode and there are active channel + writers, results may be unpredictable - users should make + sure that all logging to the channel has ended before using + read() with overwrite mode. + +poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are + notified when sub-buffer boundaries are crossed. + +close() decrements the channel buffer's refcount. When the refcount + reaches 0 i.e. when no process or kernel client has the buffer + open, the channel buffer is freed. 
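The mmap() sizing rule and the padding bookkeeping described above can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: the helper names (relay_map_size(), relay_subbuf_offset(), relay_valid_bytes(), relay_map_file()) are hypothetical and not part of relayfs, the sub-buffer count and size must be known to the application by some other means, and the padding value is assumed to be conveyed by the kernel client through its own channel of communication, since relayfs itself does not provide one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Size of a whole per-cpu buffer file; partial mappings are not allowed. */
size_t relay_map_size(size_t n_subbufs, size_t subbuf_size)
{
	return n_subbufs * subbuf_size;
}

/* Byte offset of the sub-buffer with running index produced_idx. */
size_t relay_subbuf_offset(size_t produced_idx, size_t n_subbufs,
			   size_t subbuf_size)
{
	return (produced_idx % n_subbufs) * subbuf_size;
}

/* Bytes of real data in a full sub-buffer, given its reported padding. */
size_t relay_valid_bytes(size_t subbuf_size, size_t padding)
{
	return padding > subbuf_size ? 0 : subbuf_size - padding;
}

/*
 * Map one per-cpu buffer file read-only. Returns MAP_FAILED on error;
 * on success the open descriptor is stored in *fd_out so the caller can
 * poll() it for sub-buffer boundary notifications and close() it later.
 * MAP_PRIVATE is an assumption of this sketch.
 */
void *relay_map_file(const char *path, size_t n_subbufs, size_t subbuf_size,
		     int *fd_out)
{
	int fd = open(path, O_RDONLY);
	void *buf;

	if (fd < 0)
		return MAP_FAILED;

	buf = mmap(NULL, relay_map_size(n_subbufs, subbuf_size),
		   PROT_READ, MAP_PRIVATE, fd, 0);
	if (buf == MAP_FAILED) {
		close(fd);
		return MAP_FAILED;
	}

	*fd_out = fd;
	return buf;
}
```

A reader would typically poll() the descriptor for POLLIN, then copy relay_valid_bytes() bytes starting at relay_subbuf_offset() for each sub-buffer the kernel client reports as full, and finally tell the kernel client that the sub-buffer has been consumed.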
In order for a user application to make use of relayfs files, the relayfs filesystem must be mounted. For example, - mount -t relayfs relayfs /mountpoint + mount -t relayfs relayfs /mnt/relay + +NOTE: relayfs doesn't need to be mounted for kernel clients to create + or use channels - it only needs to be mounted when user space + applications need access to the buffer data. The relayfs kernel API ====================== -relayfs channels are implemented as circular buffers subdivided into -'sub-buffers'. kernel clients write data into the channel using -relay_write(), and are notified via a set of callbacks when -significant events occur within the channel. 'Significant events' -include: - -- a sub-buffer has been filled i.e. the current write won't fit into the - current sub-buffer, and a 'buffer-switch' is triggered, after which - the data is written into the next buffer (if the next buffer is - empty). The client is notified of this condition via two callbacks, - one providing an opportunity to perform start-of-buffer tasks, the - other end-of-buffer tasks. - -- data is ready for the client to process. The client can choose to - be notified either on a per-sub-buffer basis (bulk delivery) or - per-write basis (packet delivery). - -- data has been written to the channel from user space. The client can - use this notification to accept and process 'commands' sent to the - channel via write(2). - -- the channel has been opened/closed/mapped/unmapped from user space. - The client can use this notification to trigger actions within the - kernel application, such as enabling/disabling logging to the - channel. It can also return result codes from the callback, - indicating that the operation should fail e.g. in order to restrict - more than one user space open or mmap. - -- the channel needs resizing, or needs to update its - state based on the results of the resize. Resizing the channel is - up to the kernel client to actually perform. 
If the channel is - configured for resizing, the client is notified when the unread data - in the channel passes a preset threshold, giving it the opportunity - to allocate a new channel buffer and replace the old one. - -Reader objects --------------- - -Channel readers use an opaque rchan_reader object to read from -channels. For VFS readers (those using read(2) to read from a -channel), these objects are automatically created and used internally; -only kernel clients that need to directly read from channels, or whose -userspace applications use mmap to access channel data, need to know -anything about rchan_readers - others may skip this section. - -A relay channel can have any number of readers, each represented by an -rchan_reader instance, which is used to encapsulate reader settings -and state. rchan_reader objects should be treated as opaque by kernel -clients. To create a reader object for directly accessing a channel -from kernel space, call the add_rchan_reader() kernel API function: - -rchan_reader *add_rchan_reader(rchan_id, auto_consume) - -This function returns an rchan_reader instance if successful, which -should then be passed to relay_read() when the kernel client is -interested in reading from the channel. - -The auto_consume parameter indicates whether a read done by this -reader will automatically 'consume' that portion of the unread channel -buffer when relay_read() is called (see below for more details). - -To close the reader, call - -remove_rchan_reader(reader) - -which will remove the reader from the list of current readers. 
- - -To create a reader object representing a userspace mmap reader in the -kernel application, call the add_map_reader() kernel API function: - -rchan_reader *add_map_reader(rchan_id) - -This function returns an rchan_reader instance if successful, whose -main purpose is as an argument to be passed into -relay_buffers_consumed() when the kernel client becomes aware that -data has been read by a user application using mmap to read from the -channel buffer. There is no auto_consume option in this case, since -only the kernel client/user application knows when data has been read. - -To close the map reader, call - -remove_map_reader(reader) - -which will remove the reader from the list of current readers. - -Consumed count --------------- - -A relayfs channel is a circular buffer, which means that if there is -no reader reading from it or a reader reading too slowly, at some -point the channel writer will 'lap' the reader and data will be lost. -In normal use, readers will always be able to keep up with writers and -the buffer is thus never in danger of becoming full. In many -applications, it's sufficient to ensure that this is practically -speaking always the case, by making the buffers large enough. These -types of applications can basically open the channel as -RELAY_MODE_CONTINUOUS (the default anyway) and not worry about the -meaning of 'consume' and skip the rest of this section. - -If it's important for the application that a kernel client never allow -writers to overwrite unread data, the channel should be opened using -RELAY_MODE_NO_OVERWRITE and must be kept apprised of the count of -bytes actually read by the (typically) user-space channel readers. -This count is referred to as the 'consumed count'. read(2) channel -readers automatically update the channel's 'consumed count' as they -read. 
If the usage mode is to have only read(2) readers, which is -typically the case, the kernel client doesn't need to worry about any -of the relayfs functions having to do with 'bytes consumed' and can -skip the rest of this section. (Note that it is possible to have -multiple read(2) or auto-consuming readers, but like having multiple -readers on a pipe, these readers will race with each other i.e. it's -supported, but doesn't make much sense). - -If the kernel client cannot rely on an auto-consuming reader to keep -the 'consumed count' up-to-date, then it must do so manually, by -making the appropriate calls to relay_buffers_consumed() or -relay_bytes_consumed(). In most cases, this should only be necessary -for bulk mmap clients - almost all packet clients should be covered by -having auto-consuming read(2) readers. For mmapped bulk clients, for -instance, there are no auto-consuming VFS readers, so the kernel -client needs to make the call to relay_buffers_consumed() after -sub-buffers are read. 
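The consumed-count bookkeeping described in this section can be modelled with a small stand-alone sketch. It is not relayfs code: struct chan_model and the chan_model_*() functions are hypothetical names chosen for illustration, and the model counts whole sub-buffers only. It assumes the no-overwrite rule that a channel is full once the unconsumed space equals the total buffer size, and the clamping behaviour documented for relay_buffers_consumed().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one channel; not the relayfs structure. */
struct chan_model {
	size_t n_subbufs;	/* sub-buffers in the circular buffer */
	size_t bufs_produced;	/* sub-buffers filled by the writer */
	size_t bufs_consumed;	/* sub-buffers reported as read */
};

/* No-overwrite mode: full once every sub-buffer holds unread data. */
int chan_model_full(const struct chan_model *c)
{
	return c->bufs_produced - c->bufs_consumed >= c->n_subbufs;
}

/* Writer side: returns 1 if another sub-buffer could be filled, else 0. */
int chan_model_produce(struct chan_model *c)
{
	if (chan_model_full(c))
		return 0;
	c->bufs_produced++;
	return 1;
}

/*
 * Reader side: add the number of newly consumed sub-buffers, never
 * letting the consumed count run ahead of the produced count.
 */
void chan_model_consume(struct chan_model *c, size_t newly_consumed)
{
	size_t unread = c->bufs_produced - c->bufs_consumed;

	if (newly_consumed > unread)
		newly_consumed = unread;
	c->bufs_consumed += newly_consumed;
}
```

In this model a writer that gets 0 back from chan_model_produce() corresponds to a failed write in no-overwrite mode; writing resumes only after the reader side reports consumption.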
- -Kernel API ----------- - Here's a summary of the API relayfs provides to in-kernel clients: -int relay_open(channel_path, bufsize, nbufs, channel_flags, - channel_callbacks, start_reserve, end_reserve, - rchan_start_reserve, resize_min, resize_max, mode, - init_buf, init_buf_size) -int relay_write(channel_id, *data_ptr, count, time_delta_offset, **wrote) -rchan_reader *add_rchan_reader(channel_id, auto_consume) -int remove_rchan_reader(rchan_reader *reader) -rchan_reader *add_map_reader(channel_id) -int remove_map_reader(rchan_reader *reader) -int relay_read(reader, buf, count, wait, *actual_read_offset) -void relay_buffers_consumed(reader, buffers_consumed) -void relay_bytes_consumed(reader, bytes_consumed, read_offset) -int relay_bytes_avail(reader) -int rchan_full(reader) -int rchan_empty(reader) -int relay_info(channel_id, *channel_info) -int relay_close(channel_id) -int relay_realloc_buffer(channel_id, nbufs, async) -int relay_replace_buffer(channel_id) -int relay_reset(int rchan_id) - ----------- -int relay_open(channel_path, bufsize, nbufs, - channel_flags, channel_callbacks, start_reserve, - end_reserve, rchan_start_reserve, resize_min, resize_max, mode) - -relay_open() is used to create a new entry in relayfs. This new entry -is created according to channel_path. channel_path contains the -absolute path to the channel file on relayfs. If, for example, the -caller sets channel_path to "/xlog/9", a "xlog/9" entry will appear -within relayfs automatically and the "xlog" directory will be created -in the filesystem's root. relayfs does not implement any policy on -its content, except to disallow the opening of two channels using the -same file. There are, nevertheless a set of guidelines for using -relayfs. Basically, each facility using relayfs should use a top-level -directory identifying it. The entry created above, for example, -presumably belongs to the "xlog" software. 
- -The remaining parameters for relay_open() are as follows: - -- channel_flags - an ORed combination of attribute values controlling - common channel characteristics: - - - logging scheme - relayfs use 2 mutually exclusive schemes - for logging data to a channel. The 'lockless scheme' - reserves and writes data to a channel without the need of - any type of locking on the channel. This is the preferred - scheme, but may not be available on a given architecture (it - relies on the presence of a cmpxchg instruction). It's - specified by the RELAY_SCHEME_LOCKLESS flag. The 'locking - scheme' either obtains a lock on the channel for writing or - disables interrupts, depending on whether the channel was - opened for SMP or global usage (see below). It's specified - by the RELAY_SCHEME_LOCKING flag. While a client may want - to explicitly specify a particular scheme to use, it's more - convenient to specify RELAY_SCHEME_ANY for this flag, which - will allow relayfs to choose the best available scheme i.e. - lockless if supported. - - - overwrite mode (default is RELAY_MODE_CONTINUOUS) - - If RELAY_MODE_CONTINUOUS is specified, writes to the channel - will succeed regardless of whether there are up-to-date - consumers or not. If RELAY_MODE_NO_OVERWRITE is specified, - the channel becomes 'full' when the total amount of buffer - space unconsumed by readers equals or exceeds the total - buffer size. With the buffer in this state, writes to the - buffer will fail - clients need to check the return code from - relay_write() to determine if this is the case and act - accordingly - 0 or a negative value indicate the write failed. - - - SMP usage - this applies only when the locking scheme is in - use. If RELAY_USAGE_SMP is specified, it's assumed that the - channel will be used in a per-CPU fashion and consequently, - the only locking that will be done for writes is to disable - local irqs. 
If RELAY_USAGE_GLOBAL is specified, it's assumed - that writes to the buffer can occur within any CPU context, - and spinlock_irq_save will be used to lock the buffer. - - - delivery mode - if RELAY_DELIVERY_BULK is specified, the - client will be notified via its deliver() callback whenever a - sub-buffer has been filled. Alternatively, - RELAY_DELIVERY_PACKET will cause delivery to occur after the - completion of each write. See the description of the channel - callbacks below for more details. - - - timestamping - if RELAY_TIMESTAMP_TSC is specified and the - architecture supports it, efficient TSC 'timestamps' can be - associated with each write, otherwise more expensive - gettimeofday() timestamping is used. At the beginning of - each sub-buffer, a gettimeofday() timestamp and the current - TSC, if supported, are read, and are passed on to the client - via the buffer_start() callback. This allows correlation of - the current time with the current TSC for subsequent writes. - Each subsequent write is associated with a 'time delta', - which is either the current TSC, if the channel is using - TSCs, or the difference between the buffer_start gettimeofday - timestamp and the gettimeofday time read for the current - write. Note that relayfs never writes either a timestamp or - time delta into the buffer unless explicitly asked to (see - the description of relay_write() for details). - -- bufsize - the size of the 'sub-buffers' making up the circular channel - buffer. For the lockless scheme, this must be a power of 2. - -- nbufs - the number of 'sub-buffers' making up the circular - channel buffer. This must be a power of 2. - - The total size of the channel buffer is bufsize * nbufs rounded up - to the next kernel page size. If the lockless scheme is used, both - bufsize and nbufs must be a power of 2. If the locking scheme is - used, the bufsize can be anything and nbufs must be a power of 2. 
If - RELAY_SCHEME_ANY is used, the bufsize and nbufs should be a power of 2. - - NOTE: if nbufs is 1, relayfs will bypass the normal size - checks and will allocate an rvmalloced buffer of size bufsize. - This buffer will be freed when relay_close() is called, if the channel - isn't still being referenced. - -- callbacks - a table of callback functions called when events occur - within the data relay that clients need to know about: - - - int buffer_start(channel_id, current_write_pos, buffer_id, - start_time, start_tsc, using_tsc) - - - called at the beginning of a new sub-buffer, the - buffer_start() callback gives the client an opportunity to - write data into space reserved at the beginning of a - sub-buffer. The client should only write into the buffer - if it specified a value for start_reserve and/or - channel_start_reserve (see below) when the channel was - opened. In the latter case, the client can determine - whether to write its one-time rchan_start_reserve data by - examining the value of buffer_id, which will be 0 for the - first sub-buffer. The address that the client can write - to is contained in current_write_pos (the client by - definition knows how much it can write i.e. the value it - passed to relay_open() for start_reserve/ - channel_start_reserve). start_time contains the - gettimeofday() value for the start of the buffer and start - TSC contains the TSC read at the same time. The using_tsc - param indicates whether or not start_tsc is valid (it - wouldn't be if TSC timestamping isn't being used). - - The client should return the number of bytes it wrote to - the channel, 0 if none. - - - int buffer_end(channel_id, current_write_pos, end_of_buffer, - end_time, end_tsc, using_tsc) - - called at the end of a sub-buffer, the buffer_end() - callback gives the client an opportunity to perform - end-of-buffer processing. 
Note that the current_write_pos - is the position where the next write would occur, but - since the current write wouldn't fit (which is the trigger - for the buffer_end event), the buffer is considered full - even though there may be unused space at the end. The - end_of_buffer param pointer value can be used to determine - exactly the size of the unused space. The client should - only write into the buffer if it specified a value for - end_reserve when the channel was opened. If the client - doesn't write anything i.e. returns 0, the unused space at - the end of the sub-buffer is available via relay_info() - - this data may be needed by the client later if it needs to - process raw sub-buffers (an alternative would be to save - the unused bytes count value in end_reserve space at the - end of each sub-buffer during buffer_end processing and - read it when needed at a later time. The other - alternative would be to use read(2), which makes the - unused count invisible to the caller). end_time contains - the gettimeofday() value for the end of the buffer and end - TSC contains the TSC read at the same time. The using_tsc - param indicates whether or not end_tsc is valid (it - wouldn't be if TSC timestamping isn't being used). - - The client should return the number of bytes it wrote to - the channel, 0 if none. - - - void deliver(channel_id, from, len) - - called when data is ready for the client. This callback - is used to notify a client when a sub-buffer is complete - (in the case of bulk delivery) or a single write is - complete (packet delivery). A bulk delivery client might - wish to then signal a daemon that a sub-buffer is ready. - A packet delivery client might wish to process the packet - or send it elsewhere. The from param is a pointer to the - delivered data and len specifies how many bytes are ready. - - - void user_deliver(channel_id, from, len) - - called when data has been written to the channel from user - space. 
This callback is used to notify a client when a - successful write from userspace has occurred, independent - of whether bulk or packet delivery is in use. This can be - used to allow userspace programs to communicate with the - kernel client through the channel via out-of-band write(2) - 'commands' instead of via ioctls, for instance. The from - param is a pointer to the delivered data and len specifies - how many bytes are ready. Note that this callback occurs - after the bytes have been successfully written into the - channel, which means that channel readers must be able to - deal with the 'command' data which will appear in the - channel data stream just as any other userspace or - non-userspace write would. - - - int needs_resize(channel_id, resize_type, - suggested_buf_size, suggested_n_bufs) - - called when a channel's buffers are in danger of becoming - full i.e. the number of unread bytes in the channel passes - a preset threshold, or when the current capacity of a - channel's buffer is no longer needed. Also called to - notify the client when a channel's buffer has been - replaced. If resize_type is RELAY_RESIZE_EXPAND or - RELAY_RESIZE_SHRINK, the kernel client should arrange to - call relay_realloc_buffer() with the suggested buffer size - and buffer count, which will allocate (but will not - replace the old one) a new buffer of the recommended size - for the channel. When the allocation has completed, - needs_resize() is again called, this time with a - resize_type of RELAY_RESIZE_REPLACE. The kernel client - should then arrange to call relay_replace_buffer() to - actually replace the old channel buffer with the newly - allocated buffer. Finally, once the buffer replacement - has completed, needs_resize() is again called, this time - with a resize_type of RELAY_RESIZE_REPLACED, to inform the - client that the replacement is complete and additionally - confirming the current sub-buffer size and number of - sub-buffers. 
Note that a resize can be canceled if - relay_realloc_buffer() is called with the async param - non-zero and the resize conditions no longer hold. In - this case, the RELAY_RESIZE_REPLACED suggested number of - sub-buffers will be the same as the number of sub-buffers - that existed before the RELAY_RESIZE_SHRINK or EXPAND i.e. - values indicating that the resize didn't actually occur. - - - int fileop_notify(channel_id, struct file *filp, enum relay_fileop) - - called when a userspace file operation has occurred or - will occur on a relayfs channel file. These notifications - can be used by the kernel client to trigger actions within - the kernel client when the corresponding event occurs, - such as enabling logging only when a userspace application - opens or mmaps a relayfs file and disabling it again when - the file is closed or unmapped. The kernel client can - also return its own return value, which can affect the - outcome of file operation - returning 0 indicates that the - operation should succeed, and returning a negative value - indicates that the operation should be failed, and that - the returned value should be returned to the ultimate - caller e.g. returning -EPERM from the open fileop will - cause the open to fail with -EPERM. Among other things, - the return value can be used to restrict a relayfs file - from being opened or mmap'ed more than once. The currently - implemented fileops are: - - RELAY_FILE_OPEN - a relayfs file is being opened. Return - 0 to allow it to succeed, negative to - have it fail. A negative return value will - be passed on unmodified to the open fileop. - RELAY_FILE_CLOSE- a relayfs file is being closed. The return - value is ignored. - RELAY_FILE_MAP - a relayfs file is being mmap'ed. Return 0 - to allow it to succeed, negative to have - it fail. A negative return value will be - passed on unmodified to the mmap fileop. - RELAY_FILE_UNMAP- a relayfs file is being unmapped. The return - value is ignored. 
- - - void ioctl(rchan_id, cmd, arg) - - called when an ioctl call is made using a relayfs file - descriptor. The cmd and arg are passed along to this - callback unmodified for it to do as it wishes with. The - return value from this callback is used as the return value - of the ioctl call. - - If the callbacks param passed to relay_open() is NULL, a set of - default do-nothing callbacks will be defined for the channel. - Likewise, any NULL rchan_callback function contained in a non-NULL - callbacks struct will be filled in with a default callback function - that does nothing. - -- start_reserve - the number of bytes to be reserved at the start of - each sub-buffer. The client can do what it wants with this number - of bytes when the buffer_start() callback is invoked. Typically - clients would use this to write per-sub-buffer header data. - -- end_reserve - the number of bytes to be reserved at the end of each - sub-buffer. The client can do what it wants with this number of - bytes when the buffer_end() callback is invoked. Typically clients - would use this to write per-sub-buffer footer data. - -- channel_start_reserve - the number of bytes to be reserved, in - addition to start_reserve, at the beginning of the first sub-buffer - in the channel. The client can do what it wants with this number of - bytes when the buffer_start() callback is invoked. Typically - clients would use this to write per-channel header data. - -- resize_min - if set, this signifies that the channel is - auto-resizeable. The value specifies the size that the channel will - try to maintain as a normal working size, and that it won't go - below. The client makes use of the resizing callbacks and - relay_realloc_buffer() and relay_replace_buffer() to actually effect - the resize. - -- resize_max - if set, this signifies that the channel is - auto-resizeable. The value specifies the maximum size the channel - can have as a result of resizing. 
- -- mode - if non-zero, specifies the file permissions that will be given - to the channel file. If 0, the default rw user perms will be used. - -- init_buf - if non-NULL, rather than allocating the channel buffer, - this buffer will be used as the initial channel buffer. The kernel - API function relay_discard_init_buf() can later be used to have - relayfs allocate a normal mmappable channel buffer and switch over - to using it after copying the init_buf contents into it. Currently, - the size of init_buf must be exactly buf_size * n_bufs. The caller - is responsible for managing the init_buf memory. This feature is - typically used for init-time channel use and should normally be - specified as NULL. - -- init_buf_size - the total size of init_buf, if init_buf is specified - as non-NULL. Currently, the size of init_buf must be exactly - buf_size * n_bufs. - -Upon successful completion, relay_open() returns a channel id -to be used for all other operations with the relay. All buffers -managed by the relay are allocated using rvmalloc/rvfree to allow -for easy mmapping to user-space. - ----------- -int relay_write(channel_id, *data_ptr, count, time_delta_offset, **wrote_pos) - -relay_write() reserves space in the channel and writes count bytes of -data pointed to by data_ptr to it. Automatically performs any -necessary locking, depending on the scheme and SMP usage in effect (no -locking is done for the lockless scheme regardless of usage). It -returns the number of bytes written, or 0/negative on failure. If -time_delta_offset is >= 0, the internal time -delta calculated when the slot was reserved will be written at that -offset. This is the TSC or gettimeofday() delta between the current -write and the beginning of the buffer, whichever method is being used -by the channel. 
Trying to write a count larger than the bufsize -specified to relay_open() (taking into account the reserved -start-of-buffer and end-of-buffer space as well) will fail. If -wrote_pos is non-NULL, it will receive the location the data was -written to, which may be needed for some applications but is not -normally interesting. Most applications should pass in NULL for this -param. - ----------- -struct rchan_reader *add_rchan_reader(int rchan_id, int auto_consume) - -add_rchan_reader creates and initializes a reader object for a -channel. An opaque rchan_reader object is returned on success, and is -passed to relay_read() when reading the channel. If the boolean -auto_consume parameter is 1, the reader is defined to be -auto-consuming. auto-consuming reader objects are automatically -created and used for VFS read(2) readers. - ----------- -void remove_rchan_reader(struct rchan_reader *reader) - -remove_rchan_reader finds and removes the given reader from the -channel. This function is used only by non-VFS read(2) readers. VFS -read(2) readers are automatically removed when the corresponding file -object is closed. - ----------- -reader add_map_reader(int rchan_id) - -Creates and initializes an rchan_reader object for channel map -readers, and is needed for updating relay_bytes/buffers_consumed() -when kernel clients become aware of the need to do so by their mmap -user clients. - ----------- -int remove_map_reader(reader) - -Finds and removes the given map reader from the channel. This function -is useful only for map readers. - ----------- -int relay_read(reader, buf, count, wait, *actual_read_offset) - -Reads count bytes from the channel, or as much as is available within -the sub-buffer currently being read. The read offset that will be -read from is the position contained within the reader object. If the -wait flag is set, buf is non-NULL, and there is nothing available, it -will wait until there is. 
If the wait flag is 0 and there is nothing -available, -EAGAIN is returned. If buf is NULL, the value returned is -the number of bytes that would have been read. actual_read_offset is -the value that should be passed as the read offset to -relay_bytes_consumed, needed only if the reader is not auto-consuming -and the channel is MODE_NO_OVERWRITE, but in any case, it must not be -NULL. - ----------- - -int relay_bytes_avail(reader) - -Returns the number of bytes available relative to the reader's current -read position within the corresponding sub-buffer, 0 if there is -nothing available. Note that this doesn't return the total bytes -available in the channel buffer - this is enough though to know if -anything is available, however, or how many bytes might be returned -from the next read. - ----------- -void relay_buffers_consumed(reader, buffers_consumed) - -Adds to the channel's consumed buffer count. buffers_consumed should -be the number of buffers newly consumed, not the total number -consumed. NOTE: kernel clients don't need to call this function if -the reader is auto-consuming or the channel is MODE_CONTINUOUS. - -In order for the relay to detect the 'buffers full' condition for a -channel, it must be kept up-to-date with respect to the number of -buffers consumed by the client. If the addition of the value of the -bufs_consumed param to the current bufs_consumed count for the channel -would exceed the bufs_produced count for the channel, the channel's -bufs_consumed count will be set to the bufs_produced count for the -channel. This allows clients to 'catch up' if necessary. - ----------- -void relay_bytes_consumed(reader, bytes_consumed, read_offset) - -Adds to the channel's consumed count. bytes_consumed should be the -number of bytes actually read e.g. return value of relay_read() and -the read_offset should be the actual offset the bytes were read from -e.g. the actual_read_offset set by relay_read(). 
NOTE: kernel clients -don't need to call this function if the reader is auto-consuming or -the channel is MODE_CONTINUOUS. - -In order for the relay to detect the 'buffers full' condition for a -channel, it must be kept up-to-date with respect to the number of -bytes consumed by the client. For packet clients, it makes more sense -to update after each read rather than after each complete sub-buffer -read. The bytes_consumed count updates bufs_consumed when a buffer -has been consumed so this count remains consistent. - ----------- -int relay_info(channel_id, *channel_info) - -relay_info() fills in an rchan_info struct with channel status and -attribute information such as usage modes, sub-buffer size and count, -the allocated size of the entire buffer, buffers produced and -consumed, current buffer id, count of writes lost due to buffers full -condition. - -The virtual address of the channel buffer is also available here, for -those clients that need it. - -Clients may need to know how many 'unused' bytes there are at the end -of a given sub-buffer. This would only be the case if the client 1) -didn't either write this count to the end of the sub-buffer or -otherwise note it (it's available as the difference between the buffer -end and current write pos params in the buffer_end callback) (if the -client returned 0 from the buffer_end callback, it's assumed that this -is indeed the case) 2) isn't using the read() system call to read the -buffer. In other words, if the client isn't annotating the stream and -is reading the buffer by mmapping it, this information would be needed -in order for the client to 'skip over' the unused bytes at the ends of -sub-buffers. - -Additionally, for the lockless scheme, clients may need to know -whether a particular sub-buffer is actually complete. An array of -boolean values, one per sub-buffer, contains non-zero if the buffer is -complete, zero otherwise.
- ----------- -int relay_close(channel_id) - -relay_close() is used to close the channel. It finalizes the last -sub-buffer (the one currently being written to) and marks the channel -as finalized. The channel buffer and channel data structure are then -freed automatically when the last reference to the channel is given -up. - ----------- -int relay_realloc_buffer(channel_id, nbufs, async) - -Allocates a new channel buffer using the specified sub-buffer count -(note that resizing can't change sub-buffer sizes). If async is -non-zero, the allocation is done in the background using a work queue. -When the allocation has completed, the needs_resize() callback is -called with a resize_type of RELAY_RESIZE_REPLACE. This function -doesn't replace the old buffer with the new - see -relay_replace_buffer(). - -This function is called by kernel clients in response to a -needs_resize() callback call with a resize type of RELAY_RESIZE_EXPAND -or RELAY_RESIZE_SHRINK. That callback also includes a suggested -new_bufsize and new_nbufs which should be used when calling this -function. - -Returns 0 on success, or errcode if the channel is busy or if -the allocation couldn't happen for some reason. - -NOTE: if async is not set, this function should not be called with a -lock held, as it may sleep. - ----------- -int relay_replace_buffer(channel_id) - -Replaces the current channel buffer with the new buffer allocated by -relay_realloc_buffer and contained in the channel struct. When the -replacement is complete, the needs_resize() callback is called with -RELAY_RESIZE_REPLACED. This function is called by kernel clients in -response to a needs_resize() callback having a resize type of -RELAY_RESIZE_REPLACE. - -Returns 0 on success, or errcode if the channel is busy or if the -replacement or previous allocation didn't happen for some reason. - -NOTE: This function will not sleep, so can be called in any context and -with locks held.
The client should, however, ensure that the channel -isn't actively being read from or written to. - ----------- -int relay_reset(rchan_id) - -relay_reset() has the effect of erasing all data from the buffer and -restarting the channel in its initial state. The buffer itself is not -freed, so any mappings are still in effect. NOTE: Care should be -taken that the channel isn't actually being used by anything when -this call is made. - ----------- -int rchan_full(reader) - -returns 1 if the channel is full with respect to the reader, 0 if not. - ----------- -int rchan_empty(reader) - -returns 1 if the channel is empty with respect to the reader, 0 if not. - ----------- -int relay_discard_init_buf(rchan_id) - -allocates an mmappable channel buffer, copies the contents of init_buf -into it, and sets the current channel buffer to the newly allocated -buffer. This function is used only in conjunction with the init_buf -and init_buf_size params to relay_open(), and is typically used when -the ability to write into the channel at init-time is needed. The -basic usage is to specify an init_buf and init_buf_size to relay_open, -then call this function when it's safe to switch over to a normally -allocated channel buffer. 'Safe' means that the caller is in a -context that can sleep and that nothing is actively writing to the -channel. Returns 0 if successful, negative otherwise. - - -Writing directly into the channel -================================= - -Using the relay_write() API function as described above is the -preferred means of writing into a channel. In some cases, however, -in-kernel clients might want to write directly into a relay channel -rather than have relay_write() copy it into the buffer on the client's -behalf. Clients wishing to do this should follow the model used to -implement relay_write itself. The general sequence is: - -- get a pointer to the channel via rchan_get(). This increments the - channel's reference count. -- call relay_lock_channel().
This will perform the proper locking for - the channel given the scheme in use and the SMP usage. -- reserve a slot in the channel via relay_reserve() -- write directly to the reserved address -- call relay_commit() to commit the write -- call relay_unlock_channel() -- call rchan_put() to release the channel reference - -In particular, clients should make sure they call rchan_get() and -rchan_put() and not hold on to references to the channel pointer. -Also, forgetting to use relay_lock_channel()/relay_unlock_channel() -has no effect if the lockless scheme is being used, but could result -in corrupted buffer contents if the locking scheme is used. - - -Limitations -=========== - -Writes made via the write() system call are currently limited to 2 -pages worth of data. There is no such limit on the in-kernel API -function relay_write(). - -User applications can currently only mmap the complete buffer (it -doesn't really make sense to mmap only part of it, given its purpose). - - -Latest version -============== - -The latest version can be found at: - -http://www.opersys.com/relayfs -Example relayfs clients, such as dynamic printk and the Linux Trace -Toolkit, can also be found there. 
+ channel management functions: + + relay_open(base_filename, parent, subbuf_size, n_subbufs, + callbacks) + relay_close(chan) + relay_flush(chan) + relay_reset(chan) + relayfs_create_dir(name, parent) + relayfs_remove_dir(dentry) + relayfs_create_file(name, parent, mode, fops, data) + relayfs_remove_file(dentry) + + channel management typically called on instigation of userspace: + + relay_subbufs_consumed(chan, cpu, subbufs_consumed) + + write functions: + + relay_write(chan, data, length) + __relay_write(chan, data, length) + relay_reserve(chan, length) + + callbacks: + + subbuf_start(buf, subbuf, prev_subbuf, prev_padding) + buf_mapped(buf, filp) + buf_unmapped(buf, filp) + create_buf_file(filename, parent, mode, buf, is_global) + remove_buf_file(dentry) + + helper functions: + + relay_buf_full(buf) + subbuf_start_reserve(buf, length) + + +Creating a channel +------------------ + +relay_open() is used to create a channel, along with its per-cpu +channel buffers. Each channel buffer will have an associated file +created for it in the relayfs filesystem, which can be opened and +mmapped from user space if desired. The files are named +basename0...basenameN-1 where N is the number of online cpus, and by +default will be created in the root of the filesystem. If you want a +directory structure to contain your relayfs files, you can create it +with relayfs_create_dir() and pass the parent directory to +relay_open(). Clients are responsible for cleaning up any directory +structure they create when the channel is closed - use +relayfs_remove_dir() for that. + +The total size of each per-cpu buffer is calculated by multiplying the +number of sub-buffers by the sub-buffer size passed into relay_open(). +The idea behind sub-buffers is that they're basically an extension of +double-buffering to N buffers, and they also allow applications to +easily implement random-access-on-buffer-boundary schemes, which can +be important for some high-volume applications. 
The number and size +of sub-buffers is completely dependent on the application and even for +the same application, different conditions will warrant different +values for these parameters at different times. Typically, the right +values to use are best decided after some experimentation; in general, +though, it's safe to assume that having only 1 sub-buffer is a bad +idea - you're guaranteed to either overwrite data or lose events +depending on the channel mode being used. + +Channel 'modes' +--------------- + +relayfs channels can be used in either of two modes - 'overwrite' or +'no-overwrite'. The mode is entirely determined by the implementation +of the subbuf_start() callback, as described below. In 'overwrite' +mode, also known as 'flight recorder' mode, writes continuously cycle +around the buffer and will never fail, but will unconditionally +overwrite old data regardless of whether it's actually been consumed. +In no-overwrite mode, writes will fail i.e. data will be lost, if the +number of unconsumed sub-buffers equals the total number of +sub-buffers in the channel. It should be clear that if there is no +consumer or if the consumer can't consume sub-buffers fast enough, +data will be lost in either case; the only difference is whether data +is lost from the beginning or the end of a buffer. + +As explained above, a relayfs channel is made up of one or more +per-cpu channel buffers, each implemented as a circular buffer +subdivided into one or more sub-buffers. Messages are written into +the current sub-buffer of the channel's current per-cpu buffer via the +write functions described below. Whenever a message can't fit into +the current sub-buffer, because there's no room left for it, the +client is notified via the subbuf_start() callback that a switch to a +new sub-buffer is about to occur.
The client uses this callback to 1) +initialize the next sub-buffer if appropriate 2) finalize the previous +sub-buffer if appropriate and 3) return a boolean value indicating +whether or not to actually go ahead with the sub-buffer switch. + +To implement 'no-overwrite' mode, the userspace client would provide +an implementation of the subbuf_start() callback something like the +following: + +static int subbuf_start(struct rchan_buf *buf, + void *subbuf, + void *prev_subbuf, + unsigned int prev_padding) +{ + if (prev_subbuf) + *((unsigned *)prev_subbuf) = prev_padding; + + if (relay_buf_full(buf)) + return 0; + + subbuf_start_reserve(buf, sizeof(unsigned int)); + + return 1; +} + +If the current buffer is full i.e. all sub-buffers remain unconsumed, +the callback returns 0 to indicate that the buffer switch should not +occur yet i.e. until the consumer has had a chance to read the current +set of ready sub-buffers. For the relay_buf_full() function to make +sense, the consumer is responsible for notifying relayfs when +sub-buffers have been consumed via relay_subbufs_consumed(). Any +subsequent attempts to write into the buffer will again invoke the +subbuf_start() callback with the same parameters; only when the +consumer has consumed one or more of the ready sub-buffers will +relay_buf_full() return 0, in which case the buffer switch can +continue. + +The implementation of the subbuf_start() callback for 'overwrite' mode +would be very similar: + +static int subbuf_start(struct rchan_buf *buf, + void *subbuf, + void *prev_subbuf, + unsigned int prev_padding) +{ + if (prev_subbuf) + *((unsigned *)prev_subbuf) = prev_padding; + + subbuf_start_reserve(buf, sizeof(unsigned int)); + + return 1; +} + +In this case, the relay_buf_full() check is meaningless and the +callback always returns 1, causing the buffer switch to occur +unconditionally. It's also meaningless for the client to use the +relay_subbufs_consumed() function in this mode, as it's never +consulted.
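The accounting behind the two modes can be sketched in user space.
The block below is a toy model and not relayfs code: the model_*
names, the switch_pending flag and the clamping in
model_subbufs_consumed() are assumptions made for illustration. It
only shows how a produced/consumed pair of counters yields the
behaviour described above, where a no-overwrite switch is refused
while every sub-buffer is unconsumed and an overwrite switch always
goes ahead.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: none of these names are part of relayfs. */
enum model_mode { MODEL_NO_OVERWRITE, MODEL_OVERWRITE };

struct model_buf {
	size_t n_subbufs;	/* sub-buffers in this per-cpu buffer */
	size_t produced;	/* sub-buffers filled by the writer */
	size_t consumed;	/* sub-buffers reported read by the consumer */
	int switch_pending;	/* a refused switch is waiting to be retried */
	enum model_mode mode;
};

/* Full means every sub-buffer holds data the consumer has not read. */
static int model_buf_full(const struct model_buf *buf)
{
	assert(buf->n_subbufs > 0);
	return buf->produced - buf->consumed >= buf->n_subbufs;
}

/*
 * Called when the current sub-buffer has no room for the next message.
 * Returns 1 if the writer may move to the next sub-buffer, 0 if the
 * switch is refused and the message is dropped.
 */
static int model_switch(struct model_buf *buf)
{
	if (!buf->switch_pending) {
		buf->produced++;	/* count the filled sub-buffer once */
		buf->switch_pending = 1;
	}
	if (buf->mode == MODEL_NO_OVERWRITE && model_buf_full(buf))
		return 0;
	buf->switch_pending = 0;
	return 1;
}

/* Consumer side: report n more sub-buffers as read, clamped to produced. */
static void model_subbufs_consumed(struct model_buf *buf, size_t n)
{
	buf->consumed += n;
	if (buf->consumed > buf->produced)
		buf->consumed = buf->produced;
}
```

With two sub-buffers in no-overwrite mode, the first switch succeeds
and the second is refused until the consumer reports at least one
sub-buffer as read; in overwrite mode the consumed count is never
consulted.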
+ +The default subbuf_start() implementation, used if the client doesn't +define any callbacks, or doesn't define the subbuf_start() callback, +implements the simplest possible 'no-overwrite' mode i.e. it does +nothing but return 0. + +Header information can be reserved at the beginning of each sub-buffer +by calling the subbuf_start_reserve() helper function from within the +subbuf_start() callback. This reserved area can be used to store +whatever information the client wants. In the example above, room is +reserved in each sub-buffer to store the padding count for that +sub-buffer. This is filled in for the previous sub-buffer in the +subbuf_start() implementation; the padding value for the previous +sub-buffer is passed into the subbuf_start() callback along with a +pointer to the previous sub-buffer, since the padding value isn't +known until a sub-buffer is filled. The subbuf_start() callback is +also called for the first sub-buffer when the channel is opened, to +give the client a chance to reserve space in it. In this case the +previous sub-buffer pointer passed into the callback will be NULL, so +the client should check the value of the prev_subbuf pointer before +writing into the previous sub-buffer. + +Writing to a channel +-------------------- + +Kernel clients write data into the current cpu's channel buffer using +relay_write() or __relay_write(). relay_write() is the main logging +function - it uses local_irq_save() to protect the buffer and should be +used if you might be logging from interrupt context. If you know +you'll never be logging from interrupt context, you can use +__relay_write(), which only disables preemption.
These functions +don't return a value, so you can't determine whether or not they +failed - the assumption is that you wouldn't want to check a return +value in the fast logging path anyway, and that they'll always succeed +unless the buffer is full and no-overwrite mode is being used, in +which case you can detect a failed write in the subbuf_start() +callback by calling the relay_buf_full() helper function. + +relay_reserve() is used to reserve a slot in a channel buffer which +can be written to later. This would typically be used in applications +that need to write directly into a channel buffer without having to +stage data in a temporary buffer beforehand. Because the actual write +may not happen immediately after the slot is reserved, applications +using relay_reserve() can keep a count of the number of bytes actually +written, either in space reserved in the sub-buffers themselves or as +a separate array. See the 'reserve' example in the relay-apps tarball +at http://relayfs.sourceforge.net for an example of how this can be +done. Because the write is under control of the client and is +separated from the reserve, relay_reserve() doesn't protect the buffer +at all - it's up to the client to provide the appropriate +synchronization when using relay_reserve(). + +Closing a channel +----------------- + +The client calls relay_close() when it's finished using the channel. +The channel and its associated buffers are destroyed when there are no +longer any references to any of the channel buffers. relay_flush() +forces a sub-buffer switch on all the channel buffers, and can be used +to finalize and process the last sub-buffers before the channel is +closed. + +Creating non-relay files +------------------------ + +relay_open() automatically creates files in the relayfs filesystem to +represent the per-cpu kernel buffers; it's often useful for +applications to be able to create their own files alongside the relay +files in the relayfs filesystem as well e.g. 
'control' files much like +those created in /proc or debugfs for similar purposes, used to +communicate control information between the kernel and user sides of a +relayfs application. For this purpose the relayfs_create_file() and +relayfs_remove_file() API functions exist. For relayfs_create_file(), +the caller passes in a set of user-defined file operations to be used +for the file and an optional void * to a user-specified data item, +which will be accessible via inode->u.generic_ip (see the relay-apps +tarball for examples). The file_operations are a required parameter +to relayfs_create_file() and thus the semantics of these files are +completely defined by the caller. + +See the relay-apps tarball at http://relayfs.sourceforge.net for +examples of how these non-relay files are meant to be used. + +Creating relay files in other filesystems +----------------------------------------- + +By default of course, relay_open() creates relay files in the relayfs +filesystem. Because relay_file_operations is exported, however, it's +also possible to create and use relay files in other pseudo-filesystems +such as debugfs. + +For this purpose, two callback functions are provided, +create_buf_file() and remove_buf_file(). create_buf_file() is called +once for each per-cpu buffer from relay_open() to allow the client to +create a file to be used to represent the corresponding buffer; if +this callback is not defined, the default implementation will create +and return a file in the relayfs filesystem to represent the buffer. +The callback should return the dentry of the file created to represent +the relay buffer. Note that the parent directory passed to +relay_open() (and passed along to the callback), if specified, must +exist in the same filesystem the new relay file is created in. If +create_buf_file() is defined, remove_buf_file() must also be defined; +it's responsible for deleting the file(s) created in create_buf_file() +and is called during relay_close().
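The pairing rule for the two callbacks can be illustrated with a
small user-space mock. This is a sketch under stated assumptions, not
relayfs code: the model_* and demo_* names, the callback signatures
and the fixed-size name table are hypothetical, and the mock covers
only the case where the client supplies both callbacks (the default
relayfs file creation is left out). It shows one create call per cpu
using the basename0...basenameN-1 naming, and one matching remove
call per file on close.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_MAX_CPUS 8

/* Hypothetical stand-ins for the two callbacks; not the relayfs types. */
struct model_callbacks {
	int (*create_buf_file)(const char *filename);	/* 0 on success */
	void (*remove_buf_file)(const char *filename);
};

struct model_chan {
	const struct model_callbacks *cb;
	int n_files;				/* files created so far */
	char names[MODEL_MAX_CPUS][32];
};

/* Undo model_open(): one remove call per file that was created. */
static void model_close(struct model_chan *chan)
{
	while (chan->n_files > 0)
		chan->cb->remove_buf_file(chan->names[--chan->n_files]);
}

/*
 * Create one file per cpu, named base0 ... baseN-1.  Returns 0 on
 * success, -1 if a callback is missing or a create call fails.
 */
static int model_open(struct model_chan *chan, const struct model_callbacks *cb,
		      const char *base, int n_cpus)
{
	assert(n_cpus > 0 && n_cpus <= MODEL_MAX_CPUS);
	if (!cb->create_buf_file || !cb->remove_buf_file)
		return -1;	/* create without remove would leak files */
	chan->cb = cb;
	chan->n_files = 0;
	for (int cpu = 0; cpu < n_cpus; cpu++) {
		char *name = chan->names[chan->n_files];

		snprintf(name, sizeof(chan->names[0]), "%s%d", base, cpu);
		if (cb->create_buf_file(name) != 0) {
			model_close(chan);	/* roll back earlier files */
			return -1;
		}
		chan->n_files++;
	}
	return 0;
}

/* Example callbacks that only count live files. */
static int demo_live_files;

static int demo_create(const char *filename)
{
	(void)filename;
	demo_live_files++;
	return 0;
}

static void demo_remove(const char *filename)
{
	(void)filename;
	demo_live_files--;
}
```

In a real client the create callback would presumably call into the
target filesystem, such as debugfs, and hand back the resulting
dentry; the mock replaces that with a counter so the create/remove
symmetry is easy to check.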
+ +The create_buf_file() implementation can also be defined in such a way +as to allow the creation of a single 'global' buffer instead of the +default per-cpu set. This can be useful for applications interested +mainly in seeing the relative ordering of system-wide events without +the need to bother with saving explicit timestamps for the purpose of +merging/sorting per-cpu files in a postprocessing step. + +To have relay_open() create a global buffer, the create_buf_file() +implementation should set the value of the is_global outparam to a +non-zero value in addition to creating the file that will be used to +represent the single buffer. In the case of a global buffer, +create_buf_file() and remove_buf_file() will be called only once. The +normal channel-writing functions e.g. relay_write() can still be used +- writes from any cpu will transparently end up in the global buffer - +but since it is a global buffer, callers should make sure they use the +proper locking for such a buffer, either by wrapping writes in a +spinlock, or by copying a write function from relayfs_fs.h and +creating a local version that internally does the proper locking. + +See the 'exported-relayfile' examples in the relay-apps tarball for +examples of creating and using relay files in debugfs. + +Misc +---- + +Some applications may want to keep a channel around and re-use it +rather than open and close a new channel for each use. relay_reset() +can be used for this purpose - it resets a channel to its initial +state without reallocating channel buffer memory or destroying +existing mappings. It should however only be called when it's safe to +do so i.e. when the channel isn't currently being written to. + +Finally, there are a couple of utility callbacks that can be used for +different purposes. buf_mapped() is called whenever a channel buffer +is mmapped from user space and buf_unmapped() is called when it's +unmapped. 
The client can use this notification to trigger actions +within the kernel application, such as enabling/disabling logging to +the channel. + + +Resources +========= + +For news, example code, mailing list, etc. see the relayfs homepage: + + http://relayfs.sourceforge.net Credits @@ -809,4 +439,4 @@ Karim Yaghmour Tom Zanussi Also thanks to Hubertus Franke for a lot of useful suggestions and bug -reports, and for contributing the klog code. +reports.