This commit was manufactured by cvs2svn to create branch 'vserver'.

author Planet-Lab Support <support@planet-lab.org>

Wed, 16 Jun 2004 18:16:02 +0000 (18:16 +0000)

committer Planet-Lab Support <support@planet-lab.org>

Wed, 16 Jun 2004 18:16:02 +0000 (18:16 +0000)
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt

new file mode 100644

index 0000000..7397bdb
--- /dev/null
+++ b/Documentation/filesystems/relayfs.txt
@@ -0,0 +1,812 @@
+
+relayfs - a high-speed data relay filesystem
+============================================
+
+relayfs is a filesystem designed to provide an efficient mechanism for
+tools and facilities to relay large amounts of data from kernel space
+to user space.
+
+The main idea behind relayfs is that every data flow is put into a
+separate "channel" and each channel is a file.  In practice, each
+channel is a separate memory buffer allocated from within kernel space
+upon channel instantiation. Software needing to relay data to user
+space would open a channel or a number of channels, depending on its
+needs, and would log data to that channel. All the buffering and
+locking mechanics are taken care of by relayfs.  The actual format and
+protocol used for each channel is up to relayfs' clients.
+
+relayfs makes no provisions for copying the same data to more than a
+single channel. This is for the clients of the relay to take care of,
+and so is any form of data filtering. The purpose is to keep relayfs
+as simple as possible.
+
+
+Usage
+=====
+
+In addition to the relayfs kernel API described below, relayfs
+implements basic file operations.  Here are the file operations that
+are available and some comments regarding their behavior:
+
+open()  enables a user to open an _existing_ channel.  A channel can
+        be opened in blocking or non-blocking mode, and can be opened
+        for reading as well as for writing.  Readers will by default
+        be auto-consuming.
+
+mmap()  results in the channel's memory buffer being mmapped into the
+        caller's memory space.
+
+read()  since we are dealing with circular buffers, the user is only
+        allowed to read forward.  Some apps may want to loop around
+        read() waiting for incoming data - if there is no data
+        available, read will put the reader on a wait queue until
+        data is available (blocking mode).  Non-blocking reads return
+        -EAGAIN if data is not available.
+
+write() writing from user space operates exactly as relay_write() does
+        (described below).
+
+poll()  POLLIN/POLLRDNORM/POLLOUT/POLLWRNORM/POLLERR are supported.
+
+close() decrements the channel's refcount.  When the refcount reaches
+        0, i.e. when no process or kernel client has the file open
+        (see relay_close() below), the channel buffer is freed.
+
+
+In order for a user application to make use of relayfs files, the
+relayfs filesystem must be mounted.  For example,
+
+       mount -t relayfs relayfs /mountpoint
+
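+As a rough user-space illustration of the non-blocking read() and
+poll() behavior described above, the sketch below drains whatever is
+currently available from a descriptor and treats EAGAIN as "no data
+yet".  The helper names read_available() and wait_readable() are
+made up for this example, and nothing here is relayfs-specific: it
+assumes only that a channel file behaves like any other non-blocking
+descriptor.
+
```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read whatever is currently available from a non-blocking descriptor.
 * Returns the number of bytes stored in buf, or -1 on a real error.
 * EAGAIN only means "no data right now" and is not treated as an error. */
ssize_t read_available(int fd, char *buf, size_t cap)
{
	size_t total = 0;

	while (total < cap) {
		ssize_t n = read(fd, buf + total, cap - total);

		if (n > 0) {
			total += (size_t)n;
		} else if (n == 0) {
			break;		/* end of file */
		} else if (errno == EAGAIN || errno == EWOULDBLOCK) {
			break;		/* nothing more to read for now */
		} else if (errno != EINTR) {
			return -1;	/* genuine error */
		}
		/* on EINTR, simply retry the read */
	}
	return (ssize_t)total;
}

/* Wait up to timeout_ms for the descriptor to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;
	return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```
+
+A reader would typically open the channel file with
+O_RDONLY | O_NONBLOCK (for instance a hypothetical
+/mountpoint/xlog/9) and alternate between wait_readable() and
+read_available().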
+
+The relayfs kernel API
+======================
+
+relayfs channels are implemented as circular buffers subdivided into
+'sub-buffers'.  Kernel clients write data into the channel using
+relay_write(), and are notified via a set of callbacks when
+significant events occur within the channel.  'Significant events'
+include:
+
+- a sub-buffer has been filled i.e. the current write won't fit into the
+  current sub-buffer, and a 'buffer-switch' is triggered, after which
+  the data is written into the next buffer (if the next buffer is
+  empty).  The client is notified of this condition via two callbacks,
+  one providing an opportunity to perform start-of-buffer tasks, the
+  other end-of-buffer tasks.
+
+- data is ready for the client to process.  The client can choose to
+  be notified either on a per-sub-buffer basis (bulk delivery) or
+  per-write basis (packet delivery).
+
+- data has been written to the channel from user space.  The client can
+  use this notification to accept and process 'commands' sent to the
+  channel via write(2).
+
+- the channel has been opened/closed/mapped/unmapped from user space.
+  The client can use this notification to trigger actions within the
+  kernel application, such as enabling/disabling logging to the
+  channel.  It can also return result codes from the callback,
+  indicating that the operation should fail, e.g. in order to prevent
+  more than one user space open or mmap.
+
+- the channel needs resizing, or needs to update its
+  state based on the results of the resize.  Resizing the channel is
+  up to the kernel client to actually perform.  If the channel is
+  configured for resizing, the client is notified when the unread data
+  in the channel passes a preset threshold, giving it the opportunity
+  to allocate a new channel buffer and replace the old one.
+
+Reader objects
+--------------
+
+Channel readers use an opaque rchan_reader object to read from
+channels.  For VFS readers (those using read(2) to read from a
+channel), these objects are automatically created and used internally;
+only kernel clients that need to directly read from channels, or whose
+userspace applications use mmap to access channel data, need to know
+anything about rchan_readers - others may skip this section.
+
+A relay channel can have any number of readers, each represented by an
+rchan_reader instance, which is used to encapsulate reader settings
+and state.  rchan_reader objects should be treated as opaque by kernel
+clients.  To create a reader object for directly accessing a channel
+from kernel space, call the add_rchan_reader() kernel API function:
+
+rchan_reader *add_rchan_reader(rchan_id, auto_consume)
+
+This function returns an rchan_reader instance if successful, which
+should then be passed to relay_read() when the kernel client is
+interested in reading from the channel.
+
+The auto_consume parameter indicates whether a read done by this
+reader will automatically 'consume' that portion of the unread channel
+buffer when relay_read() is called (see below for more details).
+
+To close the reader, call
+
+remove_rchan_reader(reader)
+
+which will remove the reader from the list of current readers.
+
+
+To create a reader object representing a userspace mmap reader in the
+kernel application, call the add_map_reader() kernel API function:
+
+rchan_reader *add_map_reader(rchan_id)
+
+This function returns an rchan_reader instance if successful, whose
+main purpose is as an argument to be passed into
+relay_buffers_consumed() when the kernel client becomes aware that
+data has been read by a user application using mmap to read from the
+channel buffer.  There is no auto_consume option in this case, since
+only the kernel client/user application knows when data has been read.
+
+To close the map reader, call
+
+remove_map_reader(reader)
+
+which will remove the reader from the list of current readers.
+
+Consumed count
+--------------
+
+A relayfs channel is a circular buffer, which means that if there is
+no reader reading from it or a reader reading too slowly, at some
+point the channel writer will 'lap' the reader and data will be lost.
+In normal use, readers will always be able to keep up with writers and
+the buffer is thus never in danger of becoming full.  In many
+applications, it's sufficient to ensure that this is practically
+speaking always the case, by making the buffers large enough.  These
+types of applications can basically open the channel as
+RELAY_MODE_CONTINUOUS (the default anyway) and not worry about the
+meaning of 'consume' and skip the rest of this section.
+
+If it's important for the application that a kernel client never allow
+writers to overwrite unread data, the channel should be opened using
+RELAY_MODE_NO_OVERWRITE and must be kept apprised of the count of
+bytes actually read by the (typically) user-space channel readers.
+This count is referred to as the 'consumed count'.  read(2) channel
+readers automatically update the channel's 'consumed count' as they
+read.  If the usage mode is to have only read(2) readers, which is
+typically the case, the kernel client doesn't need to worry about any
+of the relayfs functions having to do with 'bytes consumed' and can
+skip the rest of this section.  (Note that it is possible to have
+multiple read(2) or auto-consuming readers, but like having multiple
+readers on a pipe, these readers will race with each other i.e. it's
+supported, but doesn't make much sense).
+
+If the kernel client cannot rely on an auto-consuming reader to keep
+the 'consumed count' up-to-date, then it must do so manually, by
+making the appropriate calls to relay_buffers_consumed() or
+relay_bytes_consumed().  In most cases, this should only be necessary
+for bulk mmap clients - almost all packet clients should be covered by
+having auto-consuming read(2) readers.  For mmapped bulk clients, for
+instance, there are no auto-consuming VFS readers, so the kernel
+client needs to make the call to relay_buffers_consumed() after
+sub-buffers are read.
+
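+The difference between the two modes can be pictured with a small
+user-space model.  This is only a sketch under simplifying
+assumptions: the struct, the enum and the function names are invented
+for illustration, the model counts bytes rather than sub-buffers, and
+it is not how relayfs itself keeps its counters.
+
```c
#include <stddef.h>

/* Stand-ins for the two overwrite modes; not the real relayfs flags. */
enum chan_mode { MODE_CONTINUOUS, MODE_NO_OVERWRITE };

struct chan_model {
	enum chan_mode mode;
	size_t size;		/* total buffer size in bytes */
	size_t produced;	/* bytes written so far */
	size_t consumed;	/* bytes reported as read so far */
	size_t lost;		/* unread bytes overwritten by the writer */
};

/* Account for a write of count bytes.  Returns count on success, or 0
 * if the channel is no-overwrite and the write would not fit. */
size_t model_write(struct chan_model *c, size_t count)
{
	size_t unread = c->produced - c->consumed;

	if (unread + count > c->size) {
		size_t over;

		if (c->mode == MODE_NO_OVERWRITE)
			return 0;	/* refuse to overwrite unread data */

		/* continuous mode: the writer laps the reader */
		over = unread + count - c->size;
		c->lost += over;
		c->consumed += over;
	}
	c->produced += count;
	return count;
}

/* Report count bytes as consumed, never going past what was produced. */
void model_consume(struct chan_model *c, size_t count)
{
	size_t unread = c->produced - c->consumed;

	c->consumed += count < unread ? count : unread;
}
```
+
+In this model a no-overwrite channel only accepts new writes again
+after model_consume() has been called, which mirrors why the consumed
+count has to be kept up to date for such channels.
+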
+Kernel API
+----------
+
+Here's a summary of the API relayfs provides to in-kernel clients:
+
+int    relay_open(channel_path, bufsize, nbufs, channel_flags,
+                 channel_callbacks, start_reserve, end_reserve,
+                 rchan_start_reserve, resize_min, resize_max, mode,
+                 init_buf, init_buf_size)
+int    relay_write(channel_id, *data_ptr, count, time_delta_offset, **wrote_pos)
+rchan_reader *add_rchan_reader(channel_id, auto_consume)
+int    remove_rchan_reader(rchan_reader *reader)
+rchan_reader *add_map_reader(channel_id)
+int    remove_map_reader(rchan_reader *reader)
+int    relay_read(reader, buf, count, wait, *actual_read_offset)
+void   relay_buffers_consumed(reader, buffers_consumed)
+void   relay_bytes_consumed(reader, bytes_consumed, read_offset)
+int    relay_bytes_avail(reader)
+int    rchan_full(reader)
+int    rchan_empty(reader)
+int    relay_info(channel_id, *channel_info)
+int    relay_close(channel_id)
+int    relay_realloc_buffer(channel_id, nbufs, async)
+int    relay_replace_buffer(channel_id)
+int    relay_reset(int rchan_id)
+
+----------
+int relay_open(channel_path, bufsize, nbufs,
+        channel_flags, channel_callbacks, start_reserve,
+        end_reserve, rchan_start_reserve, resize_min, resize_max, mode,
+        init_buf, init_buf_size)
+
+relay_open() is used to create a new entry in relayfs.  This new entry
+is created according to channel_path.  channel_path contains the
+absolute path to the channel file on relayfs.  If, for example, the
+caller sets channel_path to "/xlog/9", a "xlog/9" entry will appear
+within relayfs automatically and the "xlog" directory will be created
+in the filesystem's root.  relayfs does not implement any policy on
+its content, except to disallow the opening of two channels using the
+same file. There is, nevertheless, a set of guidelines for using
+relayfs. Basically, each facility using relayfs should use a top-level
+directory identifying it. The entry created above, for example,
+presumably belongs to the "xlog" software.
+
+The remaining parameters for relay_open() are as follows:
+
+- channel_flags - an ORed combination of attribute values controlling
+  common channel characteristics:
+
+       - logging scheme - relayfs uses two mutually exclusive schemes
+         for logging data to a channel.  The 'lockless scheme'
+         reserves and writes data to a channel without the need of
+         any type of locking on the channel.  This is the preferred
+         scheme, but may not be available on a given architecture (it
+         relies on the presence of a cmpxchg instruction).  It's
+         specified by the RELAY_SCHEME_LOCKLESS flag.  The 'locking
+         scheme' either obtains a lock on the channel for writing or
+         disables interrupts, depending on whether the channel was
+         opened for SMP or global usage (see below).  It's specified
+         by the RELAY_SCHEME_LOCKING flag.  While a client may want
+         to explicitly specify a particular scheme to use, it's more
+         convenient to specify RELAY_SCHEME_ANY for this flag, which
+         will allow relayfs to choose the best available scheme i.e.
+         lockless if supported.
+
+       - overwrite mode (default is RELAY_MODE_CONTINUOUS) -
+        If RELAY_MODE_CONTINUOUS is specified, writes to the channel
+        will succeed regardless of whether there are up-to-date
+        consumers or not.  If RELAY_MODE_NO_OVERWRITE is specified,
+        the channel becomes 'full' when the total amount of buffer
+        space unconsumed by readers equals or exceeds the total
+        buffer size.  With the buffer in this state, writes to the
+        buffer will fail - clients need to check the return code from
+        relay_write() to determine if this is the case and act
+        accordingly - 0 or a negative value indicates the write failed.
+
+       - SMP usage - this applies only when the locking scheme is in
+        use.  If RELAY_USAGE_SMP is specified, it's assumed that the
+        channel will be used in a per-CPU fashion and consequently,
+        the only locking that will be done for writes is to disable
+        local irqs.  If RELAY_USAGE_GLOBAL is specified, it's assumed
+        that writes to the buffer can occur within any CPU context,
+        and spin_lock_irqsave() will be used to lock the buffer.
+
+       - delivery mode - if RELAY_DELIVERY_BULK is specified, the
+        client will be notified via its deliver() callback whenever a
+        sub-buffer has been filled.  Alternatively,
+        RELAY_DELIVERY_PACKET will cause delivery to occur after the
+        completion of each write.  See the description of the channel
+        callbacks below for more details.
+
+       - timestamping - if RELAY_TIMESTAMP_TSC is specified and the
+        architecture supports it, efficient TSC 'timestamps' can be
+        associated with each write, otherwise more expensive
+        gettimeofday() timestamping is used.  At the beginning of
+        each sub-buffer, a gettimeofday() timestamp and the current
+        TSC, if supported, are read, and are passed on to the client
+        via the buffer_start() callback.  This allows correlation of
+        the current time with the current TSC for subsequent writes.
+        Each subsequent write is associated with a 'time delta',
+        which is either the current TSC, if the channel is using
+        TSCs, or the difference between the buffer_start gettimeofday
+        timestamp and the gettimeofday time read for the current
+        write.  Note that relayfs never writes either a timestamp or
+        time delta into the buffer unless explicitly asked to (see
+        the description of relay_write() for details).
+ 
+- bufsize - the size of the 'sub-buffers' making up the circular channel
+  buffer.  For the lockless scheme, this must be a power of 2.
+
+- nbufs - the number of 'sub-buffers' making up the circular
+  channel buffer.  This must be a power of 2.
+
+  The total size of the channel buffer is bufsize * nbufs rounded up 
+  to the next kernel page size.  If the lockless scheme is used, both
+  bufsize and nbufs must be a power of 2.  If the locking scheme is
+  used, the bufsize can be anything and nbufs must be a power of 2.  If
+  RELAY_SCHEME_ANY is used, the bufsize and nbufs should be a power of 2.
+
+  NOTE: if nbufs is 1, relayfs will bypass the normal size
+  checks and will allocate an rvmalloced buffer of size bufsize.
+  This buffer will be freed when relay_close() is called, if the channel
+  isn't still being referenced.
+
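+The sizing rules above can be checked with a few lines of arithmetic.
+The sketch below is a user-space illustration only: the function and
+enum names are hypothetical, the page size is passed in explicitly,
+and "rounded up to the next kernel page size" is assumed to mean
+rounding up to a whole multiple of the page size.
+
```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the two logging schemes; not the real relayfs flags. */
enum scheme_model { SCHEME_LOCKLESS, SCHEME_LOCKING };

/* True if n is a non-zero power of two. */
bool is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* bufsize * nbufs, rounded up to a whole number of pages (assumption:
 * this is what the rounding rule in the text amounts to). */
size_t total_channel_size(size_t bufsize, size_t nbufs, size_t page_size)
{
	size_t raw = bufsize * nbufs;

	return (raw + page_size - 1) / page_size * page_size;
}

/* Check the constraints stated in the text: nbufs must always be a
 * power of two, and bufsize must be one for the lockless scheme. */
bool geometry_ok(enum scheme_model scheme, size_t bufsize, size_t nbufs)
{
	if (bufsize == 0 || !is_power_of_two(nbufs))
		return false;
	if (scheme == SCHEME_LOCKLESS && !is_power_of_two(bufsize))
		return false;
	return true;
}
```
+
+For a channel opened with RELAY_SCHEME_ANY, checking against the
+lockless rules is the conservative choice, since either scheme may
+end up being selected.
+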
+- callbacks - a table of callback functions called when events occur
+  within the data relay that clients need to know about:
+          
+         - int buffer_start(channel_id, current_write_pos, buffer_id,
+           start_time, start_tsc, using_tsc) -
+
+           called at the beginning of a new sub-buffer, the
+           buffer_start() callback gives the client an opportunity to
+           write data into space reserved at the beginning of a
+           sub-buffer.  The client should only write into the buffer
+           if it specified a value for start_reserve and/or
+           rchan_start_reserve (see below) when the channel was
+           opened.  In the latter case, the client can determine
+           whether to write its one-time rchan_start_reserve data by
+           examining the value of buffer_id, which will be 0 for the
+           first sub-buffer.  The address that the client can write
+           to is contained in current_write_pos (the client by
+           definition knows how much it can write i.e. the value it
+           passed to relay_open() for start_reserve/
+           rchan_start_reserve).  start_time contains the
+           gettimeofday() value for the start of the buffer and
+           start_tsc contains the TSC read at the same time.  The
+           using_tsc param indicates whether or not start_tsc is valid
+           (it wouldn't be if TSC timestamping isn't being used).
+
+           The client should return the number of bytes it wrote to
+           the channel, 0 if none.
+
+         - int buffer_end(channel_id, current_write_pos, end_of_buffer,
+           end_time, end_tsc, using_tsc)
+
+           called at the end of a sub-buffer, the buffer_end()
+           callback gives the client an opportunity to perform
+           end-of-buffer processing.  Note that the current_write_pos
+           is the position where the next write would occur, but
+           since the current write wouldn't fit (which is the trigger
+           for the buffer_end event), the buffer is considered full
+           even though there may be unused space at the end.  The
+           end_of_buffer param pointer value can be used to determine
+           exactly the size of the unused space.  The client should
+           only write into the buffer if it specified a value for
+           end_reserve when the channel was opened.  If the client
+           doesn't write anything i.e. returns 0, the unused space at
+           the end of the sub-buffer is available via relay_info() -
+           this data may be needed by the client later if it needs to
+           process raw sub-buffers (an alternative would be to save
+           the unused bytes count value in end_reserve space at the
+           end of each sub-buffer during buffer_end processing and
+           read it when needed at a later time.  The other
+           alternative would be to use read(2), which makes the
+           unused count invisible to the caller).  end_time contains
+           the gettimeofday() value for the end of the buffer and
+           end_tsc contains the TSC read at the same time.  The using_tsc
+           param indicates whether or not end_tsc is valid (it
+           wouldn't be if TSC timestamping isn't being used).
+
+           The client should return the number of bytes it wrote to
+           the channel, 0 if none.
+
+         - void deliver(channel_id, from, len)
+
+           called when data is ready for the client.  This callback
+           is used to notify a client when a sub-buffer is complete
+           (in the case of bulk delivery) or a single write is
+           complete (packet delivery).  A bulk delivery client might
+           wish to then signal a daemon that a sub-buffer is ready.
+           A packet delivery client might wish to process the packet
+           or send it elsewhere.  The from param is a pointer to the
+           delivered data and len specifies how many bytes are ready.
+
+         - void user_deliver(channel_id, from, len)
+
+           called when data has been written to the channel from user
+           space.  This callback is used to notify a client when a
+           successful write from userspace has occurred, independent
+           of whether bulk or packet delivery is in use.  This can be
+           used to allow userspace programs to communicate with the
+           kernel client through the channel via out-of-band write(2)
+           'commands' instead of via ioctls, for instance.  The from
+           param is a pointer to the delivered data and len specifies
+           how many bytes are ready.  Note that this callback occurs
+           after the bytes have been successfully written into the
+           channel, which means that channel readers must be able to
+           deal with the 'command' data which will appear in the
+           channel data stream just as any other userspace or
+           non-userspace write would.
+
+         - int needs_resize(channel_id, resize_type,
+                            suggested_buf_size, suggested_n_bufs)
+
+           called when a channel's buffers are in danger of becoming
+           full i.e. the number of unread bytes in the channel passes
+           a preset threshold, or when the current capacity of a
+           channel's buffer is no longer needed.  Also called to
+           notify the client when a channel's buffer has been
+           replaced.  If resize_type is RELAY_RESIZE_EXPAND or
+           RELAY_RESIZE_SHRINK, the kernel client should arrange to
+           call relay_realloc_buffer() with the suggested buffer size
+           and buffer count, which will allocate a new buffer of the
+           recommended size for the channel (but will not replace the
+           old one).  When the allocation has completed,
+           needs_resize() is again called, this time with a
+           resize_type of RELAY_RESIZE_REPLACE.  The kernel client
+           should then arrange to call relay_replace_buffer() to
+           actually replace the old channel buffer with the newly
+           allocated buffer.  Finally, once the buffer replacement
+           has completed, needs_resize() is again called, this time
+           with a resize_type of RELAY_RESIZE_REPLACED, to inform the
+           client that the replacement is complete and additionally
+           confirming the current sub-buffer size and number of
+           sub-buffers.  Note that a resize can be canceled if
+           relay_realloc_buffer() is called with the async param
+           non-zero and the resize conditions no longer hold.  In
+           this case, the RELAY_RESIZE_REPLACED suggested number of
+           sub-buffers will be the same as the number of sub-buffers
+           that existed before the RELAY_RESIZE_SHRINK or EXPAND i.e.
+           values indicating that the resize didn't actually occur.
+
+         - int fileop_notify(channel_id, struct file *filp, enum relay_fileop)
+
+           called when a userspace file operation has occurred or
+           will occur on a relayfs channel file.  These notifications
+           can be used by the kernel client to trigger actions within
+           the kernel client when the corresponding event occurs,
+           such as enabling logging only when a userspace application
+           opens or mmaps a relayfs file and disabling it again when
+           the file is closed or unmapped.  The kernel client can
+           also return its own return value, which can affect the
+           outcome of the file operation - returning 0 indicates that
+           the operation should succeed, and returning a negative value
+           indicates that the operation should fail, and that
+           the returned value should be returned to the ultimate
+           caller e.g. returning -EPERM from the open fileop will
+           cause the open to fail with -EPERM.  Among other things,
+           the return value can be used to restrict a relayfs file
+           from being opened or mmap'ed more than once.  The currently
+           implemented fileops are:
+
+           RELAY_FILE_OPEN  - a relayfs file is being opened.  Return
+                              0 to allow it to succeed, negative to
+                              have it fail.  A negative return value
+                              will be passed on unmodified to the
+                              open fileop.
+           RELAY_FILE_CLOSE - a relayfs file is being closed.  The
+                              return value is ignored.
+           RELAY_FILE_MAP   - a relayfs file is being mmap'ed.  Return
+                              0 to allow it to succeed, negative to
+                              have it fail.  A negative return value
+                              will be passed on unmodified to the
+                              mmap fileop.
+           RELAY_FILE_UNMAP - a relayfs file is being unmapped.  The
+                              return value is ignored.
+
+         - int ioctl(rchan_id, cmd, arg)
+
+           called when an ioctl call is made using a relayfs file
+           descriptor.  The cmd and arg are passed along to this
+           callback unmodified for it to do as it wishes with.  The
+           return value from this callback is used as the return value
+           of the ioctl call.
+
+  If the callbacks param passed to relay_open() is NULL, a set of
+  default do-nothing callbacks will be defined for the channel.
+  Likewise, any NULL rchan_callback function contained in a non-NULL
+  callbacks struct will be filled in with a default callback function
+  that does nothing.
+
+- start_reserve - the number of bytes to be reserved at the start of
+  each sub-buffer.  The client can do what it wants with this number
+  of bytes when the buffer_start() callback is invoked.  Typically
+  clients would use this to write per-sub-buffer header data.
+
+- end_reserve - the number of bytes to be reserved at the end of each
+  sub-buffer.  The client can do what it wants with this number of
+  bytes when the buffer_end() callback is invoked.  Typically clients
+  would use this to write per-sub-buffer footer data.
+
+- rchan_start_reserve - the number of bytes to be reserved, in
+  addition to start_reserve, at the beginning of the first sub-buffer
+  in the channel.  The client can do what it wants with this number of
+  bytes when the buffer_start() callback is invoked.  Typically
+  clients would use this to write per-channel header data.
+
+- resize_min - if set, this signifies that the channel is
+  auto-resizeable.  The value specifies the size that the channel will
+  try to maintain as a normal working size, and that it won't go
+  below.  The client makes use of the resizing callbacks and
+  relay_realloc_buffer() and relay_replace_buffer() to actually effect
+  the resize.
+
+- resize_max - if set, this signifies that the channel is
+  auto-resizeable.  The value specifies the maximum size the channel
+  can have as a result of resizing.
+
+- mode - if non-zero, specifies the file permissions that will be given
+  to the channel file.  If 0, the default rw user perms will be used.
+
+- init_buf - if non-NULL, rather than allocating the channel buffer,
+  this buffer will be used as the initial channel buffer.  The kernel
+  API function relay_discard_init_buf() can later be used to have
+  relayfs allocate a normal mmappable channel buffer and switch over
+  to using it after copying the init_buf contents into it.  Currently,
+  the size of init_buf must be exactly bufsize * nbufs.  The caller
+  is responsible for managing the init_buf memory.  This feature is
+  typically used for init-time channel use and should normally be
+  specified as NULL.
+
+- init_buf_size - the total size of init_buf, if init_buf is specified
+  as non-NULL.  Currently, the size of init_buf must be exactly
+  bufsize * nbufs.
+
+Upon successful completion, relay_open() returns a channel id
+to be used for all other operations with the relay. All buffers
+managed by the relay are allocated using rvmalloc/rvfree to allow
+for easy mmapping to user-space.
+
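+The fileop_notify() callback described above can be used to allow
+only one user-space open at a time and to switch logging on and off.
+The following is a minimal user-space model of that decision logic,
+not a real relayfs callback: the enum is a local stand-in for the
+fileop codes, the client_state struct is invented for the example,
+and the choice of -EBUSY as the error code is an assumption.
+
```c
#include <errno.h>

/* Local stand-ins for the fileop codes listed in the text. */
enum fileop_model { FILE_OPEN, FILE_CLOSE, FILE_MAP, FILE_UNMAP };

/* Hypothetical per-channel state kept by the kernel client. */
struct client_state {
	int open_count;		/* number of user-space opens */
	int logging_enabled;	/* non-zero while a reader is attached */
};

/* Returns 0 to let the operation proceed, or a negative errno value
 * to make it fail with that error. */
int fileop_notify_model(struct client_state *st, enum fileop_model op)
{
	switch (op) {
	case FILE_OPEN:
		if (st->open_count > 0)
			return -EBUSY;	/* refuse a second open */
		st->open_count = 1;
		st->logging_enabled = 1;
		return 0;
	case FILE_CLOSE:
		st->open_count = 0;
		st->logging_enabled = 0;
		return 0;		/* ignored for close anyway */
	default:
		return 0;		/* map/unmap: always allow here */
	}
}
```
+
+The same pattern could be applied to the map and unmap notifications
+if a client wanted to restrict mmap in the same way.
+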
+----------
+int relay_write(channel_id, *data_ptr, count, time_delta_offset, **wrote_pos)
+
+relay_write() reserves space in the channel and writes count bytes of
+data pointed to by data_ptr to it.  Automatically performs any
+necessary locking, depending on the scheme and SMP usage in effect (no
+locking is done for the lockless scheme regardless of usage).  It
+returns the number of bytes written, or 0/negative on failure.  If
+time_delta_offset is >= 0, the internal time delta calculated when
+the slot was reserved will be written at that offset.  This is the
+TSC or gettimeofday() delta between the current
+write and the beginning of the buffer, whichever method is being used
+by the channel.  Trying to write a count larger than the bufsize
+specified to relay_open() (taking into account the reserved
+start-of-buffer and end-of-buffer space as well) will fail.  If
+wrote_pos is non-NULL, it will receive the location the data was
+written to, which may be needed for some applications but is not
+normally interesting.  Most applications should pass in NULL for this
+param.
+
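+To make the time_delta_offset and size limit concrete, the sketch
+below shows how a client might derive both values for a fixed-size
+record.  Everything here is hypothetical: the event_record layout,
+the 32-bit width chosen for the time delta field, and the helper
+names are assumptions made for illustration, and the one-time
+rchan_start_reserve space is ignored.
+
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record a client might log with each write. */
struct event_record {
	uint32_t event_id;
	uint32_t time_delta;	/* slot the time delta would be written to */
	uint32_t payload;
};

/* Value a client could pass as time_delta_offset: the offset of the
 * time_delta field, or -1 if no time delta should be written. */
int record_time_delta_offset(int want_delta)
{
	return want_delta ? (int)offsetof(struct event_record, time_delta)
			  : -1;
}

/* Largest count that can fit in one sub-buffer once the per-sub-buffer
 * reserved space is taken out; 0 if the reserves leave no room. */
size_t max_write_count(size_t bufsize, size_t start_reserve,
		       size_t end_reserve)
{
	size_t reserved = start_reserve + end_reserve;

	return reserved < bufsize ? bufsize - reserved : 0;
}
```
+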
+----------
+struct rchan_reader *add_rchan_reader(int rchan_id, int auto_consume)
+
+add_rchan_reader creates and initializes a reader object for a
+channel.  An opaque rchan_reader object is returned on success, and is
+passed to relay_read() when reading the channel.  If the boolean
+auto_consume parameter is 1, the reader is defined to be
+auto-consuming.  auto-consuming reader objects are automatically
+created and used for VFS read(2) readers.
+
+----------
+void remove_rchan_reader(struct rchan_reader *reader)
+
+remove_rchan_reader finds and removes the given reader from the
+channel.  This function is needed only for readers that are not VFS
+read(2) readers; VFS read(2) readers are automatically removed when
+the corresponding file object is closed.
+
+----------
+struct rchan_reader *add_map_reader(int rchan_id)
+
+Creates and initializes an rchan_reader object for channel map
+readers.  The object is passed to relay_bytes_consumed() or
+relay_buffers_consumed() when kernel clients learn from their mmap
+user clients that data has been read.
+
+----------
+int remove_map_reader(reader)
+
+Finds and removes the given map reader from the channel.  This function
+is useful only for map readers.
+
+----------
+int relay_read(reader, buf, count, wait, *actual_read_offset)
+
+Reads count bytes from the channel, or as much as is available within
+the sub-buffer currently being read.  The read offset that will be
+read from is the position contained within the reader object.  If the
+wait flag is set, buf is non-NULL, and there is nothing available, it
+will wait until there is.  If the wait flag is 0 and there is nothing
+available, -EAGAIN is returned.  If buf is NULL, the value returned is
+the number of bytes that would have been read.  actual_read_offset is
+the value that should be passed as the read offset to
+relay_bytes_consumed(), needed only if the reader is not
+auto-consuming and the channel is RELAY_MODE_NO_OVERWRITE, but in any
+case, it must not be NULL.
+
+----------
+int relay_bytes_avail(reader)
+
+Returns the number of bytes available relative to the reader's current
+read position within the corresponding sub-buffer, 0 if there is
+nothing available.  Note that this doesn't return the total bytes
+available in the channel buffer; it is enough, however, to know
+whether anything is available or how many bytes might be returned
+from the next read.
+
+----------
+void relay_buffers_consumed(reader, buffers_consumed)
+
+Adds to the channel's consumed buffer count.  buffers_consumed should
+be the number of buffers newly consumed, not the total number
+consumed.  NOTE: kernel clients don't need to call this function if
+the reader is auto-consuming or the channel is RELAY_MODE_CONTINUOUS.
+
+In order for the relay to detect the 'buffers full' condition for a
+channel, it must be kept up-to-date with respect to the number of
+buffers consumed by the client.  If the addition of the value of the
+buffers_consumed param to the current bufs_consumed count for the channel
+would exceed the bufs_produced count for the channel, the channel's
+bufs_consumed count will be set to the bufs_produced count for the
+channel.  This allows clients to 'catch up' if necessary.
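
The 'catch up' clamping described above can be sketched as a small
stand-alone model.  This is only an illustration, not the relayfs
implementation: struct chan_counts and model_buffers_consumed() are
hypothetical names, and the sketch assumes bufs_consumed never exceeds
bufs_produced on entry.

```c
/* Hypothetical stand-in for the per-channel counters; the real relayfs
 * channel struct is not shown in this document. */
struct chan_counts {
    unsigned long bufs_produced;
    unsigned long bufs_consumed;
};

/* Add newly consumed buffers, clamping the total to bufs_produced.
 * Assumes c->bufs_consumed <= c->bufs_produced on entry. */
static void model_buffers_consumed(struct chan_counts *c,
                                   unsigned long newly_consumed)
{
    unsigned long room = c->bufs_produced - c->bufs_consumed;

    if (newly_consumed > room)
        c->bufs_consumed = c->bufs_produced;  /* 'catch up' */
    else
        c->bufs_consumed += newly_consumed;
}
```

Computing the remaining room first, rather than adding and then
comparing, keeps the model free of unsigned overflow.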
+
+----------
+void relay_bytes_consumed(reader, bytes_consumed, read_offset)
+
+Adds to the channel's consumed count.  bytes_consumed should be the
+number of bytes actually read, e.g. the return value of relay_read(),
+and read_offset should be the actual offset the bytes were read from,
+e.g. the actual_read_offset set by relay_read().  NOTE: kernel clients
+don't need to call this function if the reader is auto-consuming or
+the channel is MODE_CONTINUOUS.
+
+In order for the relay to detect the 'buffers full' condition for a
+channel, it must be kept up-to-date with respect to the number of
+bytes consumed by the client.  For packet clients, it makes more sense
+to update after each read rather than after each complete sub-buffer
+read.  The bytes_consumed count updates bufs_consumed when a buffer
+has been consumed so this count remains consistent.
+
+----------
+int relay_info(channel_id, *channel_info)
+
+relay_info() fills in an rchan_info struct with channel status and
+attribute information such as usage modes, sub-buffer size and count,
+the allocated size of the entire buffer, buffers produced and
+consumed, current buffer id, and the count of writes lost due to the
+'buffers full' condition.
+
+The virtual address of the channel buffer is also available here, for
+those clients that need it.
+
+Clients may need to know how many 'unused' bytes there are at the end
+of a given sub-buffer.  This is only the case if both of the following
+are true:
+
+  1) the client didn't write this count to the end of the sub-buffer
+     or otherwise note it.  The count is available in the buffer_end
+     callback as the difference between the buffer end and current
+     write pos params; if the client returned 0 from that callback,
+     it's assumed that the count was not noted.
+  2) the client isn't using the read() system call to read the buffer.
+
+In other words, if the client isn't annotating the stream and is
+reading the buffer by mmaping it, this information is needed in order
+for the client to 'skip over' the unused bytes at the ends of
+sub-buffers.
+
+Additionally, for the lockless scheme, clients may need to know
+whether a particular sub-buffer is actually complete.  An array of
+boolean values, one per sub-buffer, contains non-zero if the buffer is
+complete, zero otherwise.
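
As a rough illustration of how a client reading an mmapped buffer might
use the per-sub-buffer unused counts and completion flags, the sketch
below walks a buffer and visits only the valid bytes of complete
sub-buffers.  The layout of rchan_info is not given in this document,
so struct subbuf_view, its fields, and walk_complete_subbufs() are all
hypothetical.

```c
#include <stddef.h>

/* Hypothetical snapshot of the information a client would gather from
 * relay_info(); field names are illustrative only. */
struct subbuf_view {
    const char *buf;        /* start of the mmapped channel buffer */
    size_t subbuf_size;     /* size of each sub-buffer in bytes */
    size_t n_subbufs;       /* number of sub-buffers */
    const size_t *unused;   /* unused bytes at the end of each sub-buffer */
    const int *complete;    /* non-zero if the sub-buffer is complete */
};

/* Visit the valid bytes of every complete sub-buffer, skipping the
 * unused tail of each.  emit may be NULL.  Returns the total number of
 * valid bytes seen. */
static size_t walk_complete_subbufs(const struct subbuf_view *v,
                                    void (*emit)(const char *data,
                                                 size_t len, void *ctx),
                                    void *ctx)
{
    size_t total = 0;

    for (size_t i = 0; i < v->n_subbufs; i++) {
        if (!v->complete[i])
            continue;               /* still being written, skip it */
        if (v->unused[i] > v->subbuf_size)
            continue;               /* defensive: inconsistent count */

        size_t valid = v->subbuf_size - v->unused[i];
        if (emit)
            emit(v->buf + i * v->subbuf_size, valid, ctx);
        total += valid;
    }
    return total;
}
```

A client that annotates its own stream, or that reads via read(), would
not need this bookkeeping.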
+
+----------
+int relay_close(channel_id)
+
+relay_close() is used to close the channel.  It finalizes the last
+sub-buffer (the one currently being written to) and marks the channel
+as finalized.  The channel buffer and channel data structure are then
+freed automatically when the last reference to the channel is given
+up.
+
+----------
+int relay_realloc_buffer(channel_id, nbufs, async)
+
+Allocates a new channel buffer using the specified sub-buffer count
+(note that resizing can't change sub-buffer sizes).  If async is
+non-zero, the allocation is done in the background using a work queue.
+When the allocation has completed, the needs_resize() callback is
+called with a resize_type of RELAY_RESIZE_REPLACE.  This function
+doesn't replace the old buffer with the new - see
+relay_replace_buffer().
+
+This function is called by kernel clients in response to a
+needs_resize() callback call with a resize type of RELAY_RESIZE_EXPAND
+or RELAY_RESIZE_SHRINK.  That callback also includes a suggested
+new_bufsize and new_nbufs which should be used when calling this
+function.
+
+Returns 0 on success, or errcode if the channel is busy or if
+the allocation couldn't happen for some reason.
+
+NOTE: if async is not set, this function should not be called with a
+lock held, as it may sleep.
+
+----------
+int relay_replace_buffer(channel_id)
+
+Replaces the current channel buffer with the new buffer allocated by
+relay_realloc_buffer and contained in the channel struct.  When the
+replacement is complete, the needs_resize() callback is called with
+RELAY_RESIZE_REPLACED.  This function is called by kernel clients in
+response to a needs_resize() callback having a resize type of
+RELAY_RESIZE_REPLACE.
+
+Returns 0 on success, or errcode if the channel is busy or if the
+replacement or previous allocation didn't happen for some reason.
+
+NOTE: This function will not sleep, so it can be called in any context
+and with locks held.  The client should, however, ensure that the
+channel isn't actively being read from or written to.
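
The resize handshake between needs_resize(), relay_realloc_buffer() and
relay_replace_buffer() can be summarized as a small dispatch table.
The sketch below is a model only: the real needs_resize() signature and
the values of the RELAY_RESIZE_* constants are not shown here, so the
MODEL_* enumerators and model_on_needs_resize() are hypothetical
stand-ins.

```c
/* Hypothetical mirror of the resize types named in the text. */
enum model_resize_type {
    MODEL_RESIZE_EXPAND,
    MODEL_RESIZE_SHRINK,
    MODEL_RESIZE_REPLACE,
    MODEL_RESIZE_REPLACED
};

/* What the client is expected to do next. */
enum model_action {
    MODEL_ACT_NONE,     /* nothing further to do */
    MODEL_ACT_REALLOC,  /* would call relay_realloc_buffer() */
    MODEL_ACT_REPLACE   /* would call relay_replace_buffer() */
};

/* Map a resize notification to the client's next step. */
static enum model_action model_on_needs_resize(enum model_resize_type type)
{
    switch (type) {
    case MODEL_RESIZE_EXPAND:
    case MODEL_RESIZE_SHRINK:
        return MODEL_ACT_REALLOC;   /* new buffer must be allocated */
    case MODEL_RESIZE_REPLACE:
        return MODEL_ACT_REPLACE;   /* allocation done, swap buffers */
    case MODEL_RESIZE_REPLACED:
    default:
        return MODEL_ACT_NONE;      /* swap finished */
    }
}
```

In a real client the REPLACE step would additionally have to wait until
nothing is actively reading from or writing to the channel.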
+
+----------
+int relay_reset(rchan_id)
+
+relay_reset() has the effect of erasing all data from the buffer and
+restarting the channel in its initial state.  The buffer itself is not
+freed, so any mappings are still in effect.  NOTE: Care should be
+taken that the channel isn't actually being used by anything when
+this call is made.
+
+----------
+int rchan_full(reader)
+
+Returns 1 if the channel is full with respect to the reader, 0 if not.
+
+----------
+int rchan_empty(reader)
+
+Returns 1 if the channel is empty with respect to the reader, 0 if not.
+
+----------
+int relay_discard_init_buf(rchan_id)
+
+Allocates an mmappable channel buffer, copies the contents of init_buf
+into it, and sets the current channel buffer to the newly allocated
+buffer.  This function is used only in conjunction with the init_buf
+and init_buf_size params to relay_open(), and is typically used when
+the ability to write into the channel at init-time is needed.  The
+basic usage is to specify an init_buf and init_buf_size to relay_open,
+then call this function when it's safe to switch over to a normally
+allocated channel buffer.  'Safe' means that the caller is in a
+context that can sleep and that nothing is actively writing to the
+channel.  Returns 0 if successful, negative otherwise.
+
+
+Writing directly into the channel
+=================================
+
+Using the relay_write() API function as described above is the
+preferred means of writing into a channel.  In some cases, however,
+in-kernel clients might want to write directly into a relay channel
+rather than have relay_write() copy it into the buffer on the client's
+behalf.  Clients wishing to do this should follow the model used to
+implement relay_write itself.  The general sequence is:
+
+- get a pointer to the channel via rchan_get().  This increments the
+  channel's reference count.
+- call relay_lock_channel().  This will perform the proper locking for
+  the channel given the scheme in use and the SMP usage.
+- reserve a slot in the channel via relay_reserve()
+- write directly to the reserved address
+- call relay_commit() to commit the write
+- call relay_unlock_channel()
+- call rchan_put() to release the channel reference
+
+In particular, clients should make sure they call rchan_get() and
+rchan_put() and not hold on to references to the channel pointer.
+Also, forgetting to use relay_lock_channel()/relay_unlock_channel()
+has no effect if the lockless scheme is being used, but could result
+in corrupted buffer contents if the locking scheme is used.
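
The ordering of the steps above can be sketched with a user-space mock.
None of the mock_* functions below are the relayfs API, and the real
signatures of rchan_get(), relay_lock_channel(), relay_reserve() and
relay_commit() are not given in this section; the mock only
demonstrates the get / lock / reserve / write / commit / unlock / put
sequence and that the reference and lock are released on every path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for a channel. */
struct mock_chan {
    int refcount;
    int locked;
    char buf[64];
    size_t write_pos;
};

static struct mock_chan *mock_rchan_get(struct mock_chan *c)
{
    c->refcount++;
    return c;
}

static void mock_rchan_put(struct mock_chan *c) { c->refcount--; }
static void mock_lock(struct mock_chan *c)      { c->locked = 1; }
static void mock_unlock(struct mock_chan *c)    { c->locked = 0; }

/* Reserve len bytes; returns NULL if there is no room. */
static char *mock_reserve(struct mock_chan *c, size_t len)
{
    if (len > sizeof(c->buf) - c->write_pos)
        return NULL;
    return c->buf + c->write_pos;
}

static void mock_commit(struct mock_chan *c, size_t len)
{
    c->write_pos += len;
}

/* Write directly into the reserved slot.  Returns 0 on success, -1 if
 * the slot could not be reserved. */
static int direct_write(struct mock_chan *chan, const char *data, size_t len)
{
    struct mock_chan *c = mock_rchan_get(chan);   /* take a reference */
    int ret = -1;

    mock_lock(c);
    char *slot = mock_reserve(c, len);
    if (slot) {
        memcpy(slot, data, len);                  /* the direct write */
        mock_commit(c, len);
        ret = 0;
    }
    mock_unlock(c);
    mock_rchan_put(c);                            /* drop the reference */
    return ret;
}
```

Note that the channel pointer is only used between the get and the put,
matching the advice not to hold on to references.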
+
+
+Limitations
+===========
+
+Writes made via the write() system call are currently limited to 2
+pages worth of data.  There is no such limit on the in-kernel API
+function relay_write().
+
+User applications can currently only mmap the complete buffer (it
+doesn't really make sense to mmap only part of it, given its purpose).
+
+
+Latest version
+==============
+
+The latest version can be found at:
+
+http://www.opersys.com/relayfs
+
+Example relayfs clients, such as dynamic printk and the Linux Trace
+Toolkit, can also be found there.
+
+
+Credits
+=======
+
+The ideas and specs for relayfs came about as a result of discussions
+on tracing involving the following:
+
+Michel Dagenais   <michel.dagenais@polymtl.ca>
+Richard Moore     <richardj_moore@uk.ibm.com>
+Bob Wisniewski    <bob@watson.ibm.com>
+Karim Yaghmour    <karim@opersys.com>
+Tom Zanussi       <zanussi@us.ibm.com>
+
+Also thanks to Hubertus Franke for a lot of useful suggestions and bug
+reports, and for contributing the klog code.