X-Git-Url: http://git.onelab.eu/?a=blobdiff_plain;f=Documentation%2Futrace.txt;fp=Documentation%2Futrace.txt;h=73858ed6eb0e1fcb9a70f48584439f50f05db76b;hb=f05f9504c50ed069377d37f02f22e7a16b5921de;hp=0000000000000000000000000000000000000000;hpb=16c70f8c1b54b61c3b951b6fb220df250fe09b32;p=linux-2.6.git
diff --git a/Documentation/utrace.txt b/Documentation/utrace.txt
new file mode 100644
index 000000000..73858ed6e
--- /dev/null
+++ b/Documentation/utrace.txt
@@ -0,0 +1,455 @@
+DRAFT DRAFT DRAFT        WORK IN PROGRESS        DRAFT DRAFT DRAFT
+
+This is work in progress and likely to change.
+
+
+        Roland McGrath
+
+---
+
+                User Debugging Data & Event Rendezvous
+                ---- --------- ---- - ----- ----------
+
+See linux/utrace.h for all the declarations used here.
+See also linux/tracehook.h for the utrace_regset declarations.
+
+The UTRACE is infrastructure code for tracing and controlling user
+threads. This is the foundation for writing tracing engines, which
+can be loadable kernel modules. The UTRACE interfaces provide three
+basic facilities:
+
+* Thread event reporting
+
+        Tracing engines can request callbacks for events of interest in
+        the thread: signals, system calls, exit, exec, clone, etc.
+
+* Core thread control
+
+        Tracing engines can prevent a thread from running (keeping it in
+        TASK_TRACED state), or make it single-step or block-step (when
+        hardware supports it). Engines can cause a thread to abort system
+        calls, they can change the behavior of signals, and they can inject
+        signal-style actions at will.
+
+* Thread machine state access
+
+        Tracing engines can read and write a thread's registers and
+        similar per-thread CPU state.
+
+
+        Tracing engines
+        ------- -------
+
+The basic actors in UTRACE are the thread and the tracing engine.
+A tracing engine is some body of code that calls into the utrace_*
+interfaces, represented by a struct utrace_engine_ops.
+(Usually it's a
+kernel module, though the legacy ptrace support is a tracing engine
+that is not in a kernel module.) The UTRACE interface operates on
+individual threads (struct task_struct). If an engine wants to
+treat several threads as a group, that is up to its higher-level
+code. Using the UTRACE starts out by attaching an engine to a thread.
+
+        struct utrace_attached_engine *
+        utrace_attach(struct task_struct *target, int flags,
+                      const struct utrace_engine_ops *ops, unsigned long data);
+
+Calling utrace_attach is what sets up a tracing engine to trace a
+thread. Use UTRACE_ATTACH_CREATE in flags, and pass your engine's ops.
+Check the return value with IS_ERR. If successful, it returns a
+struct pointer that is the handle used in all other utrace_* calls.
+The data argument is stored in the utrace_attached_engine structure,
+for your code to use however it wants.
+
+        void utrace_detach(struct task_struct *target,
+                           struct utrace_attached_engine *engine);
+
+The utrace_detach call removes an engine from a thread.
+No more callbacks will be made after this returns.
+
+
+An attached engine does nothing by default.
+An engine makes something happen by setting its flags.
+
+        void utrace_set_flags(struct task_struct *target,
+                              struct utrace_attached_engine *engine,
+                              unsigned long flags);
+
+
+        Action Flags
+        ------ -----
+
+There are two kinds of flags that an attached engine can set: event
+flags, and action flags. Event flags register interest in particular
+events; when an event happens and an engine has the right event flag
+set, it gets a callback. Action flags change the normal behavior of
+the thread. The action flags available are:
+
+        UTRACE_ACTION_QUIESCE
+
+                The thread will stay quiescent (see below).
+                As long as any engine asserts the QUIESCE action flag,
+                the thread will not resume running in user mode.
+                (Usually it will be in TASK_TRACED state.)
+                Nothing will wake the thread up except for SIGKILL
+                (and implicit SIGKILLs such as a core dump in
+                another thread sharing the same address space, or a
+                group exit or fatal signal in another thread in the
+                same thread group).
+
+        UTRACE_ACTION_SINGLESTEP
+
+                When the thread runs, it will run one instruction
+                and then trap. (Exiting a system call or entering a
+                signal handler is considered "an instruction" for this.)
+                This can be used only if ARCH_HAS_SINGLE_STEP is #define'd
+                and evaluates to nonzero.
+
+        UTRACE_ACTION_BLOCKSTEP
+
+                When the thread runs, it will run until the next branch,
+                and then trap. (Exiting a system call or entering a
+                signal handler is considered a branch for this.)
+                When the SINGLESTEP flag is set, BLOCKSTEP has no effect.
+                This is only available on some machines (actually none yet).
+                This can be used only if ARCH_HAS_BLOCK_STEP is #define'd
+                and evaluates to nonzero.
+
+        UTRACE_ACTION_NOREAP
+
+                When the thread exits or stops for job control, its
+                parent process will not receive a SIGCHLD and the
+                parent's wait calls will not wake up or report the
+                child as dead. A well-behaved tracing engine does not
+                want to interfere with the parent's normal notifications.
+                This is provided mainly for the ptrace compatibility
+                code to implement the traditional behavior.
+
+Event flags are specified using the macro UTRACE_EVENT(TYPE).
+Each event type is associated with a report_* callback in struct
+utrace_engine_ops. A tracing engine can leave unused callbacks NULL.
+The only callbacks required are those used by the event flags it sets.
+
+Many engines can be attached to each thread. When a thread has an
+event, each engine gets a report_* callback if it has set the event flag
+for that event type. Engines are called in the order they attached.
+
+Each callback takes arguments giving the details of the particular
+event.
+The first two arguments to every callback are the struct
+utrace_attached_engine and struct task_struct pointers for the engine
+and the thread producing the event. Usually this will be the current
+thread that is running the callback functions.
+
+The return value of report_* callbacks is a bitmask. Some bits are
+common to all callbacks, and some are particular to that callback and
+event type. The value zero (UTRACE_ACTION_RESUME) always means the
+simplest thing: do what would have happened with no tracing engine here.
+These are the flags that can be set in any report_* return value:
+
+        UTRACE_ACTION_NEWSTATE
+
+                Update the action state flags, described above. Those
+                bits from the return value (UTRACE_ACTION_STATE_MASK)
+                replace those bits in the engine's flags. This has the
+                same effect as calling utrace_set_flags, but is a more
+                efficient short-cut. To change the event flags, you must
+                call utrace_set_flags.
+
+        UTRACE_ACTION_DETACH
+
+                Detach this engine. This has the effect of calling
+                utrace_detach, but is a more efficient short-cut.
+
+        UTRACE_ACTION_HIDE
+
+                Hide this event from other tracing engines. This is
+                only appropriate to do when the event was induced by
+                some action of this engine, such as a breakpoint trap.
+                Some events cannot be hidden, since every engine has to
+                know about them: exit, death, reap.
+
+The return value bits in UTRACE_ACTION_OP_MASK indicate a change to the
+normal behavior of the event taking place. If zero, the thread does
+whatever that event normally means. For report_signal, other values
+control the disposition of the signal.
+
+
+        Quiescence
+        ----------
+
+To control another thread and access its state, it must be "quiescent".
+This means that it is stopped and won't start running again while we access
+it.
+A quiescent thread is stopped in a place close to user mode, where the
+user state can be accessed safely; either it's about to return to user
+mode, or it's just entered the kernel from user mode, or it has already
+finished exiting (TASK_ZOMBIE). Setting the UTRACE_ACTION_QUIESCE action
+flag will force the attached thread to become quiescent soon. After
+setting the flag, an engine must wait for an event callback when the thread
+becomes quiescent. The thread may be running on another CPU, or may be in
+an uninterruptible wait. When it is ready to be examined, it will make
+callbacks to engines that set the UTRACE_EVENT(QUIESCE) event flag.
+
+As long as some engine has UTRACE_ACTION_QUIESCE set, then the thread will
+remain stopped. SIGKILL will wake it up, but it will not run user code.
+When the flag is cleared via utrace_set_flags or a callback return value,
+the thread starts running again.
+
+During the event callbacks (report_*), the thread in question makes the
+callback from a safe place. It is not quiescent, but it can safely access
+its own state. Callbacks can access thread state directly without setting
+the QUIESCE action flag. If a callback does want to prevent the thread
+from resuming normal execution, it *must* use the QUIESCE action state
+rather than simply blocking; see "Core Events & Callbacks", below.
+
+
+        Thread control
+        ------ -------
+
+These calls must be made on a quiescent thread (or the current thread):
+
+        int utrace_inject_signal(struct task_struct *target,
+                                 struct utrace_attached_engine *engine,
+                                 u32 action, siginfo_t *info,
+                                 const struct k_sigaction *ka);
+
+Cause a specified signal delivery in the target thread. This is not
+like kill, which generates a signal to be dequeued and delivered later.
+Injection directs the thread to deliver a signal now, before it next
+resumes in user mode or dequeues any other pending signal.
+It's as if
+the tracing engine intercepted a signal event and its report_signal
+callback returned the action argument as its value (see below). The
+info and ka arguments serve the same purposes as their counterparts in
+a report_signal callback.
+
+        const struct utrace_regset *
+        utrace_regset(struct task_struct *target,
+                      struct utrace_attached_engine *engine,
+                      const struct utrace_regset_view *view,
+                      int which);
+
+Get access to machine state for the thread. The struct utrace_regset_view
+indicates a view of machine state, corresponding to a user mode
+architecture personality (such as 32-bit or 64-bit versions of a machine).
+The which argument selects one of the register sets available in that view.
+The utrace_regset call must be made before accessing any machine state,
+each time the thread has been running and has then become quiescent.
+It ensures that the thread's state is ready to be accessed, and returns
+the struct utrace_regset giving its accessor functions.
+
+XXX needs front ends for argument checks, export utrace_native_view
+
+
+        Core Events & Callbacks
+        ---- ------ - ---------
+
+Event reporting callbacks have details particular to the event type, but
+are all called in similar environments and have the same constraints.
+Callbacks are made from safe spots, where no locks are held, no special
+resources are pinned, and the user-mode state of the thread is accessible.
+So, callback code has a pretty free hand. But to be a good citizen,
+callback code should never block for long periods. It is fine to block in
+kmalloc and the like, but never wait for i/o or for user mode to do
+something. If you need the thread to wait, set UTRACE_ACTION_QUIESCE and
+return from the callback quickly. When your i/o finishes or whatever, you
+can use utrace_set_flags to resume the thread.
+
+Well-behaved callbacks are important to maintain two essential properties
+of the interface.
+The first of these is that unrelated tracing engines do not
+interfere with each other. If your engine's event callback does not return
+quickly, then another engine won't get the event notification in a timely
+manner. The second important property is that tracing be as noninvasive as
+possible to the normal operation of the system overall and of the traced
+thread in particular. That is, attached tracing engines should not perturb
+a thread's behavior, except to the extent that changing its user-visible
+state is explicitly what you want to do. (Obviously some perturbation is
+unavoidable, primarily timing changes, ranging from small delays due to the
+overhead of tracing, to arbitrary pauses in user code execution when a user
+stops a thread with a debugger for examination. When doing asynchronous
+utrace_attach to a thread doing a system call, more troublesome side
+effects are possible.) Even when you explicitly want the perturbation of
+making the traced thread block, just blocking directly in your callback has
+more unwanted effects. For example, the CLONE event callbacks are called
+when the new child thread has been created but not yet started running; the
+child can never be scheduled until the CLONE tracing callbacks return.
+(This allows engines tracing the parent to attach to the child.) If a
+CLONE event callback blocks the parent thread, it also prevents the child
+thread from running (even to process a SIGKILL). If what you want is to
+make both the parent and child block, then use utrace_attach on the child
+and then set the QUIESCE action state flag on both threads. A more crucial
+problem with blocking in callbacks is that it can prevent SIGKILL from
+working. A thread that is blocking due to UTRACE_ACTION_QUIESCE will still
+wake up and die immediately when sent a SIGKILL, as all threads should.
+Relying on the utrace infrastructure rather than on private synchronization
+calls in event callbacks is an important way to help keep tracing robustly
+noninvasive.
+
+
+UTRACE_EVENT(REAP)              Dead thread has been reaped
+Callback:
+        void (*report_reap)(struct utrace_attached_engine *engine,
+                            struct task_struct *tsk);
+
+This means the parent called wait, or else this was a detached thread or
+a process whose parent ignores SIGCHLD. This cannot happen while the
+UTRACE_ACTION_NOREAP flag is set. This is the only callback you are
+guaranteed to get (if you set the flag).
+
+Unlike other callbacks, this can be called from the parent's context
+rather than from the traced thread itself, so it must not delay the parent
+by blocking. This callback is different from all others: it returns void.
+Once you get this callback, your engine is automatically detached and you
+cannot access this thread or use this struct utrace_attached_engine handle
+any longer. This is the place to clean up your data structures and
+synchronize with your code that might try to make utrace_* calls using this
+engine data structure. The struct is still valid during this callback,
+but will be freed soon after it returns (via RCU).
+
+In all other callbacks, the return value is as described above.
+The common UTRACE_ACTION_* flags in the return value are always observed.
+Unless otherwise specified below, other bits in the return value are ignored.
+
+
+UTRACE_EVENT(QUIESCE)           Thread is quiescent
+Callback:
+        u32 (*report_quiesce)(struct utrace_attached_engine *engine,
+                              struct task_struct *tsk);
+
+This is the least interesting callback. It happens at any safe spot,
+including after any other event callback. This lets the tracing engine
+know that it is safe to access the thread's state, or to report to users
+that it has stopped running user code.
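
The rule that a callback should return quickly with the QUIESCE action
state set, rather than block, can be pictured with a small userspace model.
This is only a sketch: the MODEL_* constants and model_* functions are
hypothetical stand-ins with made-up bit values, not the definitions from
linux/utrace.h, and the model covers only the NEWSTATE and QUIESCE behavior
described earlier in this text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL bit values for a userspace model only.  The real
 * definitions are in linux/utrace.h and may differ.
 */
#define MODEL_ACTION_RESUME      0x0000u  /* zero: no change requested */
#define MODEL_ACTION_QUIESCE     0x0001u  /* action state flag */
#define MODEL_ACTION_SINGLESTEP  0x0002u  /* action state flag */
#define MODEL_ACTION_STATE_MASK  0x00ffu  /* all action state bits */
#define MODEL_ACTION_NEWSTATE    0x0100u  /* "replace my action state" */
#define MODEL_EVENT_QUIESCE      0x1000u  /* stands in for an event flag */

/*
 * Model of a report_* callback that needs the thread to wait for
 * some slow operation.  It does not block; it asks for the QUIESCE
 * action state and returns at once.
 */
uint32_t model_report_wait(void)
{
	return MODEL_ACTION_NEWSTATE | MODEL_ACTION_QUIESCE;
}

/*
 * Apply a callback return value to an engine's flags.  With NEWSTATE
 * set, the action state bits of the return value replace those in
 * the flags; event flag bits are left alone.
 */
uint32_t model_apply_return(uint32_t flags, uint32_t ret)
{
	if (ret & MODEL_ACTION_NEWSTATE)
		flags = (flags & ~MODEL_ACTION_STATE_MASK) |
			(ret & MODEL_ACTION_STATE_MASK);
	return flags;
}

/* The thread may resume only if no attached engine asserts QUIESCE. */
int model_may_resume(const uint32_t *engine_flags, size_t n)
{
	size_t i;

	assert(engine_flags != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (engine_flags[i] & MODEL_ACTION_QUIESCE)
			return 0;
	return 1;
}
```

In this model, an engine whose flags are MODEL_EVENT_QUIESCE |
MODEL_ACTION_SINGLESTEP ends up with MODEL_EVENT_QUIESCE |
MODEL_ACTION_QUIESCE after model_apply_return(flags, model_report_wait()),
and model_may_resume returns 0 until a later update clears QUIESCE.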
+
+UTRACE_EVENT(CLONE)             Thread is creating a child
+Callback:
+        u32 (*report_clone)(struct utrace_attached_engine *engine,
+                            struct task_struct *parent,
+                            unsigned long clone_flags,
+                            struct task_struct *child);
+
+A clone/clone2/fork/vfork system call has succeeded in creating a new
+thread or child process. The new process is fully formed, but not yet
+running. During this callback, other tracing engines are prevented from
+using utrace_attach asynchronously on the child, so that engines tracing
+the parent get the first opportunity to attach. After this callback
+returns, the child will start and the parent's system call will return.
+If CLONE_VFORK is set, the parent will block before returning.
+
+UTRACE_EVENT(VFORK_DONE)        Finished waiting for CLONE_VFORK child
+Callback:
+        u32 (*report_vfork_done)(struct utrace_attached_engine *engine,
+                                 struct task_struct *parent, pid_t child_pid);
+
+Event reported for parent using CLONE_VFORK or vfork system call.
+The child has died or exec'd, so the vfork parent has unblocked
+and is about to return child_pid.
+
+UTRACE_EVENT(EXEC)              Completed exec
+Callback:
+        u32 (*report_exec)(struct utrace_attached_engine *engine,
+                           struct task_struct *tsk,
+                           const struct linux_binprm *bprm,
+                           struct pt_regs *regs);
+
+An execve system call has succeeded and the new program is about to
+start running. The initial user register state is available in regs and
+can be tweaked directly, or utrace_regset can be used for full machine
+state access.
+
+UTRACE_EVENT(EXIT)              Thread is exiting
+Callback:
+        u32 (*report_exit)(struct utrace_attached_engine *engine,
+                           struct task_struct *tsk,
+                           long orig_code, long *code);
+
+The thread is exiting and cannot be prevented from doing so, but all its
+state is still live. The *code value will be the wait result seen by
+the parent, and can be changed by this engine or others. The orig_code
+value is the real status, not changed by any tracing engine.
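
The relationship between orig_code and *code can be illustrated with a
userspace model of several engines seeing the same exit. This is a sketch
under simplifying assumptions: model_report_exit_fn drops the engine and
task arguments and the u32 return value of the real report_exit, and all
model_* names are hypothetical. It relies only on what this text states:
engines are called in the order they attached, orig_code is never changed,
and *code carries changes forward.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for report_exit.  The real callback also takes
 * the engine and task pointers and returns a u32 bitmask.
 */
typedef void (*model_report_exit_fn)(long orig_code, long *code);

/* Engine that hides the real status: always reports zero. */
void model_exit_force_zero(long orig_code, long *code)
{
	(void)orig_code;
	*code = 0;
}

/* Engine that insists on the real status, whatever others did. */
void model_exit_restore(long orig_code, long *code)
{
	*code = orig_code;
}

/*
 * Call the engines in attach order.  Every engine sees the same
 * orig_code, while *code carries forward any earlier engine's change.
 * Returns the wait result the parent would see in this model.
 */
long model_run_report_exit(const model_report_exit_fn *engines, size_t n,
			   long orig_code)
{
	long code = orig_code;
	size_t i;

	assert(engines != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (engines[i] != NULL)	/* engine left this callback NULL */
			engines[i](orig_code, &code);
	return code;
}
```

With the engines attached as { model_exit_force_zero, model_exit_restore },
the model returns the original status; with the order reversed it returns
zero, since the last engine to write *code wins.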
+
+UTRACE_EVENT(DEATH)             Thread has finished exiting
+Callback:
+        u32 (*report_death)(struct utrace_attached_engine *engine,
+                            struct task_struct *tsk);
+
+The thread is really dead now. If the UTRACE_ACTION_NOREAP flag is set
+after this callback, it remains an unreported zombie. Otherwise, it might
+be reaped by its parent, or self-reap immediately. Though the actual
+reaping may happen in parallel, a report_reap callback will always be
+ordered after a report_death callback.
+
+UTRACE_EVENT(SYSCALL_ENTRY)     Thread has entered kernel for a system call
+Callback:
+        u32 (*report_syscall_entry)(struct utrace_attached_engine *engine,
+                                    struct task_struct *tsk,
+                                    struct pt_regs *regs);
+
+The system call number and arguments can be seen and modified in the
+registers. The return value register has -ENOSYS, which will be
+returned for an invalid system call. The macro tracehook_abort_syscall(regs)
+will abort the system call so that we go immediately to syscall exit,
+and return -ENOSYS (or whatever the register state is changed to). If
+tracing engines keep the thread quiescent here, the system call will
+not be performed until it resumes.
+
+UTRACE_EVENT(SYSCALL_EXIT)      Thread is leaving kernel after a system call
+Callback:
+        u32 (*report_syscall_exit)(struct utrace_attached_engine *engine,
+                                   struct task_struct *tsk,
+                                   struct pt_regs *regs);
+
+The return value can be seen and modified in the registers. If the
+thread is allowed to resume, it will see any pending signals and then
+return to user mode.
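
The interaction between the preset -ENOSYS and an aborted system call at
SYSCALL_ENTRY can be sketched in a userspace model. Everything named
model_* is hypothetical: struct model_sys_regs is not struct pt_regs, the
aborted field merely stands in for the effect of tracehook_abort_syscall,
and the one-entry system call table is invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* HYPOTHETICAL register model; the real callbacks get struct pt_regs. */
struct model_sys_regs {
	long nr;	/* system call number */
	long ret;	/* return value register */
	int aborted;	/* stands in for tracehook_abort_syscall's effect */
};

/* Simplified entry callback: sees and may modify the registers. */
typedef void (*model_syscall_entry_fn)(struct model_sys_regs *regs);

/* Toy system call table: only call number 0 exists; it returns 100. */
long model_dispatch(long nr)
{
	return nr == 0 ? 100 : -ENOSYS;
}

/* Engine that aborts the call and leaves the preset -ENOSYS in place. */
void model_entry_abort(struct model_sys_regs *regs)
{
	assert(regs != NULL);
	regs->aborted = 1;
}

/* Engine that aborts the call and chooses its own return value. */
void model_entry_abort_eperm(struct model_sys_regs *regs)
{
	assert(regs != NULL);
	regs->aborted = 1;
	regs->ret = -EPERM;
}

/* One system call as seen through the entry callback. */
long model_syscall(long nr, model_syscall_entry_fn report_entry)
{
	/* The return value register starts out holding -ENOSYS. */
	struct model_sys_regs regs = { nr, -ENOSYS, 0 };

	if (report_entry != NULL)
		report_entry(&regs);
	if (!regs.aborted)		/* an aborted call is never dispatched */
		regs.ret = model_dispatch(regs.nr);
	return regs.ret;
}
```

In the model, an untraced call to number 0 returns 100, the same call with
model_entry_abort returns -ENOSYS, and with model_entry_abort_eperm it
returns -EPERM, matching the "or whatever the register state is changed to"
wording.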
+
+UTRACE_EVENT(SIGNAL)            Signal caught by user handler
+UTRACE_EVENT(SIGNAL_IGN)        Signal with no effect (SIG_IGN or default)
+UTRACE_EVENT(SIGNAL_STOP)       Job control stop signal
+UTRACE_EVENT(SIGNAL_TERM)       Fatal termination signal
+UTRACE_EVENT(SIGNAL_CORE)       Fatal core-dump signal
+UTRACE_EVENT_SIGNAL_ALL         All of the above (bitmask)
+Callback:
+        u32 (*report_signal)(struct utrace_attached_engine *engine,
+                             struct task_struct *tsk,
+                             u32 action, siginfo_t *info,
+                             const struct k_sigaction *orig_ka,
+                             struct k_sigaction *return_ka);
+
+There are five types of signal events, but all use the same callback.
+These happen when a thread is dequeuing a signal to be delivered.
+(Not immediately when the signal is sent, and not when the signal is
+blocked.) No signal event is reported for SIGKILL; no tracing engine
+can prevent it from killing the thread immediately. The specific
+event types allow an engine to trace signals based on what they do.
+UTRACE_EVENT_SIGNAL_ALL is all of them OR'd together, to trace all
+signals (except SIGKILL). A subset of these event flags can be used
+e.g. to catch only fatal signals, not handled ones, or to catch only
+core-dump signals, not normal termination signals.
+
+The action argument says what the signal's default disposition is:
+
+        UTRACE_SIGNAL_DELIVER   Run the user handler from sigaction.
+        UTRACE_SIGNAL_IGN       Do nothing, ignore the signal.
+        UTRACE_SIGNAL_TERM      Terminate the process.
+        UTRACE_SIGNAL_CORE      Terminate the process and write a core dump.
+        UTRACE_SIGNAL_STOP      Absolutely stop the process, a la SIGSTOP.
+        UTRACE_SIGNAL_TSTP      Job control stop (no stop if orphaned).
+
+This selection is made from consulting the process's sigaction and the
+default action for the signal number, but may already have been
+changed by an earlier tracing engine (in which case you see its override).
+A return value of UTRACE_ACTION_RESUME means to carry out this action.
+If instead UTRACE_SIGNAL_* bits are in the return value, that overrides
+the normal behavior of the signal.
+
+The signal number and other details of the signal are in info, and
+this data can be changed to make the thread see a different signal.
+A return value of UTRACE_SIGNAL_DELIVER says to follow the sigaction in
+return_ka, which can specify a user handler or SIG_IGN to ignore the
+signal or SIG_DFL to follow the default action for info->si_signo.
+The orig_ka parameter shows the process's sigaction at the time the
+signal was dequeued, and return_ka initially contains this. Tracing
+engines can modify return_ka to change the effects of delivery.
+For other UTRACE_SIGNAL_* return values, return_ka is ignored.
+
+UTRACE_SIGNAL_HOLD is a flag bit that can be OR'd into the return
+value. It says to push the signal back on the thread's queue, with
+the signal number and details possibly changed in info. When the
+thread is allowed to resume, it will dequeue and report it again.
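
How a signal's action passes through several engines can be sketched with
a userspace model. The MODEL_SIG_* values and model_* functions are
hypothetical and do not come from linux/utrace.h; the callback is reduced
to taking the current action and returning a bitmask. How the real code
combines a HOLD request from several engines is not described in this
text, so the model simply records whether any engine asked for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL values for this model only. */
#define MODEL_SIG_RESUME   0x00u  /* zero: carry out the current action */
#define MODEL_SIG_DELIVER  0x01u
#define MODEL_SIG_IGN      0x02u
#define MODEL_SIG_TERM     0x03u
#define MODEL_SIG_CORE     0x04u
#define MODEL_SIG_OP_MASK  0x0fu  /* bits holding a disposition */
#define MODEL_SIG_HOLD     0x10u  /* push the signal back on the queue */

/* Simplified report_signal: current action in, bitmask out. */
typedef uint32_t (*model_report_signal_fn)(uint32_t action);

/* Engine that only watches. */
uint32_t model_sig_observe(uint32_t action)
{
	(void)action;
	return MODEL_SIG_RESUME;
}

/* Engine that turns a core dump into a plain termination. */
uint32_t model_sig_no_core(uint32_t action)
{
	return action == MODEL_SIG_CORE ? MODEL_SIG_TERM : MODEL_SIG_RESUME;
}

/* Engine that asks for fatal signals to be held for later. */
uint32_t model_sig_hold_fatal(uint32_t action)
{
	if (action == MODEL_SIG_TERM || action == MODEL_SIG_CORE)
		return MODEL_SIG_HOLD;
	return MODEL_SIG_RESUME;
}

/*
 * Run the engines in attach order.  A nonzero disposition in a return
 * value overrides the action, and later engines see that override.
 * Returns the final action; *held is set if any engine asked for HOLD.
 */
uint32_t model_report_signal(const model_report_signal_fn *engines, size_t n,
			     uint32_t action, int *held)
{
	size_t i;

	assert(held != NULL);
	*held = 0;
	for (i = 0; i < n; i++) {
		uint32_t ret = engines[i](action);

		if (ret & MODEL_SIG_OP_MASK)
			action = ret & MODEL_SIG_OP_MASK;
		if (ret & MODEL_SIG_HOLD)
			*held = 1;
	}
	return action;
}
```

With { model_sig_no_core, model_sig_hold_fatal } attached in that order, a
signal whose default action is MODEL_SIG_CORE comes out as MODEL_SIG_TERM
with *held set: the second engine sees the first engine's override, as the
text describes for an earlier tracing engine.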