diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 9632444..bf4b9e8 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
 DOCBOOKS := z8530book.xml mcabook.xml device-drivers.xml \
 	    kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
 	    procfs-guide.xml writing_usb_driver.xml networking.xml \
-	    kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml \
+	    kernel-api.xml filesystems.xml lsm.xml usb.xml kgdb.xml utrace.xml \
 	    gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
 	    genericirq.xml s390-drivers.xml uio-howto.xml scsi.xml \
 	    mac80211.xml debugobjects.xml sh.xml regulator.xml \
diff --git a/Documentation/DocBook/utrace.tmpl b/Documentation/DocBook/utrace.tmpl
index 0000000..6cc58a1
+++ b/Documentation/DocBook/utrace.tmpl
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
+ <title>The utrace User Debugging Infrastructure</title>
+ <chapter id="concepts"><title>utrace concepts</title>
+ <sect1 id="intro"><title>Introduction</title>
+ <application>utrace</application> is infrastructure code for tracing
+ and controlling user threads. This is the foundation for writing
+ tracing engines, which can be loadable kernel modules.
+ The basic actors in <application>utrace</application> are the thread
+ and the tracing engine. A tracing engine is some body of code that
+ calls into the <filename>&lt;linux/utrace.h&gt;</filename>
+ interfaces, represented by a <structname>struct
+ utrace_engine_ops</structname>. (Usually it's a kernel module,
+ though the legacy <function>ptrace</function> support is a tracing
+ engine that is not in a kernel module.) The interface operates on
+ individual threads (<structname>struct task_struct</structname>).
+ If an engine wants to treat several threads as a group, that is up
+ to its higher-level code.
+ Tracing begins by attaching an engine to a thread, using
+ <function>utrace_attach_task</function> or
+ <function>utrace_attach_pid</function>. If successful, it returns a
+ pointer that is the handle used in all other calls.
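As a sketch of the attach step (the call signature is taken from this patch's `<linux/utrace.h>`; the `my_ops` structure and `my_attach` helper are illustrative names, not part of the interface):

```c
/* Illustrative sketch: attach a new engine to a task.  The signature of
 * utrace_attach_task() follows this patch's <linux/utrace.h>; my_ops and
 * my_attach are hypothetical names for this example. */
#include <linux/utrace.h>
#include <linux/err.h>

static const struct utrace_engine_ops my_ops;	/* callbacks filled in elsewhere */

static struct utrace_engine *my_attach(struct task_struct *task)
{
	struct utrace_engine *engine;

	/* UTRACE_ATTACH_CREATE asks for a new engine rather than a lookup.
	 * On success, the returned pointer is the handle passed to every
	 * other utrace call on this engine. */
	engine = utrace_attach_task(task, UTRACE_ATTACH_CREATE, &my_ops, NULL);
	if (IS_ERR(engine))
		return engine;	/* e.g. the task is already dying */
	return engine;
}
```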
+ <sect1 id="callbacks"><title>Events and Callbacks</title>
+ An attached engine does nothing by default. An engine makes something
+ happen by requesting callbacks via <function>utrace_set_events</function>
+ and poking the thread with <function>utrace_control</function>.
+ The synchronization issues related to these two calls
+ are discussed further below in <xref linkend="teardown"/>.
+ Events are specified using the macro
+ <constant>UTRACE_EVENT(<replaceable>type</replaceable>)</constant>.
+ Each event type is associated with a callback in <structname>struct
+ utrace_engine_ops</structname>. A tracing engine can leave unused
+ callbacks <constant>NULL</constant>. The only callbacks required
+ are those used by the event flags it sets.
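For instance, an engine interested only in thread exit could fill in one callback and set one matching event flag. This sketch assumes the callback signature of this patch's `utrace.h` (including the `UTRACE_RESUME` return value); `my_report_exit` and `my_enable` are hypothetical:

```c
/* Sketch: request only the events whose callbacks are implemented.
 * The report_exit signature and UTRACE_RESUME are assumptions taken
 * from this patch era's <linux/utrace.h>. */
#include <linux/utrace.h>

static u32 my_report_exit(enum utrace_resume_action action,
			  struct utrace_engine *engine,
			  struct task_struct *task,
			  long orig_code, long *code)
{
	/* ... record the exit status ... */
	return UTRACE_RESUME;	/* let the thread continue dying normally */
}

static const struct utrace_engine_ops my_ops = {
	.report_exit = my_report_exit,
	/* every other callback left NULL: no other event flag is set */
};

static int my_enable(struct task_struct *task, struct utrace_engine *engine)
{
	/* Set exactly the flag that matches the one callback above. */
	return utrace_set_events(task, engine, UTRACE_EVENT(EXIT));
}
```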
+ Many engines can be attached to each thread. When a thread has an
+ event, each engine gets a callback if it has set the event flag for
+ that event type. For most events, engines are called in the order they
+ attached. Engines that attach after the event has occurred do not get
+ callbacks for that event. This includes any new engines just attached
+ by an existing engine's callback function. Once the sequence of
+ callbacks for that one event has completed, such new engines are then
+ eligible in the next sequence that starts when there is another event.
+ Event reporting callbacks have details particular to the event type,
+ but are all called in similar environments and have the same
+ constraints. Callbacks are made from safe points, where no locks
+ are held, no special resources are pinned (usually), and the
+ user-mode state of the thread is accessible. So, callback code has
+ a pretty free hand. But to be a good citizen, callback code should
+ never block for long periods. It is fine to block in
+ <function>kmalloc</function> and the like, but never wait for i/o or
+ for user mode to do something. If you need the thread to wait, use
+ <constant>UTRACE_STOP</constant> and return from the callback
+ quickly. When your i/o finishes or whatever, you can use
+ <function>utrace_control</function> to resume the thread.
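The stop-then-resume pattern described above might look like this sketch (callback signature and `UTRACE_RESUME` assumed from this patch's `utrace.h`; `queue_my_async_io` and `my_io_done` are hypothetical helpers):

```c
/* Sketch: rather than blocking in the callback while i/o completes,
 * return UTRACE_STOP and resume the thread later. */
#include <linux/utrace.h>

static u32 my_report_quiesce(enum utrace_resume_action action,
			     struct utrace_engine *engine,
			     struct task_struct *task,
			     unsigned long event)
{
	queue_my_async_io(engine);	/* hypothetical: kick off the i/o */
	return UTRACE_STOP;		/* keep the thread quiescent for now */
}

/* Called from our i/o completion path, in ordinary process context. */
static void my_io_done(struct task_struct *task, struct utrace_engine *engine)
{
	utrace_control(task, engine, UTRACE_RESUME);	/* let it run again */
}
```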
+ The <constant>UTRACE_EVENT(SYSCALL_ENTRY)</constant> event is a special
+ case. While other events happen in the kernel when it will return to
+ user mode soon, this event happens when entering the kernel before it
+ will proceed with the work requested from user mode. Because of this
+ difference, the <function>report_syscall_entry</function> callback is
+ special in two ways. For this event, engines are called in reverse of
+ the normal order (this includes the <function>report_quiesce</function>
+ call that precedes a <function>report_syscall_entry</function> call).
+ This preserves the semantics that the last engine to attach is called
+ "closest to user mode"--the engine that is first to see a thread's user
+ state when it enters the kernel is also the last to see that state when
+ the thread returns to user mode. For the same reason, if these
+ callbacks use <constant>UTRACE_STOP</constant> (see the next section),
+ the thread stops immediately after callbacks rather than only when it's
+ ready to return to user mode; when allowed to resume, it will actually
+ attempt the system call indicated by the register values at that time.
+ <sect1 id="safely"><title>Stopping Safely</title>
+ <sect2 id="well-behaved"><title>Writing well-behaved callbacks</title>
+ Well-behaved callbacks are important to maintain two essential
+ properties of the interface. The first of these is that unrelated
+ tracing engines should not interfere with each other. If your engine's
+ event callback does not return quickly, then another engine won't get
+ the event notification in a timely manner. The second important
+ property is that tracing should be as noninvasive as possible to the
+ normal operation of the system overall and of the traced thread in
+ particular. That is, attached tracing engines should not perturb a
+ thread's behavior, except to the extent that changing its user-visible
+ state is explicitly what you want to do. (Obviously some perturbation
+ is unavoidable, primarily timing changes, ranging from small delays due
+ to the overhead of tracing, to arbitrary pauses in user code execution
+ when a user stops a thread with a debugger for examination.) Even when
+ you explicitly want the perturbation of making the traced thread block,
+ just blocking directly in your callback has more unwanted effects. For
+ example, the <constant>CLONE</constant> event callbacks are called when
+ the new child thread has been created but not yet started running; the
+ child can never be scheduled until the <constant>CLONE</constant>
+ tracing callbacks return. (This allows engines tracing the parent to
+ attach to the child.) If a <constant>CLONE</constant> event callback
+ blocks the parent thread, it also prevents the child thread from
+ running (even to process a <constant>SIGKILL</constant>). If what you
+ want is to make both the parent and child block, then use
+ <function>utrace_attach_task</function> on the child and then use
+ <constant>UTRACE_STOP</constant> on both threads. A more crucial
+ problem with blocking in callbacks is that it can prevent
+ <constant>SIGKILL</constant> from working. A thread that is blocking
+ due to <constant>UTRACE_STOP</constant> will still wake up and die
+ immediately when sent a <constant>SIGKILL</constant>, as all threads
+ should. Relying on the <application>utrace</application>
+ infrastructure rather than on private synchronization calls in event
+ callbacks is an important way to help keep tracing robustly
+ noninvasive.
+ <sect2 id="UTRACE_STOP"><title>Using <constant>UTRACE_STOP</constant></title>
+ To control another thread and access its state, that thread must be
+ stopped with <constant>UTRACE_STOP</constant>. This means that it is
+ stopped and won't start running again while we access it. When a
+ thread is not already stopped, <function>utrace_control</function>
+ returns <constant>-EINPROGRESS</constant> and an engine must wait
+ for an event callback when the thread is ready to stop. The thread
+ may be running on another CPU or may be blocked. When it is ready
+ to be examined, it will make callbacks to engines that set the
+ <constant>UTRACE_EVENT(QUIESCE)</constant> event bit. To wake up an
+ interruptible wait, use <constant>UTRACE_INTERRUPT</constant>.
+ As long as some engine has used <constant>UTRACE_STOP</constant> and
+ not called <function>utrace_control</function> to resume the thread,
+ then the thread will remain stopped. <constant>SIGKILL</constant>
+ will wake it up, but it will not run user code. When the stop is
+ cleared with <function>utrace_control</function> or a callback
+ return value, the thread starts running again.
+ (See also <xref linkend="teardown"/>.)
+ <sect1 id="teardown"><title>Tear-down Races</title>
+ <sect2 id="SIGKILL"><title>Primacy of <constant>SIGKILL</constant></title>
+ Ordinarily synchronization issues for tracing engines are kept fairly
+ straightforward by using <constant>UTRACE_STOP</constant>. You ask a
+ thread to stop, and then once it makes the
+ <function>report_quiesce</function> callback it cannot do anything else
+ that would result in another callback, until you let it with a
+ <function>utrace_control</function> call. This simple arrangement
+ avoids complex and error-prone code in each one of a tracing engine's
+ event callbacks to keep them serialized with the engine's other
+ operations done on that thread from another thread of control.
+ However, giving tracing engines complete power to keep a traced thread
+ stuck in place runs afoul of a more important kind of simplicity that
+ the kernel overall guarantees: nothing can prevent or delay
+ <constant>SIGKILL</constant> from making a thread die and release its
+ resources. To preserve this important property of
+ <constant>SIGKILL</constant>, it can, as a special case, break
+ <constant>UTRACE_STOP</constant> like nothing else normally can. This
+ includes both explicit <constant>SIGKILL</constant> signals and the
+ implicit <constant>SIGKILL</constant> sent to each other thread in the
+ same thread group by a thread doing an exec, or processing a fatal
+ signal, or making an <function>exit_group</function> system call. A
+ tracing engine can prevent a thread from beginning the exit or exec or
+ dying by signal (other than <constant>SIGKILL</constant>) if it is
+ attached to that thread, but once the operation begins, no tracing
+ engine can prevent or delay all other threads in the same thread group
+ from dying.
+ <sect2 id="reap"><title>Final callbacks</title>
+ The <function>report_reap</function> callback is always the final event
+ in the life cycle of a traced thread. Tracing engines can use this as
+ the trigger to clean up their own data structures. The
+ <function>report_death</function> callback is always the penultimate
+ event a tracing engine might see; it's seen unless the thread was
+ already in the midst of dying when the engine attached. Many tracing
+ engines will have no interest in when a parent reaps a dead process,
+ and nothing they want to do with a zombie thread once it dies; for
+ them, the <function>report_death</function> callback is the natural
+ place to clean up data structures and detach. To facilitate writing
+ such engines robustly, given the asynchrony of
+ <constant>SIGKILL</constant>, and without error-prone manual
+ implementation of synchronization schemes, the
+ <application>utrace</application> infrastructure provides some special
+ guarantees about the <function>report_death</function> and
+ <function>report_reap</function> callbacks. It still takes some care
+ to be sure your tracing engine is robust to tear-down races, but these
+ rules make it reasonably straightforward and concise to handle a lot of
+ corner cases correctly.
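An engine of that common kind might clean up and detach directly from `report_death`, as in this sketch (the callback signature and the convention of returning `UTRACE_DETACH` are assumptions based on this patch's `utrace.h`; the `data` field holding private engine state is likewise an assumption):

```c
/* Sketch: an engine with no interest in zombies cleans up its private
 * state in report_death and detaches; report_reap is then never made.
 * Signature and the engine->data field are assumed from this patch. */
#include <linux/utrace.h>
#include <linux/slab.h>

static u32 my_report_death(struct utrace_engine *engine,
			   struct task_struct *task,
			   bool group_dead, int signal)
{
	kfree(engine->data);	/* free this engine's private per-task state */
	return UTRACE_DETACH;	/* detach now: no report_reap will follow */
}
```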
+ <sect2 id="refcount"><title>Engine and task pointers</title>
+ The first sort of guarantee concerns the core data structures
+ themselves. <structname>struct utrace_engine</structname> is
+ a reference-counted data structure. While you hold a reference, an
+ engine pointer will always stay valid so that you can safely pass it to
+ any <application>utrace</application> call. Each call to
+ <function>utrace_attach_task</function> or
+ <function>utrace_attach_pid</function> returns an engine pointer with a
+ reference belonging to the caller. You own that reference until you
+ drop it using <function>utrace_engine_put</function>. There is an
+ implicit reference on the engine while it is attached. So if you drop
+ your only reference, and then use
+ <function>utrace_attach_task</function> without
+ <constant>UTRACE_ATTACH_CREATE</constant> to look up that same engine,
+ you will get the same pointer with a new reference to replace the one
+ you dropped, just like calling <function>utrace_engine_get</function>.
+ When an engine has been detached, either explicitly with
+ <constant>UTRACE_DETACH</constant> or implicitly after
+ <function>report_reap</function>, then any references you hold are all
+ that keep the old engine pointer alive.
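The lookup-and-drop pattern might be sketched as follows (the `UTRACE_ATTACH_MATCH_OPS` flag used to find the engine by its ops table is an assumption from this patch's `utrace.h`; `my_poke` is a hypothetical helper):

```c
/* Sketch: find the engine we previously attached, use it, and drop the
 * reference the lookup gave us. */
#include <linux/utrace.h>
#include <linux/err.h>

static const struct utrace_engine_ops my_ops;

static void my_poke(struct task_struct *task)
{
	struct utrace_engine *engine;

	/* Without UTRACE_ATTACH_CREATE this is a lookup, not a new attach;
	 * matching on my_ops (assumed flag) returns a fresh reference to
	 * the engine already attached with those ops. */
	engine = utrace_attach_task(task, UTRACE_ATTACH_MATCH_OPS,
				    &my_ops, NULL);
	if (IS_ERR(engine))
		return;		/* never attached, or already gone */

	utrace_control(task, engine, UTRACE_INTERRUPT);
	utrace_engine_put(engine);	/* drop the reference from the lookup */
}
```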
+ There is nothing a kernel module can do to keep a <structname>struct
+ task_struct</structname> alive outside of
+ <function>rcu_read_lock</function>. When the task dies and is reaped
+ by its parent (or itself), that structure can be freed so that any
+ dangling pointers you have stored become invalid.
+ <application>utrace</application> will not prevent this, but it can
+ help you detect it safely. By definition, a task that has been reaped
+ has had all its engines detached. All
+ <application>utrace</application> calls can be safely called on a
+ detached engine if the caller holds a reference on that engine pointer,
+ even if the task pointer passed in the call is invalid. All calls
+ return <constant>-ESRCH</constant> for a detached engine, which tells
+ you that the task pointer you passed could be invalid now. Since
+ <function>utrace_control</function> and
+ <function>utrace_set_events</function> do not block, you can call those
+ inside a <function>rcu_read_lock</function> section and, when they
+ don't return <constant>-ESRCH</constant>, be sure that the task
+ pointer is still valid until <function>rcu_read_unlock</function>. The
+ infrastructure never holds task references of its own. Though neither
+ <function>rcu_read_lock</function> nor any other lock is held while
+ making a callback, it's always guaranteed that the <structname>struct
+ task_struct</structname> and the <structname>struct
+ utrace_engine</structname> passed as arguments remain valid
+ until the callback function returns.
+ The common means for safely holding task pointers that is available to
+ kernel modules is to use <structname>struct pid</structname>, which
+ permits <function>put_pid</function> from kernel modules. When using
+ that, the calls <function>utrace_attach_pid</function>,
+ <function>utrace_control_pid</function>,
+ <function>utrace_set_events_pid</function>, and
+ <function>utrace_barrier_pid</function> are available.
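Holding a `struct pid` instead of a bare task pointer might look like this sketch (the exact signature of `utrace_control_pid` is an assumption; `get_pid`/`put_pid` are the standard `struct pid` reference calls):

```c
/* Sketch: a module safely names a task by struct pid and uses the
 * *_pid call variants listed above.  utrace_control_pid's signature
 * is assumed from this patch. */
#include <linux/utrace.h>
#include <linux/pid.h>

struct my_tracee {		/* hypothetical per-tracee bookkeeping */
	struct pid *pid;	/* holds a reference; safe to keep long-term */
	struct utrace_engine *engine;
};

static int my_stop_tracee(struct my_tracee *t)
{
	/* No bare task_struct pointer is stored anywhere; the pid
	 * reference is what we hold across sleeps. */
	return utrace_control_pid(t->pid, t->engine, UTRACE_STOP);
}

static void my_forget_tracee(struct my_tracee *t)
{
	utrace_engine_put(t->engine);
	put_pid(t->pid);	/* put_pid is usable from modules */
}
```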
+ <sect2 id="reap-after-death">
+ <title>Serialization of <constant>DEATH</constant> and <constant>REAP</constant></title>
+ The second guarantee is the serialization of
+ <constant>DEATH</constant> and <constant>REAP</constant> event
+ callbacks for a given thread. The actual reaping by the parent
+ (<function>release_task</function> call) can occur simultaneously
+ while the thread is still doing the final steps of dying, including
+ the <function>report_death</function> callback. If a tracing engine
+ has requested both <constant>DEATH</constant> and
+ <constant>REAP</constant> event reports, it's guaranteed that the
+ <function>report_reap</function> callback will not be made until
+ after the <function>report_death</function> callback has returned.
+ If the <function>report_death</function> callback itself detaches
+ from the thread, then the <function>report_reap</function> callback
+ will never be made. Thus it is safe for a
+ <function>report_death</function> callback to clean up data
+ structures and detach.
+ <sect2 id="interlock"><title>Interlock with final callbacks</title>
+ The final sort of guarantee is that a tracing engine will know for sure
+ whether or not the <function>report_death</function> and/or
+ <function>report_reap</function> callbacks will be made for a certain
+ thread. These tear-down races are disambiguated by the error return
+ values of <function>utrace_set_events</function> and
+ <function>utrace_control</function>. Normally
+ <function>utrace_control</function> called with
+ <constant>UTRACE_DETACH</constant> returns zero, and this means that no
+ more callbacks will be made. If the thread is in the midst of dying,
+ it returns <constant>-EALREADY</constant> to indicate that the
+ <function>report_death</function> callback may already be in progress;
+ when you get this error, you know that any cleanup your
+ <function>report_death</function> callback does is about to happen or
+ has just happened--note that if the <function>report_death</function>
+ callback does not detach, the engine remains attached until the thread
+ gets reaped. If the thread is in the midst of being reaped,
+ <function>utrace_control</function> returns <constant>-ESRCH</constant>
+ to indicate that the <function>report_reap</function> callback may
+ already be in progress; this means the engine is implicitly detached
+ when the callback completes. This makes it possible for a tracing
+ engine that has decided asynchronously to detach from a thread to
+ safely clean up its data structures, knowing that no
+ <function>report_death</function> or <function>report_reap</function>
+ callback will try to do the same. <function>utrace_detach</function>
+ returns <constant>-ESRCH</constant> when the <structname>struct
+ utrace_engine</structname> has already been detached, but is
+ still a valid pointer because of its reference count. A tracing engine
+ can use this to safely synchronize its own independent multiple threads
+ of control with each other and with its event callbacks that detach.
+ In the same vein, <function>utrace_set_events</function> normally
+ returns zero; if the target thread was stopped before the call, then
+ after a successful call, no event callbacks not requested in the new
+ flags will be made. It fails with <constant>-EALREADY</constant> if
+ you try to clear <constant>UTRACE_EVENT(DEATH)</constant> when the
+ <function>report_death</function> callback may already have begun, if
+ you try to clear <constant>UTRACE_EVENT(REAP)</constant> when the
+ <function>report_reap</function> callback may already have begun, or if
+ you try to newly set <constant>UTRACE_EVENT(DEATH)</constant> or
+ <constant>UTRACE_EVENT(QUIESCE)</constant> when the target is already
+ dead or dying. Like <function>utrace_control</function>, it returns
+ <constant>-ESRCH</constant> when the thread has already been detached
+ (including forcible detach on reaping). This lets the tracing engine
+ know for sure which event callbacks it will or won't see after
+ <function>utrace_set_events</function> has returned. By checking for
+ errors, it can know whether to clean up its data structures immediately
+ or to let its callbacks do the work.
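An asynchronous detach that follows this error-return contract might be sketched like so (`my_detach` and `my_cleanup` are hypothetical; the return values are those described above):

```c
/* Sketch: detach from a possibly-dying thread, letting the error
 * returns of utrace_control() decide which side does the cleanup. */
#include <linux/utrace.h>
#include <linux/errno.h>

static void my_detach(struct task_struct *task, struct utrace_engine *engine)
{
	switch (utrace_control(task, engine, UTRACE_DETACH)) {
	case 0:
		/* Detached: no more callbacks will be made, so this
		 * thread of control owns the cleanup. */
		my_cleanup(engine);	/* hypothetical helper */
		break;
	case -EALREADY:
		/* report_death may be in progress; whatever cleanup it
		 * does is about to happen or has just happened. */
		break;
	case -ESRCH:
		/* Already detached (or being reaped); the report_reap
		 * path, if any, owns the cleanup. */
		break;
	}
	utrace_engine_put(engine);	/* drop our own reference */
}
```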
+ <sect2 id="barrier"><title>Using <function>utrace_barrier</function></title>
+ When a thread is safely stopped, calling
+ <function>utrace_control</function> with <constant>UTRACE_DETACH</constant>
+ or calling <function>utrace_set_events</function> to disable some events
+ ensures synchronously that your engine won't get any more of the callbacks
+ that have been disabled (none at all when detaching). But these can also
+ be used while the thread is not stopped, when it might be simultaneously
+ making a callback to your engine. For this situation, these calls return
+ <constant>-EINPROGRESS</constant> when it's possible a callback is in
+ progress. If you are not prepared to have your old callbacks still run,
+ then you can synchronize to be sure all the old callbacks are finished,
+ using <function>utrace_barrier</function>. This is necessary if the
+ kernel module containing your callback code is going to be unloaded.
+ After using <constant>UTRACE_DETACH</constant> once, further calls to
+ <function>utrace_control</function> with the same engine pointer will
+ return <constant>-ESRCH</constant>. In contrast, after getting
+ <constant>-EINPROGRESS</constant> from
+ <function>utrace_set_events</function>, you can call
+ <function>utrace_set_events</function> again later and if it returns zero
+ then know the old callbacks have finished.
+ Unlike all other calls, <function>utrace_barrier</function> (and
+ <function>utrace_barrier_pid</function>) will accept any engine pointer you
+ hold a reference on, even if <constant>UTRACE_DETACH</constant> has already
+ been used. After any <function>utrace_control</function> or
+ <function>utrace_set_events</function> call (these do not block), you can
+ call <function>utrace_barrier</function> to block until callbacks have
+ finished. This returns <constant>-ESRCH</constant> only if the engine is
+ completely detached (finished all callbacks). Otherwise it waits
+ until the thread is definitely not in the midst of a callback to this
+ engine and then returns zero, but can return
+ <constant>-ERESTARTSYS</constant> if its wait is interrupted.
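Before module unload, the detach-then-barrier sequence might be sketched as (`my_shutdown` is hypothetical; `utrace_barrier`'s two-argument form and the listed error codes follow the text above):

```c
/* Sketch: guarantee no callback into this module's text is still
 * running before the module can be unloaded. */
#include <linux/utrace.h>
#include <linux/errno.h>

static void my_shutdown(struct task_struct *task, struct utrace_engine *engine)
{
	int ret = utrace_control(task, engine, UTRACE_DETACH);

	if (ret == -EINPROGRESS || ret == -EALREADY || ret == -ESRCH) {
		/* utrace_barrier accepts an already-detached engine as
		 * long as we hold a reference; retry if our
		 * interruptible wait is broken by a signal. */
		do {
			ret = utrace_barrier(task, engine);
		} while (ret == -ERESTARTSYS);
	}
	/* All old callbacks are finished (or were never started). */
	utrace_engine_put(engine);
}
```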
+<chapter id="core"><title>utrace core API</title>
+ The utrace API is declared in <filename>&lt;linux/utrace.h&gt;</filename>.
+!Iinclude/linux/utrace.h
+<chapter id="machine"><title>Machine State</title>
+ The <function>task_current_syscall</function> function can be used on any
+ valid <structname>struct task_struct</structname> at any time, and does
+ not even require that <function>utrace_attach_task</function> was used at all.
+ The other ways to access the registers and other machine-dependent state of
+ a task can only be used on a task that is at a known safe point. The safe
+ points are all the places where <function>utrace_set_events</function> can
+ request callbacks (except for the <constant>DEATH</constant> and
+ <constant>REAP</constant> events). So at any event callback, it is safe to
+ examine <varname>current</varname>.
+ One task can examine another only after a callback in the target task that
+ returns <constant>UTRACE_STOP</constant> so that task will not return to user
+ mode after the safe point. This guarantees that the task will not resume
+ until the same engine uses <function>utrace_control</function>, unless the
+ task dies suddenly. To examine safely, one must use a pair of calls to
+ <function>utrace_prepare_examine</function> and
+ <function>utrace_finish_examine</function> surrounding the calls to
+ <structname>struct user_regset</structname> functions or direct examination
+ of task data structures. <function>utrace_prepare_examine</function> returns
+ an error if the task is not properly stopped and not dead. After a
+ successful examination, the paired <function>utrace_finish_examine</function>
+ call returns an error if the task ever woke up during the examination. If
+ so, any data gathered may be scrambled and should be discarded. This means
+ there was a spurious wake-up (which should not happen), or a sudden death.
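The prepare/finish pairing might be sketched as follows (the `struct utrace_examine` cookie, the exact `utrace_prepare_examine`/`utrace_finish_examine` signatures, and indexing `regsets[0]` as the general registers are all assumptions; `task_user_regset_view` comes from `<linux/regset.h>`):

```c
/* Sketch: safely read a stopped tracee's general registers via the
 * user_regset interface, bracketed by prepare/finish examine. */
#include <linux/utrace.h>
#include <linux/regset.h>
#include <linux/errno.h>

static int my_read_regs(struct task_struct *task,
			struct utrace_engine *engine,
			void *buf, unsigned int size)
{
	struct utrace_examine exam;	/* assumed examination cookie */
	const struct user_regset_view *view = task_user_regset_view(task);
	const struct user_regset *regset = &view->regsets[0]; /* assumed: GP regs */
	int ret;

	ret = utrace_prepare_examine(task, engine, &exam);
	if (ret)
		return ret;	/* task not properly stopped (and not dead) */

	ret = regset->get(task, regset, 0, size, buf, NULL);

	if (utrace_finish_examine(task, engine, &exam))
		ret = -EAGAIN;	/* task woke mid-examination: discard data */
	return ret;
}
```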
+<sect1 id="regset"><title><structname>struct user_regset</structname></title>
+ The <structname>struct user_regset</structname> API
+ is declared in <filename>&lt;linux/regset.h&gt;</filename>.
+!Finclude/linux/regset.h
+<sect1 id="task_current_syscall">
+ <title><filename>System Call Information</filename></title>
+ This function is declared in <filename>&lt;linux/ptrace.h&gt;</filename>.
+<sect1 id="syscall"><title><filename>System Call Tracing</filename></title>
+ The arch API for system call information is declared in
+ <filename>&lt;asm/syscall.h&gt;</filename>.
+ Each of these calls can be used only at system call entry tracing,
+ or can be used only at system call exit and the subsequent safe points
+ before returning to user mode.
+ "At system call entry" tracing means either during a
+ <structfield>report_syscall_entry</structfield> callback,
+ or any time after that callback has returned <constant>UTRACE_STOP</constant>.
+!Finclude/asm-generic/syscall.h
+<chapter id="internals"><title>Kernel Internals</title>
+ This chapter covers the interface to the tracing infrastructure
+ from the core of the kernel and the architecture-specific code.
+ This is for maintainers of the kernel and arch code, and is not relevant
+ to using the tracing facilities described in preceding chapters.
+<sect1 id="tracehook"><title>Core Calls In</title>
+ These calls are declared in <filename>&lt;linux/tracehook.h&gt;</filename>.
+ The core kernel calls these functions at various important places.
+!Finclude/linux/tracehook.h
+<sect1 id="arch"><title>Architecture Calls Out</title>
+ An arch that has done all these things sets
+ <constant>CONFIG_HAVE_ARCH_TRACEHOOK</constant>.
+ This is required to enable the <application>utrace</application> code.
+<sect2 id="arch-ptrace"><title><filename>&lt;asm/ptrace.h&gt;</filename></title>
+ An arch defines these in <filename>&lt;asm/ptrace.h&gt;</filename>
+ if it supports hardware single-step or block-step features.
+!Finclude/linux/ptrace.h arch_has_single_step arch_has_block_step
+!Finclude/linux/ptrace.h user_enable_single_step user_enable_block_step
+!Finclude/linux/ptrace.h user_disable_single_step
+<sect2 id="arch-syscall">
+ <title><filename>&lt;asm/syscall.h&gt;</filename></title>
+ An arch provides <filename>&lt;asm/syscall.h&gt;</filename> that
+ defines these as inlines, or declares them as exported functions.
+ These interfaces are described in <xref linkend="syscall"/>.
+<sect2 id="arch-tracehook">
+ <title><filename>&lt;linux/tracehook.h&gt;</filename></title>
+ An arch must define <constant>TIF_NOTIFY_RESUME</constant>
+ and <constant>TIF_SYSCALL_TRACE</constant>
+ in its <filename>&lt;asm/thread_info.h&gt;</filename>.
+ The arch code must call the following functions, all declared
+ in <filename>&lt;linux/tracehook.h&gt;</filename> and
+ described in <xref linkend="tracehook"/>:
+ <para><function>tracehook_notify_resume</function></para>
+ <para><function>tracehook_report_syscall_entry</function></para>
+ <para><function>tracehook_report_syscall_exit</function></para>
+ <para><function>tracehook_signal_handler</function></para>
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 725a650..e299a63 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
 #include <linux/pid_namespace.h>
 #include <linux/ptrace.h>
 #include <linux/tracehook.h>
+#include <linux/utrace.h>
 #include <linux/vs_context.h>
 #include <linux/vs_network.h>
@@ -188,6 +189,8 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
 		cred->uid, cred->euid, cred->suid, cred->fsuid,
 		cred->gid, cred->egid, cred->sgid, cred->fsgid);
+	task_utrace_proc_status(m, p);
 	fdt = files_fdtable(p->files);
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 5368fbd..aecd24e 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -167,6 +167,7 @@ extern struct cred init_cred;
 	[PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID), \
 	.dirties = INIT_PROP_LOCAL_SINGLE(dirties), \
 	INIT_PERF_COUNTERS(tsk) \
 	INIT_TRACE_IRQFLAGS \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4d07542..2060aa1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -59,6 +59,7 @@ struct sched_param {
 #include <linux/errno.h>
 #include <linux/nodemask.h>
 #include <linux/mm_types.h>
+#include <linux/utrace_struct.h>
 #include <asm/system.h>
 #include <asm/page.h>
@@ -1314,6 +1315,11 @@ struct task_struct {
+#ifdef CONFIG_UTRACE
+	struct utrace utrace;
+	unsigned long utrace_flags;
 	/* vserver context data */
 	struct vx_info *vx_info;
 	struct nx_info *nx_info;
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 7c2bfd9..a91d9a4 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
 #include <linux/sched.h>
 #include <linux/ptrace.h>
 #include <linux/security.h>
+#include <linux/utrace.h>
@@ -63,6 +64,8 @@ struct linux_binprm;
 static inline int tracehook_expect_breakpoints(struct task_struct *task)
+	if (unlikely(task_utrace_flags(task) & UTRACE_EVENT(SIGNAL_CORE)))
 	return (task_ptrace(task) & PT_PTRACED) != 0;
@@ -111,6 +114,9 @@ static inline void ptrace_report_syscall(struct pt_regs *regs)
 static inline __must_check int tracehook_report_syscall_entry(
 	struct pt_regs *regs)
+	if ((task_utrace_flags(current) & UTRACE_EVENT(SYSCALL_ENTRY)) &&
+	    utrace_report_syscall_entry(regs))
 	ptrace_report_syscall(regs);
@@ -134,6 +140,8 @@ static inline __must_check int tracehook_report_syscall_entry(
 static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
+	if (task_utrace_flags(current) & UTRACE_EVENT(SYSCALL_EXIT))
+		utrace_report_syscall_exit(regs);
 	ptrace_report_syscall(regs);
@@ -194,6 +202,8 @@ static inline void tracehook_report_exec(struct linux_binfmt *fmt,
 					 struct linux_binprm *bprm,
 					 struct pt_regs *regs)
+	if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(EXEC)))
+		utrace_report_exec(fmt, bprm, regs);
 	if (!ptrace_event(PT_TRACE_EXEC, PTRACE_EVENT_EXEC, 0) &&
 	    unlikely(task_ptrace(current) & PT_PTRACED))
 		send_sig(SIGTRAP, current, 0);
@@ -211,6 +221,8 @@ static inline void tracehook_report_exec(struct linux_binfmt *fmt,
 static inline void tracehook_report_exit(long *exit_code)
+	if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(EXIT)))
+		utrace_report_exit(exit_code);
 	ptrace_event(PT_TRACE_EXIT, PTRACE_EVENT_EXIT, *exit_code);
@@ -254,6 +266,7 @@ static inline int tracehook_prepare_clone(unsigned clone_flags)
 static inline void tracehook_finish_clone(struct task_struct *child,
 					  unsigned long clone_flags, int trace)
+	utrace_init_task(child);
 	ptrace_init_task(child, (clone_flags & CLONE_PTRACE) || trace);
@@ -278,6 +291,8 @@ static inline void tracehook_report_clone(struct pt_regs *regs,
 					  unsigned long clone_flags,
 					  pid_t pid, struct task_struct *child)
+	if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(CLONE)))
+		utrace_report_clone(clone_flags, child);
 	if (unlikely(task_ptrace(child))) {
 	 * It doesn't matter who attached/attaching to this
@@ -310,6 +325,9 @@ static inline void tracehook_report_clone_complete(int trace,
 						   struct task_struct *child)
+	if (unlikely(task_utrace_flags(current) & UTRACE_EVENT(CLONE)) &&
+	    (clone_flags & CLONE_VFORK))
+		utrace_finish_vfork(current);
 	ptrace_event(0, trace, pid);
@@ -344,6 +362,7 @@ static inline void tracehook_report_vfork_done(struct task_struct *child,
 static inline void tracehook_prepare_release_task(struct task_struct *task)
+	utrace_release_task(task);
@@ -358,6 +377,7 @@ static inline void tracehook_prepare_release_task(struct task_struct *task)
 static inline void tracehook_finish_release_task(struct task_struct *task)
 	ptrace_release_task(task);
+	BUG_ON(task->exit_state != EXIT_DEAD);
@@ -379,6 +399,8 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
 					    const struct k_sigaction *ka,
 					    struct pt_regs *regs, int stepping)
+	if (task_utrace_flags(current))
+		utrace_signal_handler(current, stepping);
 	ptrace_notify(SIGTRAP);
@@ -396,6 +418,8 @@ static inline void tracehook_signal_handler(int sig, siginfo_t *info,
 static inline int tracehook_consider_ignored_signal(struct task_struct *task,
+	if (unlikely(task_utrace_flags(task) & UTRACE_EVENT(SIGNAL_IGN)))
 	return (task_ptrace(task) & PT_PTRACED) != 0;
@@ -415,6 +439,9 @@ static inline int tracehook_consider_ignored_signal(struct task_struct *task,
 static inline int tracehook_consider_fatal_signal(struct task_struct *task,
+	if (unlikely(task_utrace_flags(task) & (UTRACE_EVENT(SIGNAL_TERM) |
+						UTRACE_EVENT(SIGNAL_CORE))))
 	return (task_ptrace(task) & PT_PTRACED) != 0;
@@ -429,6 +456,8 @@ static inline int tracehook_consider_fatal_signal(struct task_struct *task,
 static inline int tracehook_force_sigpending(void)
800 + if (unlikely(task_utrace_flags(current)))
801 + return utrace_interrupt_pending();
805 @@ -458,6 +487,8 @@ static inline int tracehook_get_signal(struct task_struct *task,
807 struct k_sigaction *return_ka)
809 + if (unlikely(task_utrace_flags(task)))
810 + return utrace_get_signal(task, regs, info, return_ka);
814 @@ -485,6 +516,8 @@ static inline int tracehook_get_signal(struct task_struct *task,
816 static inline int tracehook_notify_jctl(int notify, int why)
818 + if (task_utrace_flags(current) & UTRACE_EVENT(JCTL))
819 + utrace_report_jctl(notify, why);
820 return notify ?: (current->ptrace & PT_PTRACED) ? why : 0;
823 @@ -508,6 +541,8 @@ static inline int tracehook_notify_jctl(int notify, int why)
824 static inline int tracehook_notify_death(struct task_struct *task,
825 void **death_cookie, int group_dead)
827 + *death_cookie = task_utrace_struct(task);
829 if (task_detached(task))
830 return task->ptrace ? SIGCHLD : DEATH_REAP;
832 @@ -544,6 +579,20 @@ static inline void tracehook_report_death(struct task_struct *task,
833 int signal, void *death_cookie,
837 + * This barrier ensures that our caller's setting of
838 + * @task->exit_state precedes checking @task->utrace_flags here.
839 + * If utrace_set_events() was just called to enable
840 + * UTRACE_EVENT(DEATH), then we are obliged to call
841 + * utrace_report_death() and not miss it. utrace_set_events()
842 + * uses tasklist_lock to synchronize enabling the bit with the
843 + * actual change to @task->exit_state, but we need this barrier
844 + * to be sure we see a flags change made just before our caller
845 + * took the tasklist_lock.
848 + if (task_utrace_flags(task) & _UTRACE_DEATH_EVENTS)
849 + utrace_report_death(task, death_cookie, group_dead, signal);
852 #ifdef TIF_NOTIFY_RESUME
853 @@ -573,10 +622,20 @@ static inline void set_notify_resume(struct task_struct *task)
854 * asynchronously, this will be called again before we return to
857 - * Called without locks.
858 + * Called without locks. However, on some machines this may be
859 + * called with interrupts disabled.
861 static inline void tracehook_notify_resume(struct pt_regs *regs)
863 + struct task_struct *task = current;
865 + * This pairs with the barrier implicit in set_notify_resume().
866 + * It ensures that we read the nonzero utrace_flags set before
867 + * set_notify_resume() was called by utrace setup.
870 + if (task_utrace_flags(task))
871 + utrace_resume(task, regs);
873 #endif /* TIF_NOTIFY_RESUME */
875 diff --git a/include/linux/utrace.h b/include/linux/utrace.h
877 index 0000000..f877ec6
879 +++ b/include/linux/utrace.h
882 + * utrace infrastructure interface for debugging user processes
884 + * Copyright (C) 2006-2009 Red Hat, Inc. All rights reserved.
886 + * This copyrighted material is made available to anyone wishing to use,
887 + * modify, copy, or redistribute it subject to the terms and conditions
888 + * of the GNU General Public License v.2.
890 + * Red Hat Author: Roland McGrath.
892 + * This interface allows for notification of interesting events in a
893 + * thread. It also mediates access to thread state such as registers.
894 + * Multiple unrelated users can be associated with a single thread.
895 + * We call each of these a tracing engine.
897 + * A tracing engine starts by calling utrace_attach_task() or
898 + * utrace_attach_pid() on the chosen thread, passing in a set of hooks
899 + * (&struct utrace_engine_ops), and some associated data. This produces a
900 + * &struct utrace_engine, which is the handle used for all other
901 + * operations. An attached engine has its ops vector, its data, and an
902 + * event mask controlled by utrace_set_events().
904 + * For each event bit that is set, that engine will get the
905 + * appropriate ops->report_*() callback when the event occurs. The
906 + * &struct utrace_engine_ops need not provide callbacks for an event
907 + * unless the engine sets one of the associated event bits.
910 +#ifndef _LINUX_UTRACE_H
911 +#define _LINUX_UTRACE_H 1
913 +#include <linux/list.h>
914 +#include <linux/kref.h>
915 +#include <linux/signal.h>
916 +#include <linux/sched.h>
918 +struct linux_binprm;
922 +struct user_regset_view;
925 + * Event bits passed to utrace_set_events().
926 + * These appear in &struct task_struct.@utrace_flags
927 + * and &struct utrace_engine.@flags.
929 +enum utrace_events {
930 + _UTRACE_EVENT_QUIESCE, /* Thread is available for examination. */
931 + _UTRACE_EVENT_REAP, /* Zombie reaped, no more tracing possible. */
932 + _UTRACE_EVENT_CLONE, /* Successful clone/fork/vfork just done. */
933 + _UTRACE_EVENT_EXEC, /* Successful execve just completed. */
934 + _UTRACE_EVENT_EXIT, /* Thread exit in progress. */
935 + _UTRACE_EVENT_DEATH, /* Thread has died. */
936 + _UTRACE_EVENT_SYSCALL_ENTRY, /* User entered kernel for system call. */
937 + _UTRACE_EVENT_SYSCALL_EXIT, /* Returning to user after system call. */
938 + _UTRACE_EVENT_SIGNAL, /* Signal delivery will run a user handler. */
939 + _UTRACE_EVENT_SIGNAL_IGN, /* No-op signal to be delivered. */
940 + _UTRACE_EVENT_SIGNAL_STOP, /* Signal delivery will suspend. */
941 + _UTRACE_EVENT_SIGNAL_TERM, /* Signal delivery will terminate. */
942 + _UTRACE_EVENT_SIGNAL_CORE, /* Signal delivery will dump core. */
943 + _UTRACE_EVENT_JCTL, /* Job control stop or continue completed. */
946 +#define UTRACE_EVENT(type) (1UL << _UTRACE_EVENT_##type)
949 + * All the kinds of signal events.
950 + * These all use the @report_signal() callback.
952 +#define UTRACE_EVENT_SIGNAL_ALL (UTRACE_EVENT(SIGNAL) \
953 + | UTRACE_EVENT(SIGNAL_IGN) \
954 + | UTRACE_EVENT(SIGNAL_STOP) \
955 + | UTRACE_EVENT(SIGNAL_TERM) \
956 + | UTRACE_EVENT(SIGNAL_CORE))
958 + * Both kinds of syscall events; these call the @report_syscall_entry()
959 + * and @report_syscall_exit() callbacks, respectively.
961 +#define UTRACE_EVENT_SYSCALL \
962 + (UTRACE_EVENT(SYSCALL_ENTRY) | UTRACE_EVENT(SYSCALL_EXIT))
965 + * The event reports triggered synchronously by task death.
967 +#define _UTRACE_DEATH_EVENTS (UTRACE_EVENT(DEATH) | UTRACE_EVENT(QUIESCE))
970 + * Hooks in <linux/tracehook.h> call these entry points to the
971 + * utrace dispatch. They are weak references here only so
972 + * tracehook.h doesn't need to #ifndef CONFIG_UTRACE them to
973 + * avoid external references in case of unoptimized compilation.
975 +bool utrace_interrupt_pending(void)
976 + __attribute__((weak));
977 +void utrace_resume(struct task_struct *, struct pt_regs *)
978 + __attribute__((weak));
979 +int utrace_get_signal(struct task_struct *, struct pt_regs *,
980 + siginfo_t *, struct k_sigaction *)
981 + __attribute__((weak));
982 +void utrace_report_clone(unsigned long, struct task_struct *)
983 + __attribute__((weak));
984 +void utrace_finish_vfork(struct task_struct *)
985 + __attribute__((weak));
986 +void utrace_report_exit(long *exit_code)
987 + __attribute__((weak));
988 +void utrace_report_death(struct task_struct *, struct utrace *, bool, int)
989 + __attribute__((weak));
990 +void utrace_report_jctl(int notify, int type)
991 + __attribute__((weak));
992 +void utrace_report_exec(struct linux_binfmt *, struct linux_binprm *,
993 + struct pt_regs *regs)
994 + __attribute__((weak));
995 +bool utrace_report_syscall_entry(struct pt_regs *)
996 + __attribute__((weak));
997 +void utrace_report_syscall_exit(struct pt_regs *)
998 + __attribute__((weak));
999 +void utrace_signal_handler(struct task_struct *, int)
1000 + __attribute__((weak));
1002 +#ifndef CONFIG_UTRACE
1005 + * <linux/tracehook.h> uses these accessors to avoid #ifdef CONFIG_UTRACE.
1007 +static inline unsigned long task_utrace_flags(struct task_struct *task)
1011 +static inline struct utrace *task_utrace_struct(struct task_struct *task)
1015 +static inline void utrace_init_task(struct task_struct *child)
1018 +static inline void utrace_release_task(struct task_struct *task)
1022 +static inline void task_utrace_proc_status(struct seq_file *m,
1023 + struct task_struct *p)
1027 +#else /* CONFIG_UTRACE */
1029 +static inline unsigned long task_utrace_flags(struct task_struct *task)
1031 + return task->utrace_flags;
1034 +static inline struct utrace *task_utrace_struct(struct task_struct *task)
1036 + return &task->utrace;
1039 +static inline void utrace_init_task(struct task_struct *task)
1041 + task->utrace_flags = 0;
1042 + memset(&task->utrace, 0, sizeof(task->utrace));
1043 + INIT_LIST_HEAD(&task->utrace.attached);
1044 + INIT_LIST_HEAD(&task->utrace.attaching);
1045 + spin_lock_init(&task->utrace.lock);
1048 +void utrace_release_task(struct task_struct *);
1049 +void task_utrace_proc_status(struct seq_file *m, struct task_struct *p);
1053 + * Version number of the API defined in this file. This will change
1054 + * whenever a tracing engine's code would need some updates to keep
1055 + * working. We maintain this here for the benefit of tracing engine code
1056 + * that is developed concurrently with utrace API improvements before they
1057 + * are merged into the kernel, making LINUX_VERSION_CODE checks unwieldy.
1059 +#define UTRACE_API_VERSION 20090416
1062 + * enum utrace_resume_action - engine's choice of action for a traced task
1063 + * @UTRACE_STOP: Stay quiescent after callbacks.
1064 + * @UTRACE_REPORT: Make some callback soon.
1065 + * @UTRACE_INTERRUPT: Make @report_signal() callback soon.
1066 + * @UTRACE_SINGLESTEP: Resume in user mode for one instruction.
1067 + * @UTRACE_BLOCKSTEP: Resume in user mode until next branch.
1068 + * @UTRACE_RESUME: Resume normally in user mode.
1069 + * @UTRACE_DETACH: Detach my engine (implies %UTRACE_RESUME).
1071 + * See utrace_control() for detailed descriptions of each action. This is
1072 + * encoded in the @action argument and the return value for every callback
1073 + * with a &u32 return value.
1075 + * The order of these is important. When there is more than one engine,
1076 + * each supplies its choice and the smallest value prevails.
1078 +enum utrace_resume_action {
1082 + UTRACE_SINGLESTEP,
1087 +#define UTRACE_RESUME_MASK 0x0f
1090 + * utrace_resume_action - &enum utrace_resume_action from callback action
1091 + * @action: &u32 callback @action argument or return value
1093 + * This extracts the &enum utrace_resume_action from @action,
1094 + * which is the @action argument to a &struct utrace_engine_ops
1095 + * callback or the return value from one.
1097 +static inline enum utrace_resume_action utrace_resume_action(u32 action)
1099 + return action & UTRACE_RESUME_MASK;
1103 + * enum utrace_signal_action - disposition of signal
1104 + * @UTRACE_SIGNAL_DELIVER: Deliver according to sigaction.
1105 + * @UTRACE_SIGNAL_IGN: Ignore the signal.
1106 + * @UTRACE_SIGNAL_TERM: Terminate the process.
1107 + * @UTRACE_SIGNAL_CORE: Terminate with core dump.
1108 + * @UTRACE_SIGNAL_STOP: Deliver as absolute stop.
1109 + * @UTRACE_SIGNAL_TSTP: Deliver as job control stop.
1110 + * @UTRACE_SIGNAL_REPORT: Reporting before pending signals.
1111 + * @UTRACE_SIGNAL_HANDLER: Reporting after signal handler setup.
1113 + * This is encoded in the @action argument and the return value for
1114 + * a @report_signal() callback. It says what will happen to the
1115 + * signal described by the &siginfo_t parameter to the callback.
1117 + * The %UTRACE_SIGNAL_REPORT value is used in an @action argument when
1118 + * a tracing report is being made before dequeuing any pending signal.
1119 + * If this is immediately after a signal handler has been set up, then
1120 + * %UTRACE_SIGNAL_HANDLER is used instead. A @report_signal callback
1121 + * that uses %UTRACE_SIGNAL_DELIVER|%UTRACE_SINGLESTEP will ensure
1122 + * it sees a %UTRACE_SIGNAL_HANDLER report.
1124 +enum utrace_signal_action {
1125 + UTRACE_SIGNAL_DELIVER = 0x00,
1126 + UTRACE_SIGNAL_IGN = 0x10,
1127 + UTRACE_SIGNAL_TERM = 0x20,
1128 + UTRACE_SIGNAL_CORE = 0x30,
1129 + UTRACE_SIGNAL_STOP = 0x40,
1130 + UTRACE_SIGNAL_TSTP = 0x50,
1131 + UTRACE_SIGNAL_REPORT = 0x60,
1132 + UTRACE_SIGNAL_HANDLER = 0x70
1134 +#define UTRACE_SIGNAL_MASK 0xf0
1135 +#define UTRACE_SIGNAL_HOLD 0x100 /* Flag, push signal back on queue. */
1138 + * utrace_signal_action - &enum utrace_signal_action from callback action
1139 + * @action: @report_signal callback @action argument or return value
1141 + * This extracts the &enum utrace_signal_action from @action, which
1142 + * is the @action argument to a @report_signal callback or the
1143 + * return value from one.
1145 +static inline enum utrace_signal_action utrace_signal_action(u32 action)
1147 + return action & UTRACE_SIGNAL_MASK;
1151 + * enum utrace_syscall_action - disposition of system call attempt
1152 + * @UTRACE_SYSCALL_RUN: Run the system call.
1153 + * @UTRACE_SYSCALL_ABORT: Don't run the system call.
1155 + * This is encoded in the @action argument and the return value for
1156 + * a @report_syscall_entry callback.
1158 +enum utrace_syscall_action {
1159 + UTRACE_SYSCALL_RUN = 0x00,
1160 + UTRACE_SYSCALL_ABORT = 0x10
1162 +#define UTRACE_SYSCALL_MASK 0xf0
1165 + * utrace_syscall_action - &enum utrace_syscall_action from callback action
1166 + * @action: @report_syscall_entry callback @action or return value
1168 + * This extracts the &enum utrace_syscall_action from @action, which
1169 + * is the @action argument to a @report_syscall_entry callback or the
1170 + * return value from one.
1172 +static inline enum utrace_syscall_action utrace_syscall_action(u32 action)
1174 + return action & UTRACE_SYSCALL_MASK;
1178 + * Flags for utrace_attach_task() and utrace_attach_pid().
1180 +#define UTRACE_ATTACH_CREATE 0x0010 /* Attach a new engine. */
1181 +#define UTRACE_ATTACH_EXCLUSIVE 0x0020 /* Refuse if existing match. */
1182 +#define UTRACE_ATTACH_MATCH_OPS 0x0001 /* Match engines on ops. */
1183 +#define UTRACE_ATTACH_MATCH_DATA 0x0002 /* Match engines on data. */
1184 +#define UTRACE_ATTACH_MATCH_MASK 0x000f
1187 + * struct utrace_engine - per-engine structure
1188 + * @ops: &struct utrace_engine_ops pointer passed to utrace_attach_task()
1189 + * @data: engine-private &void * passed to utrace_attach_task()
1190 + * @flags: event mask set by utrace_set_events() plus internal flag bits
1192 + * The task itself never has to worry about engines detaching while
1193 + * it's doing event callbacks. These structures are removed from the
1194 + * task's active list only when it's stopped, or by the task itself.
1196 + * utrace_engine_get() and utrace_engine_put() maintain a reference count.
1197 + * When it drops to zero, the structure is freed. One reference is held
1198 + * implicitly while the engine is attached to its task.
1200 +struct utrace_engine {
1203 + struct list_head entry;
1206 + const struct utrace_engine_ops *ops;
1209 + unsigned long flags;
1213 + * utrace_engine_get - acquire a reference on a &struct utrace_engine
1214 + * @engine: &struct utrace_engine pointer
1216 + * You must hold a reference on @engine, and you get another.
1218 +static inline void utrace_engine_get(struct utrace_engine *engine)
1220 + kref_get(&engine->kref);
1223 +void __utrace_engine_release(struct kref *);
1226 + * utrace_engine_put - release a reference on a &struct utrace_engine
1227 + * @engine: &struct utrace_engine pointer
1229 + * You must hold a reference on @engine, and you lose that reference.
1230 + * If it was the last one, @engine becomes an invalid pointer.
1232 +static inline void utrace_engine_put(struct utrace_engine *engine)
1234 + kref_put(&engine->kref, __utrace_engine_release);
1238 + * struct utrace_engine_ops - tracing engine callbacks
1240 + * Each @report_*() callback corresponds to an %UTRACE_EVENT(*) bit.
1241 + * utrace_set_events() calls on @engine choose which callbacks will be made
1242 + * to @engine from @task.
1244 + * Most callbacks take an @action argument, giving the resume action
1245 + * chosen by other tracing engines. All callbacks take an @engine
1246 + * argument, and a @task argument, which is always equal to @current.
1247 + * For some calls, @action also includes bits specific to that event
1248 + * and utrace_resume_action() is used to extract the resume action.
1249 + * This shows what would happen if @engine wasn't there, or will if
1250 + * the callback's return value uses %UTRACE_RESUME. This always
1251 + * starts as %UTRACE_RESUME when no other tracing is being done on
1254 + * All return values contain &enum utrace_resume_action bits. For
1255 + * some calls, other bits specific to that kind of event are added to
1256 + * the resume action bits with OR. These are the same bits used in
1257 + * the @action argument. The resume action returned by a callback
1258 + * does not override previous engines' choices, it only says what
1259 + * @engine wants done. What @task actually does is the action that's
1260 + * most constrained among the choices made by all attached engines.
1261 + * See utrace_control() for more information on the actions.
1263 + * When %UTRACE_STOP is used in @report_syscall_entry, then @task
1264 + * stops before attempting the system call. In other cases, the
1265 + * resume action does not take effect until @task is ready to check
1266 + * for signals and return to user mode. If there are more callbacks
1267 + * to be made, the last round of calls determines the final action.
1268 + * A @report_quiesce callback with @event zero, or a @report_signal
1269 + * callback, will always be the last one made before @task resumes.
1270 + * Only %UTRACE_STOP is "sticky"--if @engine returned %UTRACE_STOP
1271 + * then @task stays stopped unless @engine returns a different
1272 + * action from a following callback.
1274 + * The report_death() and report_reap() callbacks do not take @action
1275 + * arguments, and only %UTRACE_DETACH is meaningful in the return value
1276 + * from a report_death() callback. None of the resume actions applies
1277 + * to a dead thread.
1279 + * All @report_*() hooks are called with no locks held, in a generally
1280 + * safe environment when we will be returning to user mode soon (or just
1281 + * entered the kernel). It is fine to block for memory allocation and
1282 + * the like, but all hooks are asynchronous and must not block on
1283 + * external events! If you want the thread to block, use %UTRACE_STOP
1284 + * in your hook's return value; then later wake it up with utrace_control().
1286 + * @report_quiesce:
1287 + * Requested by %UTRACE_EVENT(%QUIESCE).
1288 + * This does not indicate any event, but just that @task (the current
1289 + * thread) is in a safe place for examination. This call is made
1290 + * before each specific event callback, except for @report_reap.
1291 + * The @event argument gives the %UTRACE_EVENT(@which) value for
1292 + * the event occurring. This callback might be made for events @engine
1293 + * has not requested, if some other engine is tracing the event;
1294 + * a utrace_set_events() call here can request the immediate
1295 + * callback for this occurrence of @event. @event is zero when there
1296 + * is no other event, @task is now ready to check for signals and
1297 + * return to user mode, and some engine has used %UTRACE_REPORT or
1298 + * %UTRACE_INTERRUPT to request this callback. For this case,
1299 + * if @report_signal is not %NULL, the @report_quiesce callback
1300 + * may be replaced with a @report_signal callback passing
1301 + * %UTRACE_SIGNAL_REPORT in its @action argument, whenever @task is
1302 + * entering the signal-check path anyway.
1305 + * Requested by %UTRACE_EVENT(%SIGNAL_*) or %UTRACE_EVENT(%QUIESCE).
1306 + * Use utrace_signal_action() and utrace_resume_action() on @action.
1307 + * The signal action is %UTRACE_SIGNAL_REPORT when some engine has
1308 + * used %UTRACE_REPORT or %UTRACE_INTERRUPT; the callback can choose
1309 + * to stop or to deliver an artificial signal, before pending signals.
1310 + * It's %UTRACE_SIGNAL_HANDLER instead when signal handler setup just
1311 + * finished (after a previous %UTRACE_SIGNAL_DELIVER return); this
1312 + * serves in lieu of any %UTRACE_SIGNAL_REPORT callback requested by
1313 + * %UTRACE_REPORT or %UTRACE_INTERRUPT, and is also implicitly
1314 + * requested by %UTRACE_SINGLESTEP or %UTRACE_BLOCKSTEP into the
1315 + * signal delivery. The other signal actions indicate a signal about
1316 + * to be delivered; the previous engine's return value sets the signal
1317 + * action seen by the following engine's callback. The @info data
1318 + * can be changed at will, including @info->si_signo. The settings in
1319 + * @return_ka determine what %UTRACE_SIGNAL_DELIVER does. @orig_ka
1320 + * is what was in force before other tracing engines intervened, and
1321 + * it's %NULL when this report began as %UTRACE_SIGNAL_REPORT or
1322 + * %UTRACE_SIGNAL_HANDLER. For a report without a new signal, @info
1323 + * is left uninitialized and must be set completely by an engine that
1324 + * chooses to deliver a signal; if there was a previous @report_signal
1325 + * callback ending in %UTRACE_STOP and it was just resumed using
1326 + * %UTRACE_REPORT or %UTRACE_INTERRUPT, then @info is left unchanged
1327 + * from the previous callback. In this way, the original signal can
1328 + * be left in @info while returning %UTRACE_STOP|%UTRACE_SIGNAL_IGN
1329 + * and then found again when resuming @task with %UTRACE_INTERRUPT.
1330 + * The %UTRACE_SIGNAL_HOLD flag bit can be OR'd into the return value,
1331 + * and might be in @action if the previous engine returned it. This
1332 + * flag asks that the signal in @info be pushed back on @task's queue
1333 + * so that it will be seen again after whatever action is taken now.
1336 + * Requested by %UTRACE_EVENT(%CLONE).
1337 + * Event reported for parent, before the new task @child might run.
1338 + * @clone_flags gives the flags used in the clone system call,
1339 + * or equivalent flags for a fork() or vfork() system call.
1340 + * This function can use utrace_attach_task() on @child. It's guaranteed
1341 + * that asynchronous utrace_attach_task() calls will be ordered after
1342 + * any calls in @report_clone callbacks for the parent. Thus
1343 + * when using %UTRACE_ATTACH_EXCLUSIVE in the asynchronous calls,
1344 + * you can be sure that the parent's @report_clone callback has
1345 + * already attached to @child or chosen not to. Passing %UTRACE_STOP
1346 + * to utrace_control() on @child here keeps the child stopped before
1347 + * it ever runs in user mode, %UTRACE_REPORT or %UTRACE_INTERRUPT
1348 + * ensures a callback from @child before it starts in user mode.
1351 + * Requested by %UTRACE_EVENT(%JCTL).
1352 + * Job control event; @type is %CLD_STOPPED or %CLD_CONTINUED,
1353 + * indicating whether we are stopping or resuming now. If @notify
1354 + * is nonzero, @task is the last thread to stop and so will send
1355 + * %SIGCHLD to its parent after this callback; @notify reflects
1356 + * what the parent's %SIGCHLD has in @si_code, which can sometimes
1357 + * be %CLD_STOPPED even when @type is %CLD_CONTINUED.
1360 + * Requested by %UTRACE_EVENT(%EXEC).
1361 + * An execve system call has succeeded and the new program is about to
1362 + * start running. The initial user register state is handy to be tweaked
1363 + * directly in @regs. @fmt and @bprm give the details of this exec.
1365 + * @report_syscall_entry:
1366 + * Requested by %UTRACE_EVENT(%SYSCALL_ENTRY).
1367 + * Thread has entered the kernel to request a system call.
1368 + * The user register state is handy to be tweaked directly in @regs.
1369 + * The @action argument contains an &enum utrace_syscall_action,
1370 + * use utrace_syscall_action() to extract it. The return value
1371 + * overrides the last engine's action for the system call.
1372 + * If the final action is %UTRACE_SYSCALL_ABORT, no system call
1373 + * is made. The details of the system call being attempted can
1374 + * be fetched here with syscall_get_nr() and syscall_get_arguments().
1375 + * The parameter registers can be changed with syscall_set_arguments().
1377 + * @report_syscall_exit:
1378 + * Requested by %UTRACE_EVENT(%SYSCALL_EXIT).
1379 + * Thread is about to leave the kernel after a system call request.
1380 + * The user register state is handy to be tweaked directly in @regs.
1381 + * The results of the system call attempt can be examined here using
1382 + * syscall_get_error() and syscall_get_return_value(). It is safe
1383 + * here to call syscall_set_return_value() or syscall_rollback().
1386 + * Requested by %UTRACE_EVENT(%EXIT).
1387 + * Thread is exiting and cannot be prevented from doing so,
1388 + * but all its state is still live. The @code value will be
1389 + * the wait result seen by the parent, and can be changed by
1390 + * this engine or others. The @orig_code value is the real
1391 + * status, not changed by any tracing engine. Returning %UTRACE_STOP
1392 + * here keeps @task stopped before it cleans up its state and dies,
1393 + * so it can be examined by other processes. When @task is allowed
1394 + * to run, it will die and get to the @report_death callback.
1397 + * Requested by %UTRACE_EVENT(%DEATH).
1398 + * Thread is really dead now. It might be reaped by its parent at
1399 + * any time, or self-reap immediately. Though the actual reaping
1400 + * may happen in parallel, a report_reap() callback will always be
1401 + * ordered after a report_death() callback.
1404 + * Requested by %UTRACE_EVENT(%REAP).
1405 + * Called when someone reaps the dead task (parent, init, or self).
1406 + * This means the parent called wait, or else this was a detached
1407 + * thread or a process whose parent ignores SIGCHLD.
1408 + * No more callbacks are made after this one.
1409 + * The engine is always detached.
1410 + * There is nothing more a tracing engine can do about this thread.
1411 + * After this callback, the @engine pointer will become invalid.
1412 + * The @task pointer may become invalid if get_task_struct() hasn't
1413 + * been used to keep it alive.
1414 + * An engine should always request this callback if it stores the
1415 + * @engine pointer or stores any pointer in @engine->data, so it
1416 + * can clean up its data structures.
1417 + * Unlike other callbacks, this can be called from the parent's context
1418 + * rather than from the traced thread itself--it must not delay the
1419 + * parent by blocking.
1421 +struct utrace_engine_ops {
1422 + u32 (*report_quiesce)(enum utrace_resume_action action,
1423 + struct utrace_engine *engine,
1424 + struct task_struct *task,
1425 + unsigned long event);
1426 + u32 (*report_signal)(u32 action,
1427 + struct utrace_engine *engine,
1428 + struct task_struct *task,
1429 + struct pt_regs *regs,
1431 + const struct k_sigaction *orig_ka,
1432 + struct k_sigaction *return_ka);
1433 + u32 (*report_clone)(enum utrace_resume_action action,
1434 + struct utrace_engine *engine,
1435 + struct task_struct *parent,
1436 + unsigned long clone_flags,
1437 + struct task_struct *child);
1438 + u32 (*report_jctl)(enum utrace_resume_action action,
1439 + struct utrace_engine *engine,
1440 + struct task_struct *task,
1441 + int type, int notify);
1442 + u32 (*report_exec)(enum utrace_resume_action action,
1443 + struct utrace_engine *engine,
1444 + struct task_struct *task,
1445 + const struct linux_binfmt *fmt,
1446 + const struct linux_binprm *bprm,
1447 + struct pt_regs *regs);
1448 + u32 (*report_syscall_entry)(u32 action,
1449 + struct utrace_engine *engine,
1450 + struct task_struct *task,
1451 + struct pt_regs *regs);
1452 + u32 (*report_syscall_exit)(enum utrace_resume_action action,
1453 + struct utrace_engine *engine,
1454 + struct task_struct *task,
1455 + struct pt_regs *regs);
1456 + u32 (*report_exit)(enum utrace_resume_action action,
1457 + struct utrace_engine *engine,
1458 + struct task_struct *task,
1459 + long orig_code, long *code);
1460 + u32 (*report_death)(struct utrace_engine *engine,
1461 + struct task_struct *task,
1462 + bool group_dead, int signal);
1463 + void (*report_reap)(struct utrace_engine *engine,
1464 + struct task_struct *task);
1468 + * struct utrace_examiner - private state for using utrace_prepare_examine()
1470 + * The members of &struct utrace_examiner are private to the implementation.
1471 + * This data type holds the state from a call to utrace_prepare_examine()
1472 + * to be used by a call to utrace_finish_examine().
1474 +struct utrace_examiner {
1477 + unsigned long ncsw;
1481 + * These are the exported entry points for tracing engines to use.
1482 + * See kernel/utrace.c for their kerneldoc comments with interface details.
1484 +struct utrace_engine *utrace_attach_task(struct task_struct *, int,
1485 + const struct utrace_engine_ops *,
1487 +struct utrace_engine *utrace_attach_pid(struct pid *, int,
1488 + const struct utrace_engine_ops *,
1490 +int __must_check utrace_control(struct task_struct *,
1491 + struct utrace_engine *,
1492 + enum utrace_resume_action);
1493 +int __must_check utrace_set_events(struct task_struct *,
1494 + struct utrace_engine *,
1495 + unsigned long eventmask);
1496 +int __must_check utrace_barrier(struct task_struct *,
1497 + struct utrace_engine *);
1498 +int __must_check utrace_prepare_examine(struct task_struct *,
1499 + struct utrace_engine *,
1500 + struct utrace_examiner *);
1501 +int __must_check utrace_finish_examine(struct task_struct *,
1502 + struct utrace_engine *,
1503 + struct utrace_examiner *);
1506 + * utrace_control_pid - control a thread being traced by a tracing engine
1507 + * @pid: thread to affect
1508 + * @engine: attached engine to affect
1509 + * @action: &enum utrace_resume_action for thread to do
1511 + * This is the same as utrace_control(), but takes a &struct pid
1512 + * pointer rather than a &struct task_struct pointer. The caller must
1513 + * hold a ref on @pid, but does not need to worry about the task
1514 + * staying valid. If it's been reaped so that @pid points nowhere,
1515 + * then this call returns -%ESRCH.
1517 +static inline __must_check int utrace_control_pid(
1518 + struct pid *pid, struct utrace_engine *engine,
1519 + enum utrace_resume_action action)
1522 + * We don't bother with rcu_read_lock() here to protect the
1523 + * task_struct pointer, because utrace_control will return
1524 + * -ESRCH without looking at that pointer if the engine is
1525 + * already detached. A task_struct pointer can't die before
1526 + * all the engines are detached in release_task() first.
1528 + struct task_struct *task = pid_task(pid, PIDTYPE_PID);
1529 + return unlikely(!task) ? -ESRCH : utrace_control(task, engine, action);
1533 + * utrace_set_events_pid - choose which event reports a tracing engine gets
1534 + * @pid: thread to affect
1535 + * @engine: attached engine to affect
1536 + * @eventmask: new event mask
1538 + * This is the same as utrace_set_events(), but takes a &struct pid
1539 + * pointer rather than a &struct task_struct pointer. The caller must
1540 + * hold a ref on @pid, but does not need to worry about the task
1541 + * staying valid. If it's been reaped so that @pid points nowhere,
1542 + * then this call returns -%ESRCH.
1544 +static inline __must_check int utrace_set_events_pid(
1545 + struct pid *pid, struct utrace_engine *engine, unsigned long eventmask)
1547 + struct task_struct *task = pid_task(pid, PIDTYPE_PID);
1548 + return unlikely(!task) ? -ESRCH :
1549 + utrace_set_events(task, engine, eventmask);
1553 + * utrace_barrier_pid - synchronize with simultaneous tracing callbacks
1554 + * @pid: thread to affect
1555 + * @engine: engine to affect (can be detached)
1557 + * This is the same as utrace_barrier(), but takes a &struct pid
1558 + * pointer rather than a &struct task_struct pointer. The caller must
1559 + * hold a ref on @pid, but does not need to worry about the task
1560 + * staying valid. If it's been reaped so that @pid points nowhere,
1561 + * then this call returns -%ESRCH.
1563 +static inline __must_check int utrace_barrier_pid(struct pid *pid,
1564 + struct utrace_engine *engine)
1566 + struct task_struct *task = pid_task(pid, PIDTYPE_PID);
1567 + return unlikely(!task) ? -ESRCH : utrace_barrier(task, engine);
1570 +#endif /* CONFIG_UTRACE */
1572 +#endif /* linux/utrace.h */
1573 diff --git a/include/linux/utrace_struct.h b/include/linux/utrace_struct.h
1574 new file mode 100644
1575 index 0000000..aba7e09
1577 +++ b/include/linux/utrace_struct.h
1580 + * 'struct utrace' data structure for kernel/utrace.c private use.
1582 + * Copyright (C) 2006-2009 Red Hat, Inc. All rights reserved.
1584 + * This copyrighted material is made available to anyone wishing to use,
1585 + * modify, copy, or redistribute it subject to the terms and conditions
1586 + * of the GNU General Public License v.2.
1589 +#ifndef _LINUX_UTRACE_STRUCT_H
1590 +#define _LINUX_UTRACE_STRUCT_H 1
1592 +#ifdef CONFIG_UTRACE
1594 +#include <linux/list.h>
1595 +#include <linux/spinlock.h>
1598 + * Per-thread structure private to utrace implementation. This properly
1599 + * belongs in kernel/utrace.c and its use is entirely private to the code
1600 + * there. It is only defined in a header file so that it can be embedded
1601 + * in the struct task_struct layout. It is here rather than in utrace.h
1602 + * to avoid header nesting order issues getting too complex.
1606 + struct task_struct *cloning;
1608 + struct list_head attached, attaching;
1611 + struct utrace_engine *reporting;
1613 + unsigned int stopped:1;
1614 + unsigned int report:1;
1615 + unsigned int interrupt:1;
1616 + unsigned int signal_handler:1;
1617 + unsigned int vfork_stop:1; /* need utrace_stop() before vfork wait */
1618 + unsigned int death:1; /* in utrace_report_death() now */
1619 + unsigned int reap:1; /* release_task() has run */
1622 +# define INIT_UTRACE(tsk) \
1623 + .utrace_flags = 0, \
1625 + .lock = __SPIN_LOCK_UNLOCKED(tsk.utrace.lock), \
1626 + .attached = LIST_HEAD_INIT(tsk.utrace.attached), \
1627 + .attaching = LIST_HEAD_INIT(tsk.utrace.attaching), \
1632 +# define INIT_UTRACE(tsk) /* Nothing. */
1634 +#endif /* CONFIG_UTRACE */
1636 +#endif /* linux/utrace_struct.h */
1637 diff --git a/init/Kconfig b/init/Kconfig
1638 index 1ce05a4..f720929 100644
1641 @@ -1191,6 +1191,15 @@ config STOP_MACHINE
1643 Need stop_machine() primitive.
1646 + bool "Infrastructure for tracing and debugging user processes"
1647 + depends on EXPERIMENTAL
1648 + depends on HAVE_ARCH_TRACEHOOK
1650 + Enable the utrace process tracing interface. This is an internal
1651 + kernel interface exported to kernel modules, to track events in
1652 + user threads, extract and change user thread state.
1654 source "block/Kconfig"
1656 config PREEMPT_NOTIFIERS
1657 diff --git a/kernel/Makefile b/kernel/Makefile
1658 index 780c8dc..cd16d49 100644
1659 --- a/kernel/Makefile
1660 +++ b/kernel/Makefile
1661 @@ -69,6 +69,7 @@ obj-$(CONFIG_IKCONFIG) += configs.o
1662 obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
1663 obj-$(CONFIG_STOP_MACHINE) += stop_machine.o
1664 obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
1665 +obj-$(CONFIG_UTRACE) += utrace.o
1666 obj-$(CONFIG_AUDIT) += audit.o auditfilter.o audit_watch.o
1667 obj-$(CONFIG_AUDITSYSCALL) += auditsc.o
1668 obj-$(CONFIG_GCOV_KERNEL) += gcov/
1669 diff --git a/kernel/ptrace.c b/kernel/ptrace.c
1670 index 61c78b2..935eeee 100644
1671 --- a/kernel/ptrace.c
1672 +++ b/kernel/ptrace.c
1674 #include <linux/pagemap.h>
1675 #include <linux/smp_lock.h>
1676 #include <linux/ptrace.h>
1677 +#include <linux/utrace.h>
1678 #include <linux/security.h>
1679 #include <linux/signal.h>
1680 #include <linux/audit.h>
1681 @@ -164,6 +165,14 @@ bool ptrace_may_access(struct task_struct *task, unsigned int mode)
1686 + * For experimental use of utrace, exclude ptrace on the same task.
1688 +static inline bool exclude_ptrace(struct task_struct *task)
1690 + return unlikely(!!task_utrace_flags(task));
1693 int ptrace_attach(struct task_struct *task)
1696 @@ -186,6 +195,13 @@ int ptrace_attach(struct task_struct *task)
1701 + if (exclude_ptrace(task)) {
1703 + task_unlock(task);
1704 + goto unlock_creds;
1707 retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH);
1710 @@ -226,7 +242,9 @@ int ptrace_traceme(void)
1712 write_lock_irq(&tasklist_lock);
1713 /* Are we already being traced? */
1714 - if (!current->ptrace) {
1715 + if (exclude_ptrace(current)) {
1717 + } else if (!current->ptrace) {
1718 ret = security_ptrace_traceme(current->parent);
1720 * Check PF_EXITING to ensure ->real_parent has not passed
1721 @@ -577,7 +595,17 @@ int ptrace_request(struct task_struct *child, long request,
1725 -static struct task_struct *ptrace_get_task_struct(pid_t pid)
1727 + * ptrace_get_task_struct -- grab a task struct reference for ptrace
1728 + * @pid: process id to grab a task_struct reference of
1730 + * This function is a helper for ptrace implementations. It checks
1731 + * permissions and then grabs a task struct for use of the actual
1732 + * ptrace implementation.
1734 + * Returns the task_struct for @pid or an ERR_PTR() on failure.
1736 +struct task_struct *ptrace_get_task_struct(pid_t pid)
1738 struct task_struct *child;
1740 diff --git a/kernel/utrace.c b/kernel/utrace.c
1741 new file mode 100644
1742 index 0000000..74b5fc5
1744 +++ b/kernel/utrace.c
1747 + * utrace infrastructure interface for debugging user processes
1749 + * Copyright (C) 2006-2009 Red Hat, Inc. All rights reserved.
1751 + * This copyrighted material is made available to anyone wishing to use,
1752 + * modify, copy, or redistribute it subject to the terms and conditions
1753 + * of the GNU General Public License v.2.
1755 + * Red Hat Author: Roland McGrath.
1758 +#include <linux/utrace.h>
1759 +#include <linux/tracehook.h>
1760 +#include <linux/regset.h>
1761 +#include <asm/syscall.h>
1762 +#include <linux/ptrace.h>
1763 +#include <linux/err.h>
1764 +#include <linux/sched.h>
1765 +#include <linux/freezer.h>
1766 +#include <linux/module.h>
1767 +#include <linux/init.h>
1768 +#include <linux/slab.h>
1769 +#include <linux/seq_file.h>
1773 + * Rules for 'struct utrace', defined in <linux/utrace_struct.h>
1774 + * but used entirely privately in this file.
1776 + * The common event reporting loops are done by the task making the
1777 + * report without ever taking any locks. To facilitate this, the two
1778 + * lists @attached and @attaching work together for smooth asynchronous
1779 + * attaching with low overhead. Modifying either list requires @lock.
1780 + * The @attaching list can be modified any time while holding @lock.
1781 + * New engines being attached always go on this list.
1783 + * The @attached list is what the task itself uses for its reporting
1784 + * loops. When the task itself is not quiescent, it can use the
1785 + * @attached list without taking any lock. Nobody may modify the list
1786 + * when the task is not quiescent. When it is quiescent, that means
1787 + * that it won't run again without taking @lock itself before using the list.
1790 + * At each place where we know the task is quiescent (or it's current),
1791 + * while holding @lock, we call splice_attaching(), below. This moves
1792 + * the @attaching list members on to the end of the @attached list.
1793 + * Since this happens at the start of any reporting pass, any new
1794 + * engines attached asynchronously go on the stable @attached list
1795 + * in time to have their callbacks seen.
1798 +static struct kmem_cache *utrace_engine_cachep;
1799 +static const struct utrace_engine_ops utrace_detached_ops; /* forward decl */
1801 +static int __init utrace_init(void)
1803 + utrace_engine_cachep = KMEM_CACHE(utrace_engine, SLAB_PANIC);
1806 +module_init(utrace_init);
1809 + * This is called with @utrace->lock held when the task is safely
1810 + * quiescent, i.e. it won't consult utrace->attached without the lock.
1811 + * Move any engines attached asynchronously from @utrace->attaching
1812 + * onto the @utrace->attached list.
1814 +static void splice_attaching(struct utrace *utrace)
1816 + list_splice_tail_init(&utrace->attaching, &utrace->attached);
1820 + * This is the exported function used by the utrace_engine_put() inline.
1822 +void __utrace_engine_release(struct kref *kref)
1824 + struct utrace_engine *engine = container_of(kref, struct utrace_engine,
1826 + BUG_ON(!list_empty(&engine->entry));
1827 + kmem_cache_free(utrace_engine_cachep, engine);
1829 +EXPORT_SYMBOL_GPL(__utrace_engine_release);
1831 +static bool engine_matches(struct utrace_engine *engine, int flags,
1832 + const struct utrace_engine_ops *ops, void *data)
1834 + if ((flags & UTRACE_ATTACH_MATCH_OPS) && engine->ops != ops)
1836 + if ((flags & UTRACE_ATTACH_MATCH_DATA) && engine->data != data)
1838 + return engine->ops && engine->ops != &utrace_detached_ops;
1841 +static struct utrace_engine *matching_engine(
1842 + struct utrace *utrace, int flags,
1843 + const struct utrace_engine_ops *ops, void *data)
1845 + struct utrace_engine *engine;
1846 + list_for_each_entry(engine, &utrace->attached, entry)
1847 + if (engine_matches(engine, flags, ops, data))
1849 + list_for_each_entry(engine, &utrace->attaching, entry)
1850 + if (engine_matches(engine, flags, ops, data))
1856 + * For experimental use, utrace attach is mutually exclusive with ptrace.
1858 +static inline bool exclude_utrace(struct task_struct *task)
1860 + return unlikely(!!task->ptrace);
1864 + * Called without locks, when we might be the first utrace engine to attach.
1865 + * If this is a newborn thread and we are not the creator, we have to wait
1866 + * for it. The creator gets the first chance to attach. The PF_STARTING
1867 + * flag is cleared after its report_clone hook has had a chance to run.
1869 +static inline int utrace_attach_delay(struct task_struct *target)
1871 + if ((target->flags & PF_STARTING) &&
1872 + current->utrace.cloning != target)
1874 + schedule_timeout_interruptible(1);
1875 + if (signal_pending(current))
1876 + return -ERESTARTNOINTR;
1877 + } while (target->flags & PF_STARTING);
1883 + * Enqueue @engine, or maybe don't if UTRACE_ATTACH_EXCLUSIVE.
1885 +static int utrace_add_engine(struct task_struct *target,
1886 + struct utrace *utrace,
1887 + struct utrace_engine *engine,
1889 + const struct utrace_engine_ops *ops,
1894 + spin_lock(&utrace->lock);
1896 + if (utrace->reap) {
1898 + * Already entered utrace_release_task(), cannot attach now.
1901 + } else if ((flags & UTRACE_ATTACH_EXCLUSIVE) &&
1902 + unlikely(matching_engine(utrace, flags, ops, data))) {
1906 + * Put the new engine on the pending ->attaching list.
1907 + * Make sure it gets onto the ->attached list by the next
1908 + * time it's examined.
1910 + * When target == current, it would be safe just to call
1911 + * splice_attaching() right here. But if we're inside a
1912 + * callback, that would mean the new engine also gets
1913 + * notified about the event that precipitated its own
1914 + * creation. This is not what the user wants.
1916 + * Setting ->report ensures that start_report() takes the
1917 + * lock and does it next time. Whenever setting ->report,
1918 + * we must maintain the invariant that TIF_NOTIFY_RESUME is
1919 + * also set. Otherwise utrace_control() or utrace_do_stop()
1920 + * might skip setting TIF_NOTIFY_RESUME upon seeing ->report
1921 + * already set, and we'd miss a necessary callback.
1923 + * In case we had no engines before, make sure that
1924 + * utrace_flags is not zero when tracehook_notify_resume()
1925 + * checks. That would bypass utrace reporting clearing
1926 + * TIF_NOTIFY_RESUME, and thus violate the same invariant.
1928 + target->utrace_flags |= UTRACE_EVENT(REAP);
1929 + list_add_tail(&engine->entry, &utrace->attaching);
1930 + utrace->report = 1;
1931 + set_notify_resume(target);
1936 + spin_unlock(&utrace->lock);
1942 + * utrace_attach_task - attach new engine, or look up an attached engine
1943 + * @target: thread to attach to
1944 + * @flags: flag bits combined with OR, see below
1945 + * @ops: callback table for new engine
1946 + * @data: engine private data pointer
1948 + * The caller must ensure that the @target thread does not get freed,
1949 + * i.e. hold a ref or be its parent. It is always safe to call this
1950 + * on @current, or on the @child pointer in a @report_clone callback.
1951 + * For most other cases, it's easier to use utrace_attach_pid() instead.
1953 + * UTRACE_ATTACH_CREATE:
1954 + * Create a new engine. If %UTRACE_ATTACH_CREATE is not specified, you
1955 + * only look up an existing engine already attached to the thread.
1957 + * UTRACE_ATTACH_EXCLUSIVE:
1958 + * Attempting to attach a second (matching) engine fails with -%EEXIST.
1960 + * UTRACE_ATTACH_MATCH_OPS: Only consider engines matching @ops.
1961 + * UTRACE_ATTACH_MATCH_DATA: Only consider engines matching @data.
1963 + * Calls with neither %UTRACE_ATTACH_MATCH_OPS nor %UTRACE_ATTACH_MATCH_DATA
1964 + * match the first among any engines attached to @target. That means that
1965 + * %UTRACE_ATTACH_EXCLUSIVE in such a call fails with -%EEXIST if there
1966 + * are any engines on @target at all.
1968 +struct utrace_engine *utrace_attach_task(
1969 + struct task_struct *target, int flags,
1970 + const struct utrace_engine_ops *ops, void *data)
1972 + struct utrace *utrace;
1973 + struct utrace_engine *engine;
1976 + utrace = &target->utrace;
1978 + if (unlikely(target->exit_state == EXIT_DEAD)) {
1980 + * The target has already been reaped.
1981 + * Check this early, though it's not synchronized.
1982 + * utrace_add_engine() will do the final check.
1984 + if (!(flags & UTRACE_ATTACH_CREATE))
1985 + return ERR_PTR(-ENOENT);
1986 + return ERR_PTR(-ESRCH);
1989 + if (!(flags & UTRACE_ATTACH_CREATE)) {
1990 + spin_lock(&utrace->lock);
1991 + engine = matching_engine(utrace, flags, ops, data);
1993 + utrace_engine_get(engine);
1994 + spin_unlock(&utrace->lock);
1995 + return engine ?: ERR_PTR(-ENOENT);
1998 + if (unlikely(!ops) || unlikely(ops == &utrace_detached_ops))
1999 + return ERR_PTR(-EINVAL);
2001 + if (unlikely(target->flags & PF_KTHREAD))
2003 + * Silly kernel, utrace is for users!
2005 + return ERR_PTR(-EPERM);
2007 + engine = kmem_cache_alloc(utrace_engine_cachep, GFP_KERNEL);
2008 + if (unlikely(!engine))
2009 + return ERR_PTR(-ENOMEM);
2012 + * Initialize the new engine structure. It starts out with two
2013 + * refs: one ref to return, and one ref for being attached.
2015 + kref_set(&engine->kref, 2);
2016 + engine->flags = 0;
2017 + engine->ops = ops;
2018 + engine->data = data;
2020 + ret = utrace_attach_delay(target);
2022 + ret = utrace_add_engine(target, utrace, engine,
2023 + flags, ops, data);
2025 + if (unlikely(ret)) {
2026 + kmem_cache_free(utrace_engine_cachep, engine);
2027 + engine = ERR_PTR(ret);
2032 +EXPORT_SYMBOL_GPL(utrace_attach_task);
2035 + * utrace_attach_pid - attach new engine, or look up an attached engine
2036 + * @pid: &struct pid pointer representing thread to attach to
2037 + * @flags: flag bits combined with OR, see utrace_attach_task()
2038 + * @ops: callback table for new engine
2039 + * @data: engine private data pointer
2041 + * This is the same as utrace_attach_task(), but takes a &struct pid
2042 + * pointer rather than a &struct task_struct pointer. The caller must
2043 + * hold a ref on @pid, but does not need to worry about the task
2044 + * staying valid. If it's been reaped so that @pid points nowhere,
2045 + * then this call returns -%ESRCH.
2047 +struct utrace_engine *utrace_attach_pid(
2048 + struct pid *pid, int flags,
2049 + const struct utrace_engine_ops *ops, void *data)
2051 + struct utrace_engine *engine = ERR_PTR(-ESRCH);
2052 + struct task_struct *task = get_pid_task(pid, PIDTYPE_PID);
2054 + engine = utrace_attach_task(task, flags, ops, data);
2055 + put_task_struct(task);
2059 +EXPORT_SYMBOL_GPL(utrace_attach_pid);
2062 + * When an engine is detached, the target thread may still see it and
2063 + * make callbacks until it quiesces. We install a special ops vector
2064 + * with these two callbacks. When the target thread quiesces, it can
2065 + * safely free the engine itself. For any event we will always get
2066 + * the report_quiesce() callback first, so we only need this one
2067 + * pointer to be set. The only exception is report_reap(), so we
2068 + * supply that callback too.
2070 +static u32 utrace_detached_quiesce(enum utrace_resume_action action,
2071 + struct utrace_engine *engine,
2072 + struct task_struct *task,
2073 + unsigned long event)
2075 + return UTRACE_DETACH;
2078 +static void utrace_detached_reap(struct utrace_engine *engine,
2079 + struct task_struct *task)
2083 +static const struct utrace_engine_ops utrace_detached_ops = {
2084 + .report_quiesce = &utrace_detached_quiesce,
2085 + .report_reap = &utrace_detached_reap
2089 + * After waking up from TASK_TRACED, clear bookkeeping in @utrace.
2090 + * Returns true if we were woken up prematurely by SIGKILL.
2092 +static inline bool finish_utrace_stop(struct task_struct *task,
2093 + struct utrace *utrace)
2095 + bool killed = false;
2098 + * utrace_wakeup() clears @utrace->stopped before waking us up.
2099 + * We're officially awake if it's clear.
2101 + spin_lock(&utrace->lock);
2102 + if (unlikely(utrace->stopped)) {
2104 + * If we're here with it still set, it must have been
2105 + * signal_wake_up() instead, waking us up for a SIGKILL.
2107 + spin_lock_irq(&task->sighand->siglock);
2108 + WARN_ON(!sigismember(&task->pending.signal, SIGKILL));
2109 + spin_unlock_irq(&task->sighand->siglock);
2110 + utrace->stopped = 0;
2113 + spin_unlock(&utrace->lock);
2119 + * Perform %UTRACE_STOP, i.e. block in TASK_TRACED until woken up.
2120 + * @task == current, @utrace == current->utrace, which is not locked.
2121 + * Return true if we were woken up by SIGKILL even though some utrace
2122 + * engine may still want us to stay stopped.
2124 +static bool utrace_stop(struct task_struct *task, struct utrace *utrace,
2130 + * @utrace->stopped is the flag that says we are safely
2131 + * inside this function. It should never be set on entry.
2133 + BUG_ON(utrace->stopped);
2136 + * The siglock protects us against signals. As well as SIGKILL
2137 + * waking us up, we must synchronize with the signal bookkeeping
2138 + * for stop signals and SIGCONT.
2140 + spin_lock(&utrace->lock);
2141 + spin_lock_irq(&task->sighand->siglock);
2143 + if (unlikely(sigismember(&task->pending.signal, SIGKILL))) {
2144 + spin_unlock_irq(&task->sighand->siglock);
2145 + spin_unlock(&utrace->lock);
2151 + * Ensure a reporting pass when we're resumed.
2153 + utrace->report = 1;
2154 + set_thread_flag(TIF_NOTIFY_RESUME);
2157 + utrace->stopped = 1;
2158 + __set_current_state(TASK_TRACED);
2161 + * If there is a group stop in progress,
2162 + * we must participate in the bookkeeping.
2164 + if (task->signal->group_stop_count > 0)
2165 + --task->signal->group_stop_count;
2167 + spin_unlock_irq(&task->sighand->siglock);
2168 + spin_unlock(&utrace->lock);
2173 + * While in TASK_TRACED, we were considered "frozen enough".
2174 + * Now that we woke up, if we're supposed to be frozen it's
2175 + * crucial that we freeze now before running anything substantial.
2179 + killed = finish_utrace_stop(task, utrace);
2182 + * While we were in TASK_TRACED, complete_signal() considered
2183 + * us "uninterested" in signal wakeups. Now make sure our
2184 + * TIF_SIGPENDING state is correct for normal running.
2186 + spin_lock_irq(&task->sighand->siglock);
2187 + recalc_sigpending();
2188 + spin_unlock_irq(&task->sighand->siglock);
2194 + * The caller has to hold a ref on the engine. If the attached flag is
2195 + * true (all but utrace_barrier() calls), the engine is supposed to be
2196 + * attached. If the attached flag is false (utrace_barrier() only),
2197 + * then return -ERESTARTSYS for an engine marked for detach but not yet
2198 + * fully detached. The task pointer can be invalid if the engine is detached.
2201 + * Get the utrace lock for the target task.
2202 + * Returns the struct if locked, or ERR_PTR(-errno).
2204 + * This has to be robust against races with:
2205 + * utrace_control(target, UTRACE_DETACH) calls
2206 + * UTRACE_DETACH after reports
2207 + * utrace_report_death
2208 + * utrace_release_task
2210 +static struct utrace *get_utrace_lock(struct task_struct *target,
2211 + struct utrace_engine *engine,
2213 + __acquires(utrace->lock)
2215 + struct utrace *utrace;
2220 + * If this engine was already detached, bail out before we look at
2221 + * the task_struct pointer at all. If it's detached after this
2222 + * check, then RCU is still keeping this task_struct pointer valid.
2224 + * The ops pointer is NULL when the engine is fully detached.
2225 + * It's &utrace_detached_ops when it's marked detached but still
2226 + * on the list. In the latter case, utrace_barrier() still works,
2227 + * since the target might be in the middle of an old callback.
2229 + if (unlikely(!engine->ops)) {
2230 + rcu_read_unlock();
2231 + return ERR_PTR(-ESRCH);
2234 + if (unlikely(engine->ops == &utrace_detached_ops)) {
2235 + rcu_read_unlock();
2236 + return attached ? ERR_PTR(-ESRCH) : ERR_PTR(-ERESTARTSYS);
2239 + utrace = &target->utrace;
2240 + if (unlikely(target->exit_state == EXIT_DEAD)) {
2242 + * If all engines detached already, utrace is clear.
2243 + * Otherwise, we're called after utrace_release_task might
2244 + * have started. A call to this engine's report_reap
2245 + * callback might already be in progress.
2247 + utrace = ERR_PTR(-ESRCH);
2249 + spin_lock(&utrace->lock);
2250 + if (unlikely(!engine->ops) ||
2251 + unlikely(engine->ops == &utrace_detached_ops)) {
2253 + * By the time we got the utrace lock,
2254 + * it had been reaped or detached already.
2256 + spin_unlock(&utrace->lock);
2257 + utrace = ERR_PTR(-ESRCH);
2258 + if (!attached && engine->ops == &utrace_detached_ops)
2259 + utrace = ERR_PTR(-ERESTARTSYS);
2262 + rcu_read_unlock();
2268 + * Now that we don't hold any locks, run through any
2269 + * detached engines and free their references. Each
2270 + * engine had one implicit ref while it was attached.
2272 +static void put_detached_list(struct list_head *list)
2274 + struct utrace_engine *engine, *next;
2275 + list_for_each_entry_safe(engine, next, list, entry) {
2276 + list_del_init(&engine->entry);
2277 + utrace_engine_put(engine);
2282 + * Called with utrace->lock held.
2283 + * Notify and clean up all engines, then free utrace.
2285 +static void utrace_reap(struct task_struct *target, struct utrace *utrace)
2286 + __releases(utrace->lock)
2288 + struct utrace_engine *engine, *next;
2289 + const struct utrace_engine_ops *ops;
2290 + LIST_HEAD(detached);
2293 + splice_attaching(utrace);
2294 + list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
2295 + ops = engine->ops;
2296 + engine->ops = NULL;
2297 + list_move(&engine->entry, &detached);
2300 + * If it didn't need a callback, we don't need to drop
2301 + * the lock. Now nothing else refers to this engine.
2303 + if (!(engine->flags & UTRACE_EVENT(REAP)))
2307 + * This synchronizes with utrace_barrier(). Since we
2308 + * need the utrace->lock here anyway (unlike the other
2309 + * reporting loops), we don't need any memory barrier
2310 + * as utrace_barrier() holds the lock.
2312 + utrace->reporting = engine;
2313 + spin_unlock(&utrace->lock);
2315 + (*ops->report_reap)(engine, target);
2317 + utrace->reporting = NULL;
2319 + put_detached_list(&detached);
2321 + spin_lock(&utrace->lock);
2325 + spin_unlock(&utrace->lock);
2327 + put_detached_list(&detached);
2331 + * Called by release_task. After this, target->utrace must be cleared.
2333 +void utrace_release_task(struct task_struct *target)
2335 + struct utrace *utrace;
2337 + utrace = &target->utrace;
2339 + spin_lock(&utrace->lock);
2343 + if (!(target->utrace_flags & _UTRACE_DEATH_EVENTS)) {
2344 + utrace_reap(target, utrace); /* Unlocks and frees. */
2349 + * The target will do some final callbacks but hasn't
2350 + * finished them yet. We know because it clears these
2351 + * event bits after it's done. Instead of cleaning up here
2352 + * and requiring utrace_report_death to cope with it, we
2353 + * delay the REAP report and the teardown until after the
2354 + * target finishes its death reports.
2357 + spin_unlock(&utrace->lock);
2361 + * We use an extra bit in utrace_engine.flags past the event bits,
2362 + * to record whether the engine is keeping the target thread stopped.
2364 +#define ENGINE_STOP (1UL << _UTRACE_NEVENTS)
2366 +static void mark_engine_wants_stop(struct utrace_engine *engine)
2368 + engine->flags |= ENGINE_STOP;
2371 +static void clear_engine_wants_stop(struct utrace_engine *engine)
2373 + engine->flags &= ~ENGINE_STOP;
2376 +static bool engine_wants_stop(struct utrace_engine *engine)
2378 + return (engine->flags & ENGINE_STOP) != 0;
2382 + * utrace_set_events - choose which event reports a tracing engine gets
2383 + * @target: thread to affect
2384 + * @engine: attached engine to affect
2385 + * @events: new event mask
2387 + * This changes the set of events for which @engine wants callbacks made.
2389 + * This fails with -%EALREADY and does nothing if you try to clear
2390 + * %UTRACE_EVENT(%DEATH) when the @report_death callback may already have
2391 + * begun, if you try to clear %UTRACE_EVENT(%REAP) when the @report_reap
2392 + * callback may already have begun, or if you try to newly set
2393 + * %UTRACE_EVENT(%DEATH) or %UTRACE_EVENT(%QUIESCE) when @target is
2394 + * already dead or dying.
2396 + * This can fail with -%ESRCH when @target has already been detached,
2397 + * including forcible detach on reaping.
2399 + * If @target was stopped before the call, then after a successful call,
2400 + * no event callbacks not requested in @events will be made; if
2401 + * %UTRACE_EVENT(%QUIESCE) is included in @events, then a @report_quiesce
2402 + * callback will be made when @target resumes. If @target was not stopped,
2403 + * and was about to make a callback to @engine, this returns -%EINPROGRESS.
2404 + * In this case, the callback in progress might be one excluded from the
2405 + * new @events setting. When this returns zero, you can be sure that no
2406 + * event callbacks you've disabled in @events can be made.
2408 + * To synchronize after an -%EINPROGRESS return, see utrace_barrier().
2410 + * When @target is @current, -%EINPROGRESS is not returned. But
2411 + * note that a newly-created engine will not receive any callbacks
2412 + * related to an event notification already in progress. This call
2413 + * enables @events callbacks to be made as soon as @engine becomes
2414 + * eligible for any callbacks, see utrace_attach_task().
2416 + * These rules provide for coherent synchronization based on %UTRACE_STOP,
2417 + * even when %SIGKILL is breaking its normal simple rules.
2419 +int utrace_set_events(struct task_struct *target,
2420 + struct utrace_engine *engine,
2421 + unsigned long events)
2423 + struct utrace *utrace;
2424 + unsigned long old_flags, old_utrace_flags, set_utrace_flags;
2427 + utrace = get_utrace_lock(target, engine, true);
2428 + if (unlikely(IS_ERR(utrace)))
2429 + return PTR_ERR(utrace);
2431 + old_utrace_flags = target->utrace_flags;
2432 + set_utrace_flags = events;
2433 + old_flags = engine->flags;
2435 + if (target->exit_state &&
2436 + (((events & ~old_flags) & _UTRACE_DEATH_EVENTS) ||
2438 + ((old_flags & ~events) & _UTRACE_DEATH_EVENTS)) ||
2439 + (utrace->reap && ((old_flags & ~events) & UTRACE_EVENT(REAP))))) {
2440 + spin_unlock(&utrace->lock);
2445 + * When setting these flags, it's essential that we really
2446 + * synchronize with exit_notify(). They cannot be set after
2447 + * exit_notify() takes the tasklist_lock. By holding the read
2448 + * lock here while setting the flags, we ensure that the calls
2449 + * to tracehook_notify_death() and tracehook_report_death() will
2450 + * see the new flags. This ensures that utrace_release_task()
2451 + * knows positively that utrace_report_death() will be called or
2454 + if ((set_utrace_flags & ~old_utrace_flags) & _UTRACE_DEATH_EVENTS) {
2455 + read_lock(&tasklist_lock);
2456 + if (unlikely(target->exit_state)) {
2457 + read_unlock(&tasklist_lock);
2458 + spin_unlock(&utrace->lock);
2461 + target->utrace_flags |= set_utrace_flags;
2462 + read_unlock(&tasklist_lock);
2465 + engine->flags = events | (engine->flags & ENGINE_STOP);
2466 + target->utrace_flags |= set_utrace_flags;
2468 + if ((set_utrace_flags & UTRACE_EVENT_SYSCALL) &&
2469 + !(old_utrace_flags & UTRACE_EVENT_SYSCALL))
2470 + set_tsk_thread_flag(target, TIF_SYSCALL_TRACE);
2473 + if (!utrace->stopped && target != current) {
2475 + * This barrier ensures that our engine->flags changes
2476 + * have hit before we examine utrace->reporting,
2477 + * pairing with the barrier in start_callback(). If
2478 + * @target has not yet hit finish_callback() to clear
2479 + * utrace->reporting, we might be in the middle of a
2480 + * callback to @engine.
2483 + if (utrace->reporting == engine)
2484 + ret = -EINPROGRESS;
2487 + spin_unlock(&utrace->lock);
2491 +EXPORT_SYMBOL_GPL(utrace_set_events);
2494 + * Asynchronously mark an engine as being detached.
2496 + * This must work while the target thread races with us doing
2497 + * start_callback(), defined below. It uses smp_rmb() between checking
2498 + * @engine->flags and using @engine->ops. Here we change @engine->ops
2499 + * first, then use smp_wmb() before changing @engine->flags. This ensures
2500 + * it can check the old flags before using the old ops, or check the old
2501 + * flags before using the new ops, or check the new flags before using the
2502 + * new ops, but can never check the new flags before using the old ops.
2503 + * Hence, utrace_detached_ops might be used with any old flags in place.
2504 + * It has report_quiesce() and report_reap() callbacks to handle all cases.
2506 +static void mark_engine_detached(struct utrace_engine *engine)
2508 + engine->ops = &utrace_detached_ops;
2510 + engine->flags = UTRACE_EVENT(QUIESCE);
2514 + * Get @target to stop and return true if it is already stopped now.
2515 + * If we return false, it will make some event callback soonish.
2516 + * Called with @utrace locked.
2518 +static bool utrace_do_stop(struct task_struct *target, struct utrace *utrace)
2520 + bool stopped = false;
2522 + spin_lock_irq(&target->sighand->siglock);
2523 + if (unlikely(target->exit_state)) {
2525 + * On the exit path, it's only truly quiescent
2526 + * if it has already been through
2527 + * utrace_report_death(), or never will.
2529 + if (!(target->utrace_flags & _UTRACE_DEATH_EVENTS))
2530 + utrace->stopped = stopped = true;
2531 + } else if (task_is_stopped(target)) {
2533 + * Stopped is considered quiescent; when it wakes up, it will
2534 + * go through utrace_get_signal() before doing anything else.
2536 + utrace->stopped = stopped = true;
2537 + } else if (!utrace->report && !utrace->interrupt) {
2538 + utrace->report = 1;
2539 + set_notify_resume(target);
2541 + spin_unlock_irq(&target->sighand->siglock);
2547 + * If the target is not dead it should not be in tracing
2548 + * stop any more. Wake it unless it's in job control stop.
2550 + * Called with @utrace->lock held and @utrace->stopped set.
2552 +static void utrace_wakeup(struct task_struct *target, struct utrace *utrace)
2554 + struct sighand_struct *sighand;
2555 + unsigned long irqflags;
2557 + utrace->stopped = 0;
2559 + sighand = lock_task_sighand(target, &irqflags);
2560 + if (unlikely(!sighand))
2563 + if (likely(task_is_stopped_or_traced(target))) {
2564 + if (target->signal->flags & SIGNAL_STOP_STOPPED)
2565 + target->state = TASK_STOPPED;
2567 + wake_up_state(target, __TASK_STOPPED | __TASK_TRACED);
2570 + unlock_task_sighand(target, &irqflags);
2574 + * This is called when there might be some detached engines on the list or
2575 + * some stale bits in @task->utrace_flags. Clean them up and recompute the flags.
2578 + * @action is NULL when @task is stopped and @utrace->stopped is set; wake
2579 + * it up if it should not be. @action is set when @task is current; if
2580 + * we're fully detached, reset *@action to UTRACE_RESUME.
2582 + * Called with @utrace->lock held, returns with it released.
2583 + * After this returns, @utrace might be freed if everything detached.
2585 +static void utrace_reset(struct task_struct *task, struct utrace *utrace,
2586 + enum utrace_resume_action *action)
2587 + __releases(utrace->lock)
2589 + struct utrace_engine *engine, *next;
2590 + unsigned long flags = 0;
2591 + LIST_HEAD(detached);
2592 + bool wake = !action;
2593 + BUG_ON(wake != (task != current));
2595 + splice_attaching(utrace);
2598 + * Update the set of events of interest from the union
2599 + * of the interests of the remaining tracing engines.
2600 + * For any engine marked detached, remove it from the list.
2601 + * We'll collect them on the detached list.
2603 + list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
2604 + if (engine->ops == &utrace_detached_ops) {
2605 + engine->ops = NULL;
2606 + list_move(&engine->entry, &detached);
2608 + flags |= engine->flags | UTRACE_EVENT(REAP);
2609 + wake = wake && !engine_wants_stop(engine);
2613 + if (task->exit_state) {
2615 + * Once it's already dead, we never install any flags
2616 + * except REAP. When ->exit_state is set and events
2617 + * like DEATH are not set, then they never can be set.
2618 + * This ensures that utrace_release_task() knows
2619 + * positively that utrace_report_death() can never run.
2621 + BUG_ON(utrace->death);
2622 + flags &= UTRACE_EVENT(REAP);
2624 + } else if (!(flags & UTRACE_EVENT_SYSCALL) &&
2625 + test_tsk_thread_flag(task, TIF_SYSCALL_TRACE)) {
2626 + clear_tsk_thread_flag(task, TIF_SYSCALL_TRACE);
2629 + task->utrace_flags = flags;
2632 + utrace_wakeup(task, utrace);
2635 + * If any engines are left, we're done.
2637 + spin_unlock(&utrace->lock);
2640 + * No more engines, cleared out the utrace.
2644 + *action = UTRACE_RESUME;
2647 + put_detached_list(&detached);
2651 + * You can't do anything to a dead task but detach it.
2652 + * If release_task() has been called, you can't do that.
2654 + * On the exit path, DEATH and QUIESCE event bits are set only
2655 + * before utrace_report_death() has taken the lock. At that point,
2656 + * the death report will come soon, so disallow detach until it's
2657 + * done. This prevents us from racing with it detaching itself.
2659 + * Called with utrace->lock held, when @target->exit_state is nonzero.
2661 +static inline int utrace_control_dead(struct task_struct *target,
2662 + struct utrace *utrace,
2663 + enum utrace_resume_action action)
2665 + if (action != UTRACE_DETACH || unlikely(utrace->reap))
2668 + if (unlikely(utrace->death))
2670 + * We have already started the death report. We can't
2671 + * prevent the report_death and report_reap callbacks,
2672 + * so tell the caller they will happen.
2680 + * utrace_control - control a thread being traced by a tracing engine
2681 + * @target: thread to affect
2682 + * @engine: attached engine to affect
2683 + * @action: &enum utrace_resume_action for thread to do
2685 + * This is how a tracing engine asks a traced thread to do something.
2686 + * This call is controlled by the @action argument, which has the
2687 + * same meaning as the &enum utrace_resume_action value returned by
2688 + * event reporting callbacks.
2690 + * If @target is already dead (@target->exit_state nonzero),
2691 + * all actions except %UTRACE_DETACH fail with -%ESRCH.
2693 + * The following sections describe each option for the @action argument.
2697 + * After this, the @engine data structure is no longer accessible,
2698 + * and the thread might be reaped. The thread will start running
2699 + * again if it was stopped and no longer has any attached engines
2700 + * that want it stopped.
2702 + * If the @report_reap callback may already have begun, this fails
2703 + * with -%ESRCH. If the @report_death callback may already have
2704 + * begun, this fails with -%EALREADY.
2706 + * If @target is not already stopped, then a callback to this engine
2707 + * might be in progress or about to start on another CPU. If so,
2708 + * then this returns -%EINPROGRESS; the detach happens as soon as
2709 + * the pending callback is finished. To synchronize after an
2710 + * -%EINPROGRESS return, see utrace_barrier().
2712 + * If @target is properly stopped before utrace_control() is called,
2713 + * then after successful return it's guaranteed that no more callbacks
2714 + * to the @engine->ops vector will be made.
2716 + * The only exception is %SIGKILL (and exec or group-exit by another
2717 + * thread in the group), which can cause asynchronous @report_death
2718 + * and/or @report_reap callbacks even when %UTRACE_STOP was used.
2719 + * (In that event, this fails with -%ESRCH or -%EALREADY, see above.)
2722 + * This asks that @target stop running. This returns 0 only if
2723 + * @target is already stopped, either for tracing or for job
2724 + * control. Then @target will remain stopped until another
2725 + * utrace_control() call is made on @engine; @target can be woken
2726 + * only by %SIGKILL (or equivalent, such as exec or termination by
2727 + * another thread in the same thread group).
2729 + * This returns -%EINPROGRESS if @target is not already stopped.
2730 + * Then the effect is like %UTRACE_REPORT. A @report_quiesce or
2731 + * @report_signal callback will be made soon. Your callback can
2732 + * then return %UTRACE_STOP to keep @target stopped.
2734 + * This does not interrupt system calls in progress, including ones
2735 + * that sleep for a long time. For that, use %UTRACE_INTERRUPT.
2736 + * To interrupt system calls and then keep @target stopped, your
2737 + * @report_signal callback can return %UTRACE_STOP.
2741 + * Just let @target continue running normally, reversing the effect
2742 + * of a previous %UTRACE_STOP. If another engine is keeping @target
2743 + * stopped, then it remains stopped until all engines let it resume.
2744 + * If @target was not stopped, this has no effect.
2748 + * This is like %UTRACE_RESUME, but also ensures that there will be
2749 + * a @report_quiesce or @report_signal callback made soon. If
2750 + * @target had been stopped, then there will be a callback before it
2751 + * resumes running normally. If another engine is keeping @target
2752 + * stopped, then there might be no callbacks until all engines let it resume.
2755 + * UTRACE_INTERRUPT:
2757 + * This is like %UTRACE_REPORT, but ensures that @target will make a
2758 + * @report_signal callback before it resumes or delivers signals.
2759 + * If @target was in a system call or about to enter one, work in
2760 + * progress will be interrupted as if by %SIGSTOP. If another
2761 + * engine is keeping @target stopped, then there might be no
2762 + * callbacks until all engines let it resume.
2764 + * This gives @engine an opportunity to introduce a forced signal
2765 + * disposition via its @report_signal callback.
2767 + * UTRACE_SINGLESTEP:
2769 + * It's invalid to use this unless arch_has_single_step() returned true.
2770 + * This is like %UTRACE_RESUME, but resumes for one user instruction
2771 + * only. It's invalid to use this in utrace_control() unless @target
2772 + * had been stopped by @engine previously.
2774 + * Note that passing %UTRACE_SINGLESTEP or %UTRACE_BLOCKSTEP to
2775 + * utrace_control() or returning it from an event callback alone does
2776 + * not necessarily ensure that stepping will be enabled. If there are
2777 + * more callbacks made to any engine before returning to user mode,
2778 + * then the resume action is chosen only by the last set of callbacks.
2779 + * To be sure, enable %UTRACE_EVENT(%QUIESCE) and look for the
2780 + * @report_quiesce callback with a zero event mask, or the
2781 + * @report_signal callback with %UTRACE_SIGNAL_REPORT.
2783 + * UTRACE_BLOCKSTEP:
2785 + * It's invalid to use this unless arch_has_block_step() returned true.
2786 + * This is like %UTRACE_SINGLESTEP, but resumes for one whole basic
2787 + * block of user instructions.
2789 + * %UTRACE_BLOCKSTEP devolves to %UTRACE_SINGLESTEP when another
2790 + * tracing engine is using %UTRACE_SINGLESTEP at the same time.
2792 +int utrace_control(struct task_struct *target,
2793 + struct utrace_engine *engine,
2794 + enum utrace_resume_action action)
2796 + struct utrace *utrace;
2800 + if (unlikely(action > UTRACE_DETACH))
2803 + utrace = get_utrace_lock(target, engine, true);
2804 + if (unlikely(IS_ERR(utrace)))
2805 + return PTR_ERR(utrace);
2807 + if (target->exit_state) {
2808 + ret = utrace_control_dead(target, utrace, action);
2810 + spin_unlock(&utrace->lock);
2815 + resume = utrace->stopped;
2818 + clear_engine_wants_stop(engine);
2821 + mark_engine_wants_stop(engine);
2822 + if (!resume && !utrace_do_stop(target, utrace))
2823 + ret = -EINPROGRESS;
2827 + case UTRACE_DETACH:
2828 + mark_engine_detached(engine);
2829 + resume = resume || utrace_do_stop(target, utrace);
2832 + * As in utrace_set_events(), this barrier ensures
2833 + * that our engine->flags changes have hit before we
2834 + * examine utrace->reporting, pairing with the barrier
2835 + * in start_callback(). If @target has not yet hit
2836 + * finish_callback() to clear utrace->reporting, we
2837 + * might be in the middle of a callback to @engine.
2840 + if (utrace->reporting == engine)
2841 + ret = -EINPROGRESS;
2844 + /* Fall through. */
2846 + case UTRACE_RESUME:
2848 + * This and all other cases imply resuming if stopped.
2849 + * There might not be another report before it just
2850 + * resumes, so make sure single-step is not left set.
2852 + if (likely(resume))
2853 + user_disable_single_step(target);
2856 + case UTRACE_REPORT:
2858 + * Make the thread call tracehook_notify_resume() soon.
2859 + * But don't bother if it's already been interrupted.
2860 + * In that case, utrace_get_signal() will be reporting soon.
2862 + if (!utrace->report && !utrace->interrupt) {
2863 + utrace->report = 1;
2864 + set_notify_resume(target);
2868 + case UTRACE_INTERRUPT:
2870 + * Make the thread call tracehook_get_signal() soon.
2872 + if (utrace->interrupt)
2874 + utrace->interrupt = 1;
2877 + * If it's not already stopped, interrupt it now.
2878 + * We need the siglock here in case it calls
2879 + * recalc_sigpending() and clears its own
2880 + * TIF_SIGPENDING. By taking the lock, we've
2881 + * serialized any later recalc_sigpending() after
2882 + * our setting of utrace->interrupt to force it on.
2886 + * This is really just to keep the invariant
2887 + * that TIF_SIGPENDING is set with utrace->interrupt.
2888 + * When it's stopped, we know it's always going
2889 + * through utrace_get_signal and will recalculate.
2891 + set_tsk_thread_flag(target, TIF_SIGPENDING);
2893 + struct sighand_struct *sighand;
2894 + unsigned long irqflags;
2895 + sighand = lock_task_sighand(target, &irqflags);
2896 + if (likely(sighand)) {
2897 + signal_wake_up(target, 0);
2898 + unlock_task_sighand(target, &irqflags);
2903 + case UTRACE_BLOCKSTEP:
2905 + * Resume from stopped, step one block.
2907 + if (unlikely(!arch_has_block_step())) {
2909 + /* Fall through to treat it as SINGLESTEP. */
2910 + } else if (likely(resume)) {
2911 + user_enable_block_step(target);
2915 + case UTRACE_SINGLESTEP:
2917 + * Resume from stopped, step one instruction.
2919 + if (unlikely(!arch_has_single_step())) {
2922 + ret = -EOPNOTSUPP;
2926 + if (likely(resume))
2927 + user_enable_single_step(target);
2930 + * You were supposed to stop it before asking for single-step.
2938 + * Let the thread resume running. If it's not stopped now,
2939 + * there is nothing more we need to do.
2942 + utrace_reset(target, utrace, NULL);
2944 + spin_unlock(&utrace->lock);
2948 +EXPORT_SYMBOL_GPL(utrace_control);
2951 + * utrace_barrier - synchronize with simultaneous tracing callbacks
2952 + * @target: thread to affect
2953 + * @engine: engine to affect (can be detached)
2955 + * This blocks while @target might be in the midst of making a callback to
2956 + * @engine. It can be interrupted by signals and will return -%ERESTARTSYS.
2957 + * A return value of zero means no callback from @target to @engine was
2958 + * in progress. Any effect of its return value (such as %UTRACE_STOP) has
2959 + * already been applied to @engine.
2961 + * It's not necessary to keep the @target pointer alive for this call.
2962 + * It's only necessary to hold a ref on @engine. This will return
2963 + * safely even if @target has been reaped and has no task refs.
2965 + * A successful return from utrace_barrier() guarantees its ordering
2966 + * with respect to utrace_set_events() and utrace_control() calls. If
2967 + * @target was not properly stopped, event callbacks just disabled might
2968 + * still be in progress; utrace_barrier() waits until there is no chance
2969 + * an unwanted callback can be in progress.
2971 +int utrace_barrier(struct task_struct *target, struct utrace_engine *engine)
2973 + struct utrace *utrace;
2974 + int ret = -ERESTARTSYS;
2976 + if (unlikely(target == current))
2980 + utrace = get_utrace_lock(target, engine, false);
2981 + if (unlikely(IS_ERR(utrace))) {
2982 + ret = PTR_ERR(utrace);
2983 + if (ret != -ERESTARTSYS)
2987 + * All engine state changes are done while
2988 + * holding the lock, i.e. before we get here.
2989 + * Since we have the lock, we only need to
2990 + * worry about @target making a callback.
2991 + * When it has entered start_callback() but
2992 + * not yet gotten to finish_callback(), we
2993 + * will see utrace->reporting == @engine.
2994 + * When @target doesn't take the lock, it uses
2995 + * barriers to order setting utrace->reporting
2996 + * before it examines the engine state.
2998 + if (utrace->reporting != engine)
3000 + spin_unlock(&utrace->lock);
3004 + schedule_timeout_interruptible(1);
3005 + } while (!signal_pending(current));
3009 +EXPORT_SYMBOL_GPL(utrace_barrier);
3012 + * This is local state used for reporting loops, perhaps optimized away.
3014 +struct utrace_report {
3015 + enum utrace_resume_action action;
3023 +#define INIT_REPORT(var) \
3024 + struct utrace_report var = { UTRACE_RESUME, 0, \
3025 + false, false, false, false }
3028 + * We are now making the report, so clear the flag saying we need one.
3030 +static void start_report(struct utrace *utrace)
3032 + BUG_ON(utrace->stopped);
3033 + if (utrace->report) {
3034 + spin_lock(&utrace->lock);
3035 + utrace->report = 0;
3036 + splice_attaching(utrace);
3037 + spin_unlock(&utrace->lock);
3042 + * Complete a normal reporting pass, pairing with a start_report() call.
3043 + * This handles any UTRACE_DETACH or UTRACE_REPORT or UTRACE_INTERRUPT
3044 + * returns from engine callbacks. If any engine's last callback used
3045 + * UTRACE_STOP, we do UTRACE_REPORT here to ensure we stop before user
3046 + * mode. If there were no callbacks made, it will recompute
3047 + * @task->utrace_flags to avoid another false-positive.
3049 +static void finish_report(struct utrace_report *report,
3050 + struct task_struct *task, struct utrace *utrace)
3052 + bool clean = (report->takers && !report->detaches);
3054 + if (report->action <= UTRACE_REPORT && !utrace->report) {
3055 + spin_lock(&utrace->lock);
3056 + utrace->report = 1;
3057 + set_tsk_thread_flag(task, TIF_NOTIFY_RESUME);
3058 + } else if (report->action == UTRACE_INTERRUPT && !utrace->interrupt) {
3059 + spin_lock(&utrace->lock);
3060 + utrace->interrupt = 1;
3061 + set_tsk_thread_flag(task, TIF_SIGPENDING);
3062 + } else if (clean) {
3065 + spin_lock(&utrace->lock);
3069 + spin_unlock(&utrace->lock);
3071 + utrace_reset(task, utrace, &report->action);
3075 + * Apply the return value of one engine callback to @report.
3076 + * Returns true if @engine detached and should not get any more callbacks.
3078 +static bool finish_callback(struct utrace *utrace,
3079 + struct utrace_report *report,
3080 + struct utrace_engine *engine,
3083 + enum utrace_resume_action action = utrace_resume_action(ret);
3085 + report->result = ret & ~UTRACE_RESUME_MASK;
3088 + * If utrace_control() was used, treat that like UTRACE_DETACH here.
3090 + if (action == UTRACE_DETACH || engine->ops == &utrace_detached_ops) {
3091 + engine->ops = &utrace_detached_ops;
3092 + report->detaches = true;
3094 + if (action < report->action)
3095 + report->action = action;
3097 + if (action == UTRACE_STOP) {
3098 + if (!engine_wants_stop(engine)) {
3099 + spin_lock(&utrace->lock);
3100 + mark_engine_wants_stop(engine);
3101 + spin_unlock(&utrace->lock);
3104 + if (action == UTRACE_REPORT)
3105 + report->reports = true;
3107 + if (engine_wants_stop(engine)) {
3108 + spin_lock(&utrace->lock);
3109 + clear_engine_wants_stop(engine);
3110 + spin_unlock(&utrace->lock);
3116 + * Now that we have applied the effect of the return value,
3117 + * clear this so that utrace_barrier() can stop waiting.
3118 + * A subsequent utrace_control() can stop or resume @engine
3119 + * and know this was ordered after its callback's action.
3121 + * We don't need any barriers here because utrace_barrier()
3122 + * takes utrace->lock. If we touched engine->flags above,
3123 + * the lock guaranteed this change was before utrace_barrier()
3124 + * examined utrace->reporting.
3126 + utrace->reporting = NULL;
3129 + * This is a good place to make sure tracing engines don't
3130 + * introduce too much latency under voluntary preemption.
3132 + if (need_resched())
3135 + return engine->ops == &utrace_detached_ops;
3139 + * Start the callbacks for @engine to consider @event (a bit mask).
3140 + * This makes the report_quiesce() callback first. If @engine wants
3141 + * a specific callback for @event, we return the ops vector to use.
3142 + * If not, we return NULL. The return value from the ops->callback
3143 + * function called should be passed to finish_callback().
3145 +static const struct utrace_engine_ops *start_callback(
3146 + struct utrace *utrace, struct utrace_report *report,
3147 + struct utrace_engine *engine, struct task_struct *task,
3148 + unsigned long event)
3150 + const struct utrace_engine_ops *ops;
3151 + unsigned long want;
3154 + * This barrier ensures that we've set utrace->reporting before
3155 + * we examine engine->flags or engine->ops. utrace_barrier()
3156 + * relies on this ordering to indicate that the effect of any
3157 + * utrace_control() and utrace_set_events() calls is in place
3158 + * by the time utrace->reporting can be seen to be NULL.
3160 + utrace->reporting = engine;
3164 + * This pairs with the barrier in mark_engine_detached().
3165 + * It makes sure that we never see the old ops vector with
3166 + * the new flags, in case the original vector had no report_quiesce.
3168 + want = engine->flags;
3170 + ops = engine->ops;
3172 + if (want & UTRACE_EVENT(QUIESCE)) {
3173 + if (finish_callback(utrace, report, engine,
3174 + (*ops->report_quiesce)(report->action,
3180 + * finish_callback() reset utrace->reporting after the
3181 + * quiesce callback. Now we set it again (as above)
3182 + * before re-examining engine->flags, which could have
3183 + * been changed synchronously by ->report_quiesce or
3184 + * asynchronously by utrace_control() or utrace_set_events().
3186 + utrace->reporting = engine;
3188 + want = engine->flags;
3191 + if (want & ENGINE_STOP)
3192 + report->action = UTRACE_STOP;
3194 + if (want & event) {
3195 + report->takers = true;
3199 + utrace->reporting = NULL;
3204 + * Do a normal reporting pass for engines interested in @event.
3205 + * @callback is the name of the member in the ops vector, and remaining
3206 + * args are the extras it takes after the standard three args.
3208 +#define REPORT(task, utrace, report, event, callback, ...) \
3210 + start_report(utrace); \
3211 + REPORT_CALLBACKS(, task, utrace, report, event, callback, \
3212 + (report)->action, engine, current, \
3213 + ## __VA_ARGS__); \
3214 + finish_report(report, task, utrace); \
3216 +#define REPORT_CALLBACKS(rev, task, utrace, report, event, callback, ...) \
3218 + struct utrace_engine *engine; \
3219 + const struct utrace_engine_ops *ops; \
3220 + list_for_each_entry##rev(engine, &utrace->attached, entry) { \
3221 + ops = start_callback(utrace, report, engine, task, \
3225 + finish_callback(utrace, report, engine, \
3226 + (*ops->callback)(__VA_ARGS__)); \
3231 + * Called iff UTRACE_EVENT(EXEC) flag is set.
3233 +void utrace_report_exec(struct linux_binfmt *fmt, struct linux_binprm *bprm,
3234 + struct pt_regs *regs)
3236 + struct task_struct *task = current;
3237 + struct utrace *utrace = task_utrace_struct(task);
3238 + INIT_REPORT(report);
3240 + REPORT(task, utrace, &report, UTRACE_EVENT(EXEC),
3241 + report_exec, fmt, bprm, regs);
3245 + * Called iff UTRACE_EVENT(SYSCALL_ENTRY) flag is set.
3246 + * Return true to prevent the system call.
3248 +bool utrace_report_syscall_entry(struct pt_regs *regs)
3250 + struct task_struct *task = current;
3251 + struct utrace *utrace = task_utrace_struct(task);
3252 + INIT_REPORT(report);
3254 + start_report(utrace);
3255 + REPORT_CALLBACKS(_reverse, task, utrace, &report,
3256 + UTRACE_EVENT(SYSCALL_ENTRY), report_syscall_entry,
3257 + report.result | report.action, engine, current, regs);
3258 + finish_report(&report, task, utrace);
3260 + if (report.action == UTRACE_STOP &&
3261 + unlikely(utrace_stop(task, utrace, false)))
3263 + * We are continuing despite UTRACE_STOP because of a
3264 + * SIGKILL. Don't let the system call actually proceed.
3268 + return report.result == UTRACE_SYSCALL_ABORT;
3272 + * Called iff UTRACE_EVENT(SYSCALL_EXIT) flag is set.
3274 +void utrace_report_syscall_exit(struct pt_regs *regs)
3276 + struct task_struct *task = current;
3277 + struct utrace *utrace = task_utrace_struct(task);
3278 + INIT_REPORT(report);
3280 + REPORT(task, utrace, &report, UTRACE_EVENT(SYSCALL_EXIT),
3281 + report_syscall_exit, regs);
3285 + * Called iff UTRACE_EVENT(CLONE) flag is set.
3286 + * This notification call blocks the wake_up_new_task call on the child.
3287 + * So we must not quiesce here. tracehook_report_clone_complete will do
3288 + * a quiescence check momentarily.
3290 +void utrace_report_clone(unsigned long clone_flags, struct task_struct *child)
3292 + struct task_struct *task = current;
3293 + struct utrace *utrace = task_utrace_struct(task);
3294 + INIT_REPORT(report);
3297 + * We don't use the REPORT() macro here, because we need
3298 + * to clear utrace->cloning before finish_report().
3299 + * After finish_report(), utrace can be a stale pointer
3300 + * in cases when report.action is still UTRACE_RESUME.
3302 + start_report(utrace);
3303 + utrace->cloning = child;
3305 + REPORT_CALLBACKS(, task, utrace, &report,
3306 + UTRACE_EVENT(CLONE), report_clone,
3307 + report.action, engine, task, clone_flags, child);
3309 + utrace->cloning = NULL;
3310 + finish_report(&report, task, utrace);
3313 + * For a vfork, we will go into an uninterruptible block waiting
3314 + * for the child. We need UTRACE_STOP to happen before this, not
3315 + * after. For CLONE_VFORK, utrace_finish_vfork() will be called.
3317 + if (report.action == UTRACE_STOP && (clone_flags & CLONE_VFORK)) {
3318 + spin_lock(&utrace->lock);
3319 + utrace->vfork_stop = 1;
3320 + spin_unlock(&utrace->lock);
3325 + * We're called after utrace_report_clone() for a CLONE_VFORK.
3326 + * If UTRACE_STOP was left from the clone report, we stop here.
3327 + * After this, we'll enter the uninterruptible wait_for_completion()
3328 + * waiting for the child.
3330 +void utrace_finish_vfork(struct task_struct *task)
3332 + struct utrace *utrace = task_utrace_struct(task);
3334 + spin_lock(&utrace->lock);
3335 + if (!utrace->vfork_stop)
3336 + spin_unlock(&utrace->lock);
3338 + utrace->vfork_stop = 0;
3339 + spin_unlock(&utrace->lock);
3340 + utrace_stop(task, utrace, false);
3345 + * Called iff UTRACE_EVENT(JCTL) flag is set.
3347 + * Called with siglock held.
3349 +void utrace_report_jctl(int notify, int what)
3351 + struct task_struct *task = current;
3352 + struct utrace *utrace = task_utrace_struct(task);
3353 + INIT_REPORT(report);
3354 + bool stop = task_is_stopped(task);
3357 + * We have to come out of TASK_STOPPED in case the event report
3358 + * hooks might block. Since we held the siglock throughout, it's
3359 + * as if we were never in TASK_STOPPED yet at all.
3362 + __set_current_state(TASK_RUNNING);
3363 + task->signal->flags &= ~SIGNAL_STOP_STOPPED;
3364 + ++task->signal->group_stop_count;
3366 + spin_unlock_irq(&task->sighand->siglock);
3369 + * We get here with CLD_STOPPED when we've just entered
3370 + * TASK_STOPPED, or with CLD_CONTINUED when we've just come
3371 + * out but not yet been through utrace_get_signal() again.
3373 + * While in TASK_STOPPED, we can be considered safely
3374 + * stopped by utrace_do_stop() and detached asynchronously.
3375 + * If we woke up and checked task->utrace_flags before that
3376 + * was finished, we might be here with utrace already
3377 + * removed or in the middle of being removed.
3379 + * If we are indeed attached, then make sure we are no
3380 + * longer considered stopped while we run callbacks.
3382 + spin_lock(&utrace->lock);
3383 + utrace->stopped = 0;
3385 + * Do start_report()'s work too since we already have the lock anyway.
3387 + utrace->report = 0;
3388 + splice_attaching(utrace);
3389 + spin_unlock(&utrace->lock);
3391 + REPORT(task, utrace, &report, UTRACE_EVENT(JCTL),
3392 + report_jctl, what, notify);
3395 + * Retake the lock, and go back into TASK_STOPPED
3396 + * unless the stop was just cleared.
3398 + spin_lock_irq(&task->sighand->siglock);
3399 + if (stop && task->signal->group_stop_count > 0) {
3400 + __set_current_state(TASK_STOPPED);
3401 + if (--task->signal->group_stop_count == 0)
3402 + task->signal->flags |= SIGNAL_STOP_STOPPED;
3407 + * Called iff UTRACE_EVENT(EXIT) flag is set.
3409 +void utrace_report_exit(long *exit_code)
3411 + struct task_struct *task = current;
3412 + struct utrace *utrace = task_utrace_struct(task);
3413 + INIT_REPORT(report);
3414 + long orig_code = *exit_code;
3416 + REPORT(task, utrace, &report, UTRACE_EVENT(EXIT),
3417 + report_exit, orig_code, exit_code);
3419 + if (report.action == UTRACE_STOP)
3420 + utrace_stop(task, utrace, false);
3424 + * Called iff UTRACE_EVENT(DEATH) or UTRACE_EVENT(QUIESCE) flag is set.
3426 + * It is always possible that we are racing with utrace_release_task here.
3427 + * For this reason, utrace_release_task checks for the event bits that get
3428 + * us here, and delays its cleanup for us to do.
3430 +void utrace_report_death(struct task_struct *task, struct utrace *utrace,
3431 + bool group_dead, int signal)
3433 + INIT_REPORT(report);
3435 + BUG_ON(!task->exit_state);
3438 + * We are presently considered "quiescent"--which is accurate
3439 + * inasmuch as we won't run any more user instructions ever again.
3440 + * But for utrace_control and utrace_set_events to be robust, they
3441 + * must be sure whether or not we will run any more callbacks. If
3442 + * a call comes in before we do, taking the lock here synchronizes
3443 + * us so we don't run any callbacks just disabled. Calls that come
3444 + * in while we're running the callbacks will see the exit.death
3445 + * flag and know that we are not yet fully quiescent for purposes
3446 + * of detach bookkeeping.
3448 + spin_lock(&utrace->lock);
3449 + BUG_ON(utrace->death);
3450 + utrace->death = 1;
3451 + utrace->report = 0;
3452 + utrace->interrupt = 0;
3453 + spin_unlock(&utrace->lock);
3455 + REPORT_CALLBACKS(, task, utrace, &report, UTRACE_EVENT(DEATH),
3456 + report_death, engine, task, group_dead, signal);
3458 + spin_lock(&utrace->lock);
3461 + * After we unlock (possibly inside utrace_reap for callbacks) with
3462 + * this flag clear, competing utrace_control/utrace_set_events calls
3463 + * know that we've finished our callbacks and any detach bookkeeping.
3465 + utrace->death = 0;
3469 + * utrace_release_task() was already called in parallel.
3470 + * We must complete its work now.
3472 + utrace_reap(task, utrace);
3474 + utrace_reset(task, utrace, &report.action);
3478 + * Finish the last reporting pass before returning to user mode.
3480 +static void finish_resume_report(struct utrace_report *report,
3481 + struct task_struct *task,
3482 + struct utrace *utrace)
3484 + if (report->detaches || !report->takers) {
3485 + spin_lock(&utrace->lock);
3486 + utrace_reset(task, utrace, &report->action);
3489 + switch (report->action) {
3491 + report->killed = utrace_stop(task, utrace, report->reports);
3494 + case UTRACE_INTERRUPT:
3495 + if (!signal_pending(task))
3496 + set_tsk_thread_flag(task, TIF_SIGPENDING);
3499 + case UTRACE_BLOCKSTEP:
3500 + if (likely(arch_has_block_step())) {
3501 + user_enable_block_step(task);
3506 + * This means some callback is to blame for failing
3507 + * to check arch_has_block_step() itself. Warn and
3508 + * then fall through to treat it as SINGLESTEP.
3512 + case UTRACE_SINGLESTEP:
3513 + if (likely(arch_has_single_step()))
3514 + user_enable_single_step(task);
3517 + * This means some callback is to blame for failing
3518 + * to check arch_has_single_step() itself. Warn
3519 + * about it so the offending module can be fixed.
3524 + case UTRACE_REPORT:
3525 + case UTRACE_RESUME:
3527 + user_disable_single_step(task);
3533 + * This is called when TIF_NOTIFY_RESUME had been set (and is now clear).
3534 + * We are close to user mode, and this is the place to report or stop.
3535 + * When we return, we're going to user mode or into the signals code.
3537 +void utrace_resume(struct task_struct *task, struct pt_regs *regs)
3539 + struct utrace *utrace = task_utrace_struct(task);
3540 + INIT_REPORT(report);
3541 + struct utrace_engine *engine;
3544 + * Some machines get here with interrupts disabled. The same arch
3545 + * code path leads to calling into get_signal_to_deliver(), which
3546 + * implicitly reenables them by virtue of spin_unlock_irq.
3548 + local_irq_enable();
3551 + * If this flag is still set it's because there was a signal
3552 + * handler setup done but no report_signal following it. Clear
3553 + * the flag before we get to user so it doesn't confuse us later.
3555 + if (unlikely(utrace->signal_handler)) {
3557 + spin_lock(&utrace->lock);
3558 + utrace->signal_handler = 0;
3559 + skip = !utrace->report;
3560 + spin_unlock(&utrace->lock);
3566 + * If UTRACE_INTERRUPT was just used, we don't bother with a report
3567 + * here. We will report and stop in utrace_get_signal(). In case
3568 + * of a race with utrace_control(), make sure we don't momentarily
3569 + * return to user mode because TIF_SIGPENDING was not set yet.
3571 + if (unlikely(utrace->interrupt)) {
3572 + set_thread_flag(TIF_SIGPENDING);
3577 + * Do a simple reporting pass, with no callback after report_quiesce.
3579 + start_report(utrace);
3581 + list_for_each_entry(engine, &utrace->attached, entry)
3582 + start_callback(utrace, &report, engine, task, 0);
3585 + * Finish the report and either stop or get ready to resume.
3587 + finish_resume_report(&report, task, utrace);
3591 + * Return true if current has forced signal_pending().
3593 + * This is called only when current->utrace_flags is nonzero, so we know
3594 + * that current->utrace must be set. It's not inlined in tracehook.h
3595 + * just so that struct utrace can stay opaque outside this file.
3597 +bool utrace_interrupt_pending(void)
3599 + return task_utrace_struct(current)->interrupt;
3603 + * Take the siglock and push @info back on our queue.
3604 + * Returns with @task->sighand->siglock held.
3606 +static void push_back_signal(struct task_struct *task, siginfo_t *info)
3607 + __acquires(task->sighand->siglock)
3609 + struct sigqueue *q;
3611 +	if (unlikely(!info->si_signo)) { /* si_signo == 0: nothing to queue */
3612 + spin_lock_irq(&task->sighand->siglock);
3616 + q = sigqueue_alloc();
3619 + copy_siginfo(&q->info, info);
3622 + spin_lock_irq(&task->sighand->siglock);
3624 + sigaddset(&task->pending.signal, info->si_signo);
3626 + list_add(&q->list, &task->pending.list);
3628 + set_tsk_thread_flag(task, TIF_SIGPENDING);
3632 + * This is the hook from the signals code, called with the siglock held.
3633 + * Here is the ideal place to stop. We also dequeue and intercept signals.
3635 +int utrace_get_signal(struct task_struct *task, struct pt_regs *regs,
3636 + siginfo_t *info, struct k_sigaction *return_ka)
3637 + __releases(task->sighand->siglock)
3638 + __acquires(task->sighand->siglock)
3640 + struct utrace *utrace;
3641 + struct k_sigaction *ka;
3642 + INIT_REPORT(report);
3643 + struct utrace_engine *engine;
3644 + const struct utrace_engine_ops *ops;
3645 + unsigned long event, want;
3649 + utrace = &task->utrace;
3650 + if (utrace->interrupt || utrace->report || utrace->signal_handler) {
3652 + * We've been asked for an explicit report before we
3653 + * even check for pending signals.
3656 + spin_unlock_irq(&task->sighand->siglock);
3658 + spin_lock(&utrace->lock);
3660 + splice_attaching(utrace);
3662 + if (unlikely(!utrace->interrupt) && unlikely(!utrace->report))
3663 + report.result = UTRACE_SIGNAL_IGN;
3664 + else if (utrace->signal_handler)
3665 + report.result = UTRACE_SIGNAL_HANDLER;
3667 + report.result = UTRACE_SIGNAL_REPORT;
3670 + * We are now making the report and it's on the
3671 + * interrupt path, so clear the flags asking for those.
3673 + utrace->interrupt = utrace->report = utrace->signal_handler = 0;
3674 + utrace->stopped = 0;
3677 + * Make sure signal_pending() only returns true
3678 + * if there are real signals pending.
3680 + if (signal_pending(task)) {
3681 + spin_lock_irq(&task->sighand->siglock);
3682 + recalc_sigpending();
3683 + spin_unlock_irq(&task->sighand->siglock);
3686 + spin_unlock(&utrace->lock);
3688 + if (unlikely(report.result == UTRACE_SIGNAL_IGN))
3690 + * We only got here to clear utrace->signal_handler.
3695 + * Do a reporting pass for no signal, just for EVENT(QUIESCE).
3696 + * The engine callbacks can fill in *info and *return_ka.
3697 + * We'll pass NULL for the @orig_ka argument to indicate
3698 + * that there was no original signal.
3702 + memset(return_ka, 0, sizeof *return_ka);
3703 + } else if ((task->utrace_flags & UTRACE_EVENT_SIGNAL_ALL) == 0 &&
3704 + !utrace->stopped) {
3706 + * If no engine is interested in intercepting signals,
3707 + * let the caller just dequeue them normally.
3711 + if (unlikely(utrace->stopped)) {
3712 + spin_unlock_irq(&task->sighand->siglock);
3713 + spin_lock(&utrace->lock);
3714 + utrace->stopped = 0;
3715 + spin_unlock(&utrace->lock);
3716 + spin_lock_irq(&task->sighand->siglock);
3720 + * Steal the next signal so we can let tracing engines
3721 + * examine it. From the signal number and sigaction,
3722 + * determine what normal delivery would do. If no
3723 + * engine perturbs it, we'll do that by returning the
3724 + * signal number after setting *return_ka.
3726 + signr = dequeue_signal(task, &task->blocked, info);
3729 + BUG_ON(signr != info->si_signo);
3731 + ka = &task->sighand->action[signr - 1];
3735 + * We are never allowed to interfere with SIGKILL.
3736 + * Just punt after filling in *return_ka for our caller.
3738 + if (signr == SIGKILL)
3741 + if (ka->sa.sa_handler == SIG_IGN) {
3742 + event = UTRACE_EVENT(SIGNAL_IGN);
3743 + report.result = UTRACE_SIGNAL_IGN;
3744 + } else if (ka->sa.sa_handler != SIG_DFL) {
3745 + event = UTRACE_EVENT(SIGNAL);
3746 + report.result = UTRACE_SIGNAL_DELIVER;
3747 + } else if (sig_kernel_coredump(signr)) {
3748 + event = UTRACE_EVENT(SIGNAL_CORE);
3749 + report.result = UTRACE_SIGNAL_CORE;
3750 + } else if (sig_kernel_ignore(signr)) {
3751 + event = UTRACE_EVENT(SIGNAL_IGN);
3752 + report.result = UTRACE_SIGNAL_IGN;
3753 + } else if (signr == SIGSTOP) {
3754 + event = UTRACE_EVENT(SIGNAL_STOP);
3755 + report.result = UTRACE_SIGNAL_STOP;
3756 + } else if (sig_kernel_stop(signr)) {
3757 + event = UTRACE_EVENT(SIGNAL_STOP);
3758 + report.result = UTRACE_SIGNAL_TSTP;
3760 + event = UTRACE_EVENT(SIGNAL_TERM);
3761 + report.result = UTRACE_SIGNAL_TERM;
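The if/else chain above maps a dequeued signal's current disposition (its handler and number) onto one of the UTRACE_SIGNAL_* report results. A rough user-space sketch of that mapping follows; the enum values and the sig_kernel_*() predicates here are simplified stand-ins for illustration, not the kernel's actual definitions:

```c
#include <assert.h>
#include <signal.h>

/* Illustrative stand-ins for the kernel's UTRACE_SIGNAL_* results. */
enum utrace_signal_result {
	UTRACE_SIGNAL_DELIVER,
	UTRACE_SIGNAL_IGN,
	UTRACE_SIGNAL_CORE,
	UTRACE_SIGNAL_STOP,
	UTRACE_SIGNAL_TSTP,
	UTRACE_SIGNAL_TERM,
};

/* Simplified versions of the kernel's sig_kernel_*() predicates. */
static int sig_kernel_coredump(int sig)
{
	return sig == SIGQUIT || sig == SIGABRT || sig == SIGSEGV ||
	       sig == SIGBUS || sig == SIGILL || sig == SIGFPE;
}

static int sig_kernel_ignore(int sig)
{
	return sig == SIGCHLD || sig == SIGURG || sig == SIGWINCH;
}

static int sig_kernel_stop(int sig)
{
	return sig == SIGSTOP || sig == SIGTSTP || sig == SIGTTIN ||
	       sig == SIGTTOU;
}

/* Mirror the disposition chain from utrace_get_signal(): the handler
 * takes precedence, then the signal's default action decides. */
static enum utrace_signal_result classify(int signr, void (*handler)(int))
{
	if (handler == SIG_IGN)
		return UTRACE_SIGNAL_IGN;
	if (handler != SIG_DFL)
		return UTRACE_SIGNAL_DELIVER;
	if (sig_kernel_coredump(signr))
		return UTRACE_SIGNAL_CORE;
	if (sig_kernel_ignore(signr))
		return UTRACE_SIGNAL_IGN;
	if (signr == SIGSTOP)
		return UTRACE_SIGNAL_STOP;
	if (sig_kernel_stop(signr))
		return UTRACE_SIGNAL_TSTP;
	return UTRACE_SIGNAL_TERM;
}
```

Note that SIGSTOP is singled out before the generic sig_kernel_stop() test, so engines can tell a forced stop (%UTRACE_SIGNAL_STOP) apart from a job-control stop (%UTRACE_SIGNAL_TSTP).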
3765 + * Now that we know what event type this signal is, we
3766 + * can short-circuit if no engine cares about it.
3768 + if ((task->utrace_flags & (event | UTRACE_EVENT(QUIESCE))) == 0)
3772 + * We have some interested engines, so tell them about
3773 + * the signal and let them change its disposition.
3775 + spin_unlock_irq(&task->sighand->siglock);
3779 + * This reporting pass chooses what signal disposition we'll act on.
3781 + list_for_each_entry(engine, &utrace->attached, entry) {
3783 + * See start_callback() comment about this barrier.
3785 + utrace->reporting = engine;
3789 + * This pairs with the barrier in mark_engine_detached(),
3790 + * see start_callback() comments.
3792 + want = engine->flags;
3794 + ops = engine->ops;
3796 + if ((want & (event | UTRACE_EVENT(QUIESCE))) == 0) {
3797 + utrace->reporting = NULL;
3801 + if (ops->report_signal)
3802 + ret = (*ops->report_signal)(
3803 + report.result | report.action, engine, task,
3804 + regs, info, ka, return_ka);
3806 + ret = (report.result | (*ops->report_quiesce)(
3807 + report.action, engine, task, event));
3810 + * Avoid a tight loop reporting again and again if a
3811 + * buggy engine keeps asking for it.
3813 + switch (utrace_resume_action(ret)) {
3816 + case UTRACE_INTERRUPT:
3817 + case UTRACE_REPORT:
3818 + ret = (ret & ~UTRACE_RESUME_MASK) | UTRACE_RESUME;
3822 + finish_callback(utrace, &report, engine, ret);
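The demotion in that switch relies on callback return values packing a resume action into the low bits alongside the signal-result bits, so the resume action can be swapped without disturbing the rest. A user-space sketch of that packing (the particular bit values here are illustrative; the real constants and UTRACE_RESUME_MASK live in <linux/utrace.h>):

```c
#include <assert.h>

/* Illustrative action encoding: the low 3 bits hold the resume action. */
enum utrace_resume_action {
	UTRACE_STOP		= 0,
	UTRACE_INTERRUPT	= 1,
	UTRACE_REPORT		= 2,
	UTRACE_SINGLESTEP	= 3,
	UTRACE_BLOCKSTEP	= 4,
	UTRACE_RESUME		= 5,
	UTRACE_DETACH		= 6,
};
#define UTRACE_RESUME_MASK	0x7u

static enum utrace_resume_action utrace_resume_action(unsigned int ret)
{
	return (enum utrace_resume_action)(ret & UTRACE_RESUME_MASK);
}

/*
 * Demote UTRACE_INTERRUPT/UTRACE_REPORT to UTRACE_RESUME, as
 * utrace_get_signal() does to avoid a tight reporting loop, while
 * preserving the higher (signal-result) bits untouched.
 */
static unsigned int demote(unsigned int ret)
{
	switch (utrace_resume_action(ret)) {
	case UTRACE_INTERRUPT:
	case UTRACE_REPORT:
		return (ret & ~UTRACE_RESUME_MASK) | UTRACE_RESUME;
	default:
		return ret;
	}
}
```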
3826 + * We express the chosen action to the signals code in terms
3827 + * of a representative signal whose default action does it.
3828 + * Our caller uses our return value (signr) to decide what to
3829 + * do, but uses info->si_signo as the signal number to report.
3831 + switch (utrace_signal_action(report.result)) {
3832 + case UTRACE_SIGNAL_TERM:
3836 + case UTRACE_SIGNAL_CORE:
3840 + case UTRACE_SIGNAL_STOP:
3844 + case UTRACE_SIGNAL_TSTP:
3848 + case UTRACE_SIGNAL_DELIVER:
3849 + signr = info->si_signo;
3851 + if (return_ka->sa.sa_handler == SIG_DFL) {
3853 + * We'll do signr's normal default action.
3854 + * For ignore, we'll fall through below.
3855 + * For stop/death, we break locks and return it.
3857 + if (likely(signr) && !sig_kernel_ignore(signr))
3859 + } else if (return_ka->sa.sa_handler != SIG_IGN &&
3862 + * Complete the bookkeeping after the report.
3863 + * The handler will run. If an engine wanted to
3864 + * stop or step, then make sure we do another
3865 + * report after signal handler setup.
3867 + if (report.action != UTRACE_RESUME)
3868 + report.action = UTRACE_INTERRUPT;
3869 + finish_report(&report, task, utrace);
3871 + if (unlikely(report.result & UTRACE_SIGNAL_HOLD))
3872 + push_back_signal(task, info);
3874 + spin_lock_irq(&task->sighand->siglock);
3877 + * We do the SA_ONESHOT work here since the
3878 + * normal path will only touch *return_ka now.
3880 + if (unlikely(return_ka->sa.sa_flags & SA_ONESHOT)) {
3881 + return_ka->sa.sa_flags &= ~SA_ONESHOT;
3882 + if (likely(valid_signal(signr))) {
3883 + ka = &task->sighand->action[signr - 1];
3884 + ka->sa.sa_handler = SIG_DFL;
3891 + /* Fall through for an ignored signal. */
3893 + case UTRACE_SIGNAL_IGN:
3894 + case UTRACE_SIGNAL_REPORT:
3897 + * If the signal is being ignored, then we are on the way
3898 + * directly back to user mode. We can stop here, or step,
3899 + * as in utrace_resume(), above. After we've dealt with that,
3900 + * our caller will relock and come back through here.
3902 + finish_resume_report(&report, task, utrace);
3904 + if (unlikely(report.killed)) {
3906 + * The only reason we woke up now was because of a
3907 + * SIGKILL. Don't do normal dequeuing in case it
3908 + * might get a signal other than SIGKILL. That would
3909 + * perturb the death state so it might differ from
3910 + * what the debugger would have allowed to happen.
3911 + * Instead, pluck out just the SIGKILL to be sure
3912 + * we'll die immediately with nothing else different
3913 + * from the quiescent state the debugger wanted us in.
3915 + sigset_t sigkill_only;
3916 + siginitsetinv(&sigkill_only, sigmask(SIGKILL));
3917 + spin_lock_irq(&task->sighand->siglock);
3918 + signr = dequeue_signal(task, &sigkill_only, info);
3919 + BUG_ON(signr != SIGKILL);
3920 + *return_ka = task->sighand->action[SIGKILL - 1];
3924 + if (unlikely(report.result & UTRACE_SIGNAL_HOLD)) {
3925 + push_back_signal(task, info);
3926 + spin_unlock_irq(&task->sighand->siglock);
3933 + * Complete the bookkeeping after the report.
3934 + * This sets utrace->report if UTRACE_STOP was used.
3936 + finish_report(&report, task, utrace);
3938 + return_ka->sa.sa_handler = SIG_DFL;
3940 + if (unlikely(report.result & UTRACE_SIGNAL_HOLD))
3941 + push_back_signal(task, info);
3943 + spin_lock_irq(&task->sighand->siglock);
3945 + if (sig_kernel_stop(signr))
3946 + task->signal->flags |= SIGNAL_STOP_DEQUEUED;
3952 + * This gets called after a signal handler has been set up.
3953 + * We set a flag so the next report knows it happened.
3954 + * If we're already stepping, make sure we do a report_signal.
3955 + * If not, make sure we get into utrace_resume() where we can
3956 + * clear the signal_handler flag before resuming.
3958 +void utrace_signal_handler(struct task_struct *task, int stepping)
3960 + struct utrace *utrace = task_utrace_struct(task);
3962 + spin_lock(&utrace->lock);
3964 + utrace->signal_handler = 1;
3966 + utrace->interrupt = 1;
3967 + set_tsk_thread_flag(task, TIF_SIGPENDING);
3969 + set_tsk_thread_flag(task, TIF_NOTIFY_RESUME);
3972 + spin_unlock(&utrace->lock);
3976 + * utrace_prepare_examine - prepare to examine thread state
3977 + * @target: thread of interest, a &struct task_struct pointer
3978 + * @engine: engine pointer returned by utrace_attach_task()
3979 + * @exam: temporary state, a &struct utrace_examiner pointer
3981 + * This call prepares to safely examine the thread @target using
3982 + * &struct user_regset calls, or direct access to thread-synchronous fields.
3984 + * When @target is current, this call is superfluous. When @target is
3985 + * another thread, it must be held stopped via %UTRACE_STOP by @engine.
3987 + * This call may block the caller until @target stays stopped, so it must
3988 + * be called only after the caller is sure @target is about to unschedule.
3989 + * This means a zero return from a utrace_control() call on @engine giving
3990 + * %UTRACE_STOP, or a report_quiesce() or report_signal() callback to
3991 + * @engine that used %UTRACE_STOP in its return value.
3993 + * Returns -%ESRCH if @target is dead or -%EINVAL if %UTRACE_STOP was
3994 + * not used. If @target has started running again despite %UTRACE_STOP
3995 + * (for %SIGKILL or a spurious wakeup), this call returns -%EAGAIN.
3997 + * When this call returns zero, it's safe to use &struct user_regset
3998 + * calls and task_user_regset_view() on @target and to examine some of
3999 + * its fields directly. When the examination is complete, a
4000 + * utrace_finish_examine() call must follow to check whether it was
4001 + * completed safely.
4003 +int utrace_prepare_examine(struct task_struct *target,
4004 + struct utrace_engine *engine,
4005 + struct utrace_examiner *exam)
4009 + if (unlikely(target == current))
4013 + if (unlikely(!engine_wants_stop(engine)))
4015 + else if (unlikely(target->exit_state))
4018 + exam->state = target->state;
4019 + if (unlikely(exam->state == TASK_RUNNING))
4022 + get_task_struct(target);
4024 + rcu_read_unlock();
4026 + if (likely(!ret)) {
4027 + exam->ncsw = wait_task_inactive(target, exam->state);
4028 + put_task_struct(target);
4029 + if (unlikely(!exam->ncsw))
4035 +EXPORT_SYMBOL_GPL(utrace_prepare_examine);
4038 + * utrace_finish_examine - complete an examination of thread state
4039 + * @target: thread of interest, a &struct task_struct pointer
4040 + * @engine: engine pointer returned by utrace_attach_task()
4041 + * @exam: pointer passed to utrace_prepare_examine() call
4043 + * This call completes an examination on the thread @target begun by a
4044 + * paired utrace_prepare_examine() call with the same arguments that
4045 + * returned success (zero).
4047 + * When @target is current, this call is superfluous. When @target is
4048 + * another thread, this returns zero if @target has remained unscheduled
4049 + * since the paired utrace_prepare_examine() call returned zero.
4051 + * When this returns an error, any examination done since the paired
4052 + * utrace_prepare_examine() call is unreliable and the data extracted
4053 + * should be discarded. The error is -%EINVAL if @engine is not
4054 + * keeping @target stopped, or -%EAGAIN if @target woke up unexpectedly.
4056 +int utrace_finish_examine(struct task_struct *target,
4057 + struct utrace_engine *engine,
4058 + struct utrace_examiner *exam)
4062 + if (unlikely(target == current))
4066 + if (unlikely(!engine_wants_stop(engine)))
4068 + else if (unlikely(target->state != exam->state))
4071 + get_task_struct(target);
4072 + rcu_read_unlock();
4074 + if (likely(!ret)) {
4075 + unsigned long ncsw = wait_task_inactive(target, exam->state);
4076 + if (unlikely(ncsw != exam->ncsw))
4078 + put_task_struct(target);
4083 +EXPORT_SYMBOL_GPL(utrace_finish_examine);
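A typical caller pairs the two calls and redoes the whole examination when utrace_finish_examine() reports -%EAGAIN. A minimal user-space model of that calling convention follows; the two fake_* functions are stubs standing in for the real utrace calls, which need a live stopped tracee:

```c
#include <assert.h>
#include <errno.h>

/* Stubbed stand-ins for utrace_prepare_examine()/utrace_finish_examine().
 * The finish stub fails with -EAGAIN once to exercise the retry path,
 * then succeeds. */
static int failures_left = 1;

static int fake_prepare_examine(void)
{
	return 0;
}

static int fake_finish_examine(void)
{
	return failures_left-- > 0 ? -EAGAIN : 0;
}

/*
 * The calling convention: prepare, examine, finish.  If finish says
 * the target woke up (-EAGAIN), everything read in between is
 * unreliable and the examination must be redone from the start.
 */
static int examine_with_retry(int *observed, int value_to_read)
{
	int ret;

	do {
		ret = fake_prepare_examine();
		if (ret)
			return ret;
		*observed = value_to_read;	/* the "examination" */
		ret = fake_finish_examine();
	} while (ret == -EAGAIN);

	return ret;
}
```

Only data read between a successful prepare and a successful finish is trusted; anything captured on an -%EAGAIN round is discarded by virtue of being overwritten on the retry.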
4086 + * This is declared in linux/regset.h and defined in machine-dependent
4087 + * code. We put the export here to ensure no machine forgets it.
4089 +EXPORT_SYMBOL_GPL(task_user_regset_view);
4092 + * Called with rcu_read_lock() held.
4094 +void task_utrace_proc_status(struct seq_file *m, struct task_struct *p)
4096 + struct utrace *utrace = &p->utrace;
4097 + seq_printf(m, "Utrace:\t%lx%s%s%s\n",
4099 + utrace->stopped ? " (stopped)" : "",
4100 + utrace->report ? " (report)" : "",
4101 + utrace->interrupt ? " (interrupt)" : "");