+
+
+ Teardown Races
+ -------- -----
+
+Ordinarily synchronization issues for tracing engines are kept fairly
+straightforward by using quiescence (see above): you make a thread
+quiescent and then once it makes the report_quiesce callback it cannot
+do anything else that would result in another callback, until you let
+it. This simple arrangement avoids complex and error-prone code in
+each one of a tracing engine's event callbacks to keep them serialized
+with the engine's other operations done on that thread from another
+thread of control. However, giving tracing engines complete power to
+keep a traced thread stuck in place runs afoul of a more important
+kind of simplicity that the kernel overall guarantees: nothing can
+prevent or delay SIGKILL from making a thread die and release its
+resources. To preserve this important property, SIGKILL is a special
+case: it can break quiescence when nothing else normally can.
+This includes both explicit SIGKILL signals and the implicit SIGKILL
+sent to each other thread in the same thread group by a thread doing
+an exec, or processing a fatal signal, or making an exit_group system
+call. A tracing engine attached to a thread can prevent that thread
+from beginning an exit or exec, or from dying by a signal other than
+SIGKILL. But once the operation begins, no tracing engine can prevent
+or delay the death of all other threads in the same thread group.
+
+As described above, the report_reap callback is always the final event
+in the life cycle of a traced thread. Tracing engines can use this as
+the trigger to clean up their own data structures. The report_death
+callback is always the penultimate event a tracing engine might see,
+except when the thread was already in the midst of dying when the
+engine attached. Many tracing engines will have no interest in when a
+parent reaps a dead process, and nothing they want to do with a zombie
+thread once it dies; for them, the report_death callback is the
+natural place to clean up data structures and detach. To facilitate
+writing such engines robustly, given the asynchrony of SIGKILL, and
+without error-prone manual implementation of synchronization schemes,
+the utrace infrastructure provides some special guarantees about the
+report_death and report_reap callbacks. It still takes some care to
+be sure your tracing engine is robust to teardown races, but these
+rules make it reasonably straightforward and concise to handle a lot
+of corner cases correctly.
+
+The first sort of guarantee concerns the core data structures
+themselves. struct utrace_attached_engine is allocated using RCU, as
+is task_struct. If you call utrace_attach under rcu_read_lock, then
+the pointer it returns will always be valid while in the RCU critical
+section. (Note that utrace_attach can block doing memory allocation,
+so you must consider the real critical section to start when
+utrace_attach returns. utrace_attach never blocks unless it is given
+the UTRACE_ATTACH_CREATE flag bit.) Conversely, you can call
+utrace_attach outside of rcu_read_lock and though the pointer can
+become stale asynchronously if the thread dies and is reaped, you can
+safely pass it to a subsequent utrace_set_flags or utrace_detach call
+and will just get an -ESRCH error return. However, you must be sure
+the task_struct remains valid, either via get_task_struct or via RCU.
+The utrace infrastructure never holds task_struct references of its
+own. Though neither rcu_read_lock nor any other lock is held while
+making a callback, it's always guaranteed that the task_struct and
+the struct utrace_attached_engine passed as arguments remain valid
+until the callback function returns.
+
+The second guarantee is the serialization of death and reap event
+callbacks for a given thread. The actual reaping by the parent
+(release_task call) can occur simultaneously while the thread is
+still doing the final steps of dying, including the report_death
+callback. If a tracing engine has requested both DEATH and REAP
+event reports, it's guaranteed that the report_reap callback will not
+be made until after the report_death callback has returned. If the
+report_death callback itself detaches from the thread (with
+utrace_detach or with UTRACE_ACTION_DETACH in its return value), then
+the report_reap callback will never be made. Thus it is safe for a
+report_death callback to clean up data structures and detach.
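+
+The ordering rule above can be pictured with a small user-space model.
+This is only a sketch, not kernel code: struct model_engine,
+model_teardown and all of their fields are hypothetical names invented
+for illustration, and the model runs sequentially, so it shows the
+order of events rather than the locking that enforces it.

```c
#include <stdbool.h>

/* Hypothetical model of one attached engine; not a utrace structure. */
struct model_engine {
	bool attached;
	bool want_death;	/* engine asked for DEATH event reports */
	bool want_reap;		/* engine asked for REAP event reports */
	bool death_detaches;	/* report_death cleans up and detaches */
	int death_calls;
	int reap_calls;
	int cleanups;
};

/* Walk one thread through death and then reaping, in guaranteed order. */
void model_teardown(struct model_engine *e)
{
	if (e->attached && e->want_death) {
		e->death_calls++;
		if (e->death_detaches) {
			e->cleanups++;		/* cleanup done in report_death */
			e->attached = false;	/* so report_reap is never made */
		}
	}

	/* report_reap is made only after report_death has returned. */
	if (e->attached && e->want_reap) {
		e->reap_calls++;
		e->cleanups++;			/* cleanup done in report_reap */
	}

	e->attached = false;			/* forcible detach on reaping */
}
```

+In this model the cleanup runs exactly once in either configuration,
+which is the property the serialization guarantee is meant to give a
+real engine.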
+
+The final sort of guarantee is that a tracing engine will know for
+sure whether or not the report_death and/or report_reap callbacks
+will be made for a certain thread. These teardown races are
+disambiguated by the error return values of utrace_set_flags and
+utrace_detach. Normally utrace_detach returns zero, and this means
+that no more callbacks will be made. If the thread is in the midst
+of dying, utrace_detach returns -EALREADY to indicate that the
+report_death callback may already be in progress; when you get this
+error, you know that any cleanup your report_death callback does is
+about to happen or has just happened. Note that if the report_death
+callback does not detach, the engine remains attached until the
+thread gets reaped. If the thread is in the midst of being reaped,
+utrace_detach returns -ESRCH to indicate that the report_reap
+callback may already be in progress; this means the engine is
+implicitly detached when the callback completes. This makes it
+possible for a tracing engine that has decided asynchronously to
+detach from a thread to safely clean up its data structures, knowing
+that no report_death or report_reap callback will try to do the
+same. utrace_detach returns -ESRCH when the struct
+utrace_attached_engine has already been detached, but is still a
+valid pointer because of rcu_read_lock. If RCU is used properly, a
+tracing engine can use this to safely synchronize its own
+independent multiple threads of control with each other and with its
+event callbacks that detach.
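+
+As a sketch of how an engine might act on these return values, the
+helper below maps the result of utrace_detach to a cleanup decision.
+The enum and the function name are hypothetical and not part of the
+utrace interface; only the three return values come from the text
+above.

```c
#include <errno.h>

/* Hypothetical outcome names for illustration; not part of utrace. */
enum detach_outcome {
	DETACH_CLEAN_NOW,	/* 0: no more callbacks, caller cleans up */
	DETACH_DEATH_PENDING,	/* -EALREADY: report_death may be running */
	DETACH_ALREADY_GONE,	/* -ESRCH: being reaped or already detached */
	DETACH_UNEXPECTED	/* any other value: treat as a bug */
};

/* Classify the return value of a utrace_detach call. */
enum detach_outcome classify_detach(int ret)
{
	switch (ret) {
	case 0:
		return DETACH_CLEAN_NOW;
	case -EALREADY:
		return DETACH_DEATH_PENDING;
	case -ESRCH:
		return DETACH_ALREADY_GONE;
	default:
		return DETACH_UNEXPECTED;
	}
}
```

+Only the DETACH_CLEAN_NOW case lets the asynchronous caller free the
+engine's data itself; in the other two cases a callback or an earlier
+detach is responsible for it.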
+
+In the same vein, utrace_set_flags normally returns zero; if the
+target thread was quiescent before the call, then after a successful
+call, only the event callbacks requested in the new flags will be
+made, and a report_quiesce callback will always be made if requested. It
+fails with -EALREADY if you try to clear UTRACE_EVENT(DEATH) when the
+report_death callback may already have begun, if you try to clear
+UTRACE_EVENT(REAP) when the report_reap callback may already have
+begun, if you try to newly set UTRACE_ACTION_NOREAP when the target
+may already have sent its parent SIGCHLD, or if you try to newly set
+UTRACE_EVENT(DEATH), UTRACE_EVENT(QUIESCE), or UTRACE_ACTION_QUIESCE,
+when the target is already dead or dying. Like utrace_detach, it
+returns -ESRCH when the engine has already been detached (including
+forcible detach on reaping). This lets the tracing engine know for
+sure which event callbacks it will or won't see after utrace_set_flags
+has returned. By checking for errors, it can know whether to clean up
+its data structures immediately or to let its callbacks do the work.
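+
+The failure rules for utrace_set_flags can be summarized as a small
+user-space model. This is a sketch under stated assumptions: the bit
+values, struct target_state and model_set_flags are hypothetical
+stand-ins, and the order in which the conditions are tested is chosen
+for readability rather than taken from the implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit values standing in for the real flag bits. */
#define EV_DEATH	(1u << 0)	/* UTRACE_EVENT(DEATH) */
#define EV_REAP		(1u << 1)	/* UTRACE_EVENT(REAP) */
#define EV_QUIESCE	(1u << 2)	/* UTRACE_EVENT(QUIESCE) */
#define ACT_QUIESCE	(1u << 3)	/* UTRACE_ACTION_QUIESCE */
#define ACT_NOREAP	(1u << 4)	/* UTRACE_ACTION_NOREAP */

/* Hypothetical snapshot of what may already have happened. */
struct target_state {
	bool detached;		/* engine no longer attached */
	bool dying;		/* target is dead or dying */
	bool death_begun;	/* report_death may have begun */
	bool reap_begun;	/* report_reap may have begun */
	bool sigchld_sent;	/* parent may already have SIGCHLD */
};

/* Predict the return value for changing flags from old to want. */
int model_set_flags(const struct target_state *t,
		    unsigned int old, unsigned int want)
{
	unsigned int set = want & ~old;		/* bits newly set */
	unsigned int cleared = old & ~want;	/* bits newly cleared */

	if (t->detached)
		return -ESRCH;
	if ((cleared & EV_DEATH) && t->death_begun)
		return -EALREADY;
	if ((cleared & EV_REAP) && t->reap_begun)
		return -EALREADY;
	if ((set & ACT_NOREAP) && t->sigchld_sent)
		return -EALREADY;
	if ((set & (EV_DEATH | EV_QUIESCE | ACT_QUIESCE)) && t->dying)
		return -EALREADY;
	return 0;
}
```

+An engine that gets -EALREADY while trying to clear the DEATH bit, for
+example, can leave cleanup to its report_death callback instead of
+doing it immediately.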