2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

372 Commits

Author SHA1 Message Date
Steven Rostedt (Red Hat)
9b9db27505 tracing: Remove one use of trace_current_buffer_lock_reserve()
The only user of trace_current_buffer_lock_reserve() is in the boot up self
tests. Restructure the code a little to have that code use what everything
else uses: trace_event_buffer_lock_reserve().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29 18:10:21 -04:00
Tom Zanussi
7ef224d1d0 tracing: Add 'hist' event trigger command
'hist' triggers allow users to continually aggregate trace events,
which can then be viewed afterwards by simply reading a 'hist' file
containing the aggregation in a human-readable format.

The basic idea is very simple and boils down to a mechanism whereby
trace events, rather than being exhaustively dumped in raw form and
viewed directly, are automatically 'compressed' into meaningful tables
completely defined by the user.

This is done strictly via single-line command-line commands and
without the aid of any kind of programming language or interpreter.

A surprising number of typical use cases can be accomplished by users
via this simple mechanism.  In fact, a large number of the tasks that
users typically do using the more complicated script-based tracing
tools, at least during the initial stages of an investigation, can be
accomplished by simply specifying a set of keys and values to be used
in the creation of a hash table.

The Linux kernel trace event subsystem happens to provide an extensive
list of keys and values ready-made for such a purpose in the form of
the event format files associated with each trace event.  By simply
consulting the format file for field names of interest and by plugging
them into the hist trigger command, users can create an endless number
of useful aggregations to help with investigating various properties
of the system.  See Documentation/trace/events.txt for examples.

hist triggers are implemented on top of the existing event trigger
infrastructure, and as such are consistent with the existing triggers
from a user's perspective as well.

The basic syntax follows the existing trigger syntax.  Users start an
aggregation by writing a 'hist' trigger to the event of interest's
trigger file:

  # echo hist:keys=xxx [ if filter] > event/trigger

Once a hist trigger has been set up, by default it continually
aggregates every matching event into a hash table using the event key
and a value field named 'hitcount'.

To view the aggregation at any point in time, simply read the 'hist'
file in the same directory as the 'trigger' file:

  # cat event/hist

The detailed syntax provides additional options for user control, and
is described exhaustively in Documentation/trace/events.txt and in the
virtual tracing/README file in the tracing subsystem.

Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 12:16:14 -04:00
Steven Rostedt
c37775d578 tracing: Add infrastructure to allow set_event_pid to follow children
Add the infrastructure needed to have the PIDs in set_event_pid to
automatically add PIDs of the children of the tasks that have their PIDs in
set_event_pid. This will also remove PIDs from set_event_pid when a task
exits

This is implemented by adding hooks into the fork and exit tracepoints. On
fork, the PIDs are added to the list, and on exit, they are removed.

Add a new option called event_fork that when set, PIDs in set_event_pid will
automatically get their children PIDs added when they fork, as well as any
task that exits will have its PID removed from set_event_pid.

This works for instances as well.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 10:28:28 -04:00
Steven Rostedt
f4d34a87e9 tracing: Use pid bitmap instead of a pid array for set_event_pid
In order to add the ability to let tasks that are filtered by the events
have their children also be traced on fork (and then not traced on exit),
convert the array into a pid bitmask. Most of the time the number of pids is
only 32768 pids or a 4k bitmask, which is the same size as the default list
currently is, and that list could grow if more pids are listed.

This also greatly simplifies the code.

Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 10:28:27 -04:00
Steven Rostedt
9ebc57cfaa tracing: Rename check_ignore_pid() to ignore_this_task()
The name "check_ignore_pid" is confusing in trying to figure out if the pid
should be ignored or not. Rename it to "ignore_this_task" which is pretty
straight forward, as a task (not a pid) is passed in, and should if true
should be ignored.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-19 10:28:26 -04:00
Alexei Starovoitov
32bbe0078a bpf: sanitize bpf tracepoint access
during bpf program loading remember the last byte of ctx access
and at the time of attaching the program to tracepoint check that
the program doesn't access bytes beyond defined in tracepoint fields

This also disallows access to __dynamic_array fields, but can be
relaxed in the future.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 21:04:26 -04:00
Steven Rostedt (Red Hat)
e57cbaf0eb tracing: Do not have 'comm' filter override event 'comm' field
Commit 9f61668073 "tracing: Allow triggers to filter for CPU ids and
process names" added a 'comm' filter that will filter events based on the
current tasks struct 'comm'. But this now hides the ability to filter events
that have a 'comm' field too. For example, sched_migrate_task trace event.
That has a 'comm' field of the task to be migrated.

 echo 'comm == "bash"' > events/sched_migrate_task/filter

will now filter all sched_migrate_task events for tasks named "bash" that
migrates other tasks (in interrupt context), instead of seeing when "bash"
itself gets migrated.

This fix requires a couple of changes.

1) Change the look up order for filter predicates to look at the events
   fields before looking at the generic filters.

2) Instead of basing the filter function off of the "comm" name, have the
   generic "comm" filter have its own filter_type (FILTER_COMM). Test
   against the type instead of the name to assign the filter function.

3) Add a new "COMM" filter that works just like "comm" but will filter based
   on the current task, even if the trace event contains a "comm" field.

Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".

Cc: stable@vger.kernel.org # v4.3+
Fixes: 9f61668073 "tracing: Allow triggers to filter for CPU ids and process names"
Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-03-04 09:57:10 -05:00
Steven Rostedt (Red Hat)
d045437a16 tracing: Fix showing function event in available_events
The ftrace:function event is only displayed for parsing the function tracer
data. It is not used to enable function tracing, and does not include an
"enable" file in its event directory.

Originally, this event was kept separate from other events because it did
not have a ->reg parameter. But perf added a "reg" parameter for its use
which caused issues, because it made the event available to functions where
it was not compatible for.

Commit 9b63776fa3 "tracing: Do not enable function event with enable"
added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event
from being enabled by normal trace events. But this commit missed keeping
the function event from being displayed by the "available_events" directory,
which is used to show what events can be enabled by set_event.

One documented way to enable all events is to:

 cat available_events > set_event

But because the function event is displayed in the available_events, this
now causes an INVALID error:

 cat: write error: Invalid argument

Reported-by: Chunyu Hu <chuhu@redhat.com>
Fixes: 9b63776fa3 "tracing: Do not enable function event with enable"
Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-02-24 09:17:11 -05:00
Al Viro
70f6cbb6f9 kernel/*: switch to memdup_user_nul()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:27:55 -05:00
Steven Rostedt (Red Hat)
0f72e37e42 tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter
The set_event_pid filter relies on attaching to the sched_switch and
sched_wakeup tracepoints to see if it should filter the tracing on schedule
tracepoints. By adding the callbacks to sched_wakeup, pids in the
set_event_pid file will trace the wakeups of those tasks with those pids.

But sched_wakeup_new and sched_waking were missed. These two should also be
traced. Luckily, these tracepoints share the same class as sched_wakeup
which means they can use the same pre and post callbacks as sched_wakeup
does.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-12-01 16:08:05 -05:00
Linus Torvalds
22402cd0af Most of the changes are clean ups and small fixes. Some of them have
stable tags to them. I searched through my INBOX just as the merge window
 opened and found lots of patches to pull. I ran them through all my tests
 and they were in linux-next for a few days.
 
 Features added this release:
 ----------------------------
 
  o Module globbing. You can now filter function tracing to several
    modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)
 
  o Tracer specific options are now visible even when the tracer is not
    active. It was rather annoying that you can only see and modify tracer
    options after enabling the tracer. Now they are in the options/ directory
    even when the tracer is not active. Although they are still only visible
    when the tracer is active in the trace_options file.
 
  o Trace options are now per instance (although some of the tracer specific
    options are global)
 
  o New tracefs file: set_event_pid. If any pid is added to this file, then
    all events in the instance will filter out events that are not part of
    this pid. sched_switch and sched_wakeup events handle next and the wakee
    pids.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWPLQ5AAoJEKKk/i67LK/8CTYIAI1u8DE5QCzv3J0p54jVpNVR
 J5FqEU3eXIzd6FS4JXD4nxCeMpUZAy21YnhlZpsnrbJJM5bc9bUsBCwiKKM+MuSZ
 ztmy2sgYKkO0h/KUdhNgYJrzis3/Ojquyx9iAqK5ST/Fr+nKYx81akFKjNK53iur
 RJRut45sSa8rv11LaL8sgJ6hAWQTc+YkybUdZ5xaMdJmZ6A61T7Y6VzTjbUexuvL
 hntCfTjYLtVd8dbfknAnf3B7n/VOO3IFF85wr7ciYR5oEVfPrF8tHmJBlhHExPpX
 kaXAiDDRY/UTg/5DQqnp4zmxJoR5BQ2l4pT5PwiLcnwhcphIDNYS8EYUmOYAWjU=
 =TjOE
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracking updates from Steven Rostedt:
 "Most of the changes are clean ups and small fixes.  Some of them have
  stable tags to them.  I searched through my INBOX just as the merge
  window opened and found lots of patches to pull.  I ran them through
  all my tests and they were in linux-next for a few days.

  Features added this release:
  ----------------------------

   - Module globbing.  You can now filter function tracing to several
     modules.  # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)

   - Tracer specific options are now visible even when the tracer is not
     active.  It was rather annoying that you can only see and modify
     tracer options after enabling the tracer.  Now they are in the
     options/ directory even when the tracer is not active.  Although
     they are still only visible when the tracer is active in the
     trace_options file.

   - Trace options are now per instance (although some of the tracer
     specific options are global)

   - New tracefs file: set_event_pid.  If any pid is added to this file,
     then all events in the instance will filter out events that are not
     part of this pid.  sched_switch and sched_wakeup events handle next
     and the wakee pids"

* tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
  tracefs: Fix refcount imbalance in start_creating()
  tracing: Put back comma for empty fields in boot string parsing
  tracing: Apply tracer specific options from kernel command line.
  tracing: Add some documentation about set_event_pid
  ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark
  tracing: Allow dumping traces without tracking trace started cpus
  ring_buffer: Fix more races when terminating the producer in the benchmark
  ring_buffer: Do no not complete benchmark reader too early
  tracing: Remove redundant TP_ARGS redefining
  tracing: Rename max_stack_lock to stack_trace_max_lock
  tracing: Allow arch-specific stack tracer
  recordmcount: arm64: Replace the ignored mcount call into nop
  recordmcount: Fix endianness handling bug for nop_mcount
  tracepoints: Fix documentation of RCU lockdep checks
  tracing: ftrace_event_is_function() can return boolean
  tracing: is_legal_op() can return boolean
  ring-buffer: rb_event_is_commit() can return boolean
  ring-buffer: rb_per_cpu_empty() can return boolean
  ring_buffer: ring_buffer_empty{cpu}() can return boolean
  ring-buffer: rb_is_reader_page() can return boolean
  ...
2015-11-06 13:30:20 -08:00
Steven Rostedt (Red Hat)
43ed384339 tracing: Put back comma for empty fields in boot string parsing
Both early_enable_events() and apply_trace_boot_options() parse a boot
string that may get parsed later on. They both use strsep() which converts a
comma into a nul character. To still allow the boot string to be parsed
again the same way, the nul character gets converted back to a comma after
the token is processed.

The problem is that these two functions check for an empty parameter (two
commas in a row ",,"), and continue the loop if the parameter is empty, but
fails to place the comma back. In this case, the second parsing will end at
this blank field, and not process fields afterward.

In most cases, users should not have an empty field, but if its going to be
checked, the code might as well be correct.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-03 22:15:14 -05:00
Steven Rostedt (Red Hat)
799fd44cf5 tracing: Call on_each_cpu() when adding or removing single pids from set_event_pid
For the case where pids are already in set_event_pid, and one is added or
removed then each CPU should be checked to make sure that the new or old pid
is on or not on a CPU.

 For example:

 # echo 123 >> set_event_pid

or

 # echo '!123' >> set_event_pid

Link: http://lkml.kernel.org/r/20151030061643.GA19480@cac

Suggested-by: Jiaxing Wang <hello.wjx@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02 13:08:26 -05:00
Steven Rostedt (Red Hat)
fb66228828 tracing: Fix sparse RCU warning
p_start() and p_stop() are seq_file functions that match. Teach sparse to
know that rcu_read_lock_sched() that is taken by p_start() is released by
p_stop.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-26 03:51:32 -04:00
Steven Rostedt (Red Hat)
8ca532ad2b tracing: Check all tasks on each CPU when filtering pids
My tests found that if a task is running but not filtered when set_event_pid
is modified, then it can still be traced.

Call on_each_cpu() to check if the current running task should be filtered
and update the per cpu flags of tr->data appropriately.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-25 21:33:56 -04:00
Steven Rostedt (Red Hat)
3fdaf80f4a tracing: Implement event pid filtering
Add the necessary hooks to use the pids loaded in set_event_pid to filter
all the events enabled in the tracing instance that match the pids listed.

Two probes are added to both sched_switch and sched_wakeup tracepoints to be
called before other probes are called and after the other probes are called.
The first is used to set the necessary flags to let the probes know to test
if they should be traced or not.

The sched_switch pre probe will set the "ignore_pid" flag if neither the
previous or next task has a matching pid.

The sched_switch probe will set the "ignore_pid" flag if the next task
does not match the matching pid.

The pre probe allows for probes tracing sched_switch to be traced if
necessary.

The sched_wakeup pre probe will set the "ignore_pid" flag if neither the
current task nor the wakee task has a matching pid.

The sched_wakeup post probe will set the "ignore_pid" flag if the current
task does not have a matching pid.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-25 21:33:56 -04:00
Steven Rostedt (Red Hat)
4909010788 tracing: Add set_event_pid directory for future use
Create a tracing directory called set_event_pid, which currently has no
function, but will be used to filter all events for the tracing instance or
the pids that are added to the file.

The reason no functionality is added with this commit is that this commit
focuses on the creation and removal of the pids in a safe manner. And tests
can be made against this change to make sure things are correct before
hooking features to the list of pids.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-25 21:33:55 -04:00
Rasmus Villemoes
79ac6ef521 tracing: Use kstrdup_const instead of private implementation
The kernel now has kstrdup_const/kfree_const for reusing .rodata
(typically string literals) when possible; there's no reason to
duplicate that logic in the tracing system. Moreover, as the comment
above core_kernel_data states, it may not always return true for
.rodata - that is for example the case on x86_64, where we thus end up
kstrdup'ing all the passed-in strings.

Arguably, testing for .rodata explicitly (as kstrdup_const does) is
also more correct: I don't think one is supposed to be able to change
the name after creating the event_subsystem by passing the address of
a static char (but non-const) array.

Link: http://lkml.kernel.org/r/1441833841-12955-1-git-send-email-linux@rasmusvillemoes.dk

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-10-01 09:05:17 -04:00
Steven Rostedt (Red Hat)
2d34f48955 tracing: Make ftrace_trace_stack() depend on general trace_array flag
In preparation for the multi buffer instances to have their own trace_flags,
the check in ftrace_trace_stack() needs to test the trace_array descriptor
flag that is for the current event, not the global_trace descriptor.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-30 15:22:57 -04:00
Steven Rostedt (Red Hat)
983f938ae6 tracing: Move trace_flags from global to a trace_array field
In preparation to make trace options per instance, the global trace_flags
needs to be moved from being a global variable to a field within the trace
instance trace_array structure.

There's still more work to do, as there's some functions that use
trace_flags without passing in a way to get to the current_trace array. For
those, the global_trace is used directly (from trace.c). This includes
setting and clearing the trace_flags. This means that when a new instance is
created, it just gets the trace_flags of the global_trace and will not be
able to modify them. Depending on the functions that have access to the
trace_array, the flags of an instance may not affect parts of its trace,
where the global_trace is used. These will be fixed in future changes.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-30 15:22:55 -04:00
Steven Rostedt (Red Hat)
b7f0c959ed tracing: Pass trace_array into trace_buffer_unlock_commit()
In preparation for having trace options be per instance, the trace_array
needs to be passed to the trace_buffer_unlock_commit(). The
trace_event_buffer_lock_reserve() already passes in the trace_event_file
where the trace_array can be derived from.

Also added a "__init" to the boot up test event plus function tracing
function function_test_events_call().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-25 17:38:44 -04:00
Daniel Wagner
9f61668073 tracing: Allow triggers to filter for CPU ids and process names
By extending the filter rules by more generic fields
we can write triggers filters like

  echo 'stacktrace if cpu == 1' > \
	/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

or

  echo 'stacktrace if comm == sshd'  > \
	/sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger

CPU and COMM are not part of struct trace_entry. We could add the two
new fields to ftrace_common_field list and fix up all depending
sides. But that looks pretty ugly. Another thing I would like to
avoid that the 'format' file contents changes.

All this can be avoided by introducing another list which contains
non field members of struct trace_entry.

Link: http://lkml.kernel.org/r/1439210146-24707-1-git-send-email-daniel.wagner@bmw-carit.de

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-08-11 18:01:06 -04:00
Steven Rostedt (Red Hat)
5d6ad960a7 tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The FTRACE_EVENT_FL_* flags are flags to
do with the trace_event files in the tracefs directory. They are not related
to function tracing. Rename them to a more descriptive name.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 15:24:57 -04:00
Steven Rostedt (Red Hat)
7967b3e0c4 tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_subsystem_dir holds
the information about trace event subsystems. It should not be named
ftrace, rename it to trace_subsystem_dir.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:59:40 -04:00
Steven Rostedt (Red Hat)
687fcc4aee tracing: Rename ftrace_event_name() to trace_event_name()
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. ftrace_event_name() returns the name of
an event tracepoint, has nothing to do with function tracing. Rename it
to trace_event_name().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:20:14 -04:00
Steven Rostedt (Red Hat)
3f795dcfc7 tracing: Rename ftrace_event_buffer to trace_event_buffer.
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_event_buffer functions and data
structures are for trace_events and not for function hooks. Rename them
to trace_event_buffer*.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:06:36 -04:00
Steven Rostedt (Red Hat)
2425bcb924 tracing: Rename ftrace_event_{call,class} to trace_event_{call,class}
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structures ftrace_event_call and
ftrace_event_class have nothing to do with the function hooks, and are
really trace_event structures. Rename ftrace_event_* to trace_event_*.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:06:10 -04:00
Steven Rostedt (Red Hat)
7f1d2f8210 tracing: Rename ftrace_event_file to trace_event_file
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_event_file is really
about trace events and not "ftrace". Rename it to trace_event_file.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:05:16 -04:00
Steven Rostedt (Red Hat)
9023c93090 tracing: Rename (un)register_ftrace_event() to (un)register_trace_event()
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The functions (un)register_ftrace_event() is
really about trace_events, and the name should be register_trace_event()
instead.

Also renamed ftrace_event_reg() to trace_event_reg() for the same reason.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-13 14:05:14 -04:00
Linus Torvalds
9ec3a646fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
 "d_inode() annotations from David Howells (sat in for-next since before
  the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RCU pathwalk breakage when running into a symlink overmounting something
  fix I_DIO_WAKEUP definition
  direct-io: only inc/dec inode->i_dio_count for file systems
  fs/9p: fix readdir()
  VFS: assorted d_backing_inode() annotations
  VFS: fs/inode.c helpers: d_inode() annotations
  VFS: fs/cachefiles: d_backing_inode() annotations
  VFS: fs library helpers: d_inode() annotations
  VFS: assorted weird filesystems: d_inode() annotations
  VFS: normal filesystems (and lustre): d_inode() annotations
  VFS: security/: d_inode() annotations
  VFS: security/: d_backing_inode() annotations
  VFS: net/: d_inode() annotations
  VFS: net/unix: d_backing_inode() annotations
  VFS: kernel/: d_inode() annotations
  VFS: audit: d_backing_inode() annotations
  VFS: Fix up some ->d_inode accesses in the chelsio driver
  VFS: Cachefiles should perform fs modifications on the top layer only
  VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26 17:22:07 -07:00
Linus Torvalds
4f2112351b This adds three fixes for the tracing code.
The first is a bug when ftrace_dump_on_oops is triggered in atomic context
 and function graph tracer is the tracer that is being reported.
 
 The second fix is bad parsing of the trace_events from the kernel
 command line, where it would ignore specific events if the system
 name is used with defining the event(it enables all events within the
 system).
 
 The last one is a fix to the TRACE_DEFINE_ENUM(), where a check was missing
 to see if the ptr was incremented to the end of the string, but the loop
 increments it again and can miss the nul delimiter to stop processing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVN7g7AAoJEEjnJuOKh9ldLCkIAJj6wsj59wR8aIxtxnD5bJZZ
 Y9dKkas6BOaUCGp0MVBCkpm3uMgh0uPO10jOuthgc3LgQy0piwKJrIzGkI5gcRuA
 JvZw2X08/jCSNu8BHydes2A0XMkZobMuWFAeWz3CSzNBfbI3sDDqgnRQ9eyMM66R
 +sPV7ZELQLK/ZFs93gFoaZe/OKGYJcamhMtG2v7p9h1qBJZUDakRFo478bAwu5SB
 Kh6xpXA76WnmkeQekAAsWGdqIQzoNy3IoVePmdhVpZvoiKLUrf1JWVYFoNkO39Ik
 kC1gqJro0EmHQFOo5rt8yQmqXjvSU0sS/sAW3HZ0Szl5OtiUNnag+Wt3ku7lr8U=
 =VORa
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "This adds three fixes for the tracing code.

  The first is a bug when ftrace_dump_on_oops is triggered in atomic
  context and function graph tracer is the tracer that is being
  reported.

  The second fix is bad parsing of the trace_events from the kernel
  command line, where it would ignore specific events if the system name
  is used with defining the event(it enables all events within the
  system).

  The last one is a fix to the TRACE_DEFINE_ENUM(), where a check was
  missing to see if the ptr was incremented to the end of the string,
  but the loop increments it again and can miss the nul delimiter to
  stop processing"

* tag 'trace-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix possible out of bounds memory access when parsing enums
  tracing: Fix incorrect enabling of trace events by boot cmdline
  tracing: Handle ftrace_dump() atomic context in graph_trace_open()
2015-04-22 11:27:36 -07:00
Steven Rostedt (Red Hat)
3193899d4d tracing: Fix possible out of bounds memory access when parsing enums
The code that replaces the enum names with the enum values in the
tracepoints' format files could possible miss the end of string nul
character. This was caused by processing things like backslashes, quotes
and other tokens. After processing the tokens, a check for the nul
character needed to be done before continuing the loop, because the loop
incremented the pointer before doing the check, which could bypass the nul
character.

Link: http://lkml.kernel.org/r/552E661D.5060502@oracle.com

Reported-by: Sasha Levin <sasha.levin@oracle.com> # via KASan
Tested-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Fixes: 0c564a538a "tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-17 10:34:43 -04:00
Joonsoo Kim
84fce9db4d tracing: Fix incorrect enabling of trace events by boot cmdline
There is a problem that trace events are not properly enabled with
boot cmdline. The problem is that if we pass "trace_event=kmem:mm_page_alloc"
to the boot cmdline, it enables all kmem trace events, and not just
the page_alloc event.

This is caused by the parsing mechanism. When we parse the cmdline, the buffer
contents is modified due to tokenization. And, if we use this buffer
again, we will get the wrong result.

Unfortunately, this buffer is be accessed three times to set trace events
properly at boot time. So, we need to handle this situation.

There is already code handling ",", but we need another for ":".
This patch adds it.

Link: http://lkml.kernel.org/r/1429159484-22977-1-git-send-email-iamjoonsoo.kim@lge.com

Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
[ added missing return ret; ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-16 09:44:07 -04:00
David Howells
7682c91843 VFS: kernel/: d_inode() annotations
relayfs and tracefs are dealing with inodes of their own;
those two act as filesystem drivers

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:55 -04:00
Linus Torvalds
eeee78cf77 Some clean ups and small fixes, but the biggest change is the addition
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.
 
 Tracepoints have helper functions for the TP_printk() called
 __print_symbolic() and __print_flags() that lets a numeric number be
 displayed as a a human comprehensible text. What is placed in the
 TP_printk() is also shown in the tracepoint format file such that
 user space tools like perf and trace-cmd can parse the binary data
 and express the values too. Unfortunately, the way the TRACE_EVENT()
 macro works, anything placed in the TP_printk() will be shown pretty
 much exactly as is. The problem arises when enums are used. That's
 because unlike macros, enums will not be changed into their values
 by the C pre-processor. Thus, the enum string is exported to the
 format file, and this makes it useless for user space tools.
 
 The TRACE_DEFINE_ENUM() solves this by converting the enum strings
 in the TP_printk() format into their number, and that is what is
 shown to user space. For example, the tracepoint tlb_flush currently
 has this in its format file:
 
      __print_symbolic(REC->reason,
         { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
         { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
         { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
         { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })
 
 After adding:
 
      TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
      TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
      TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
      TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);
 
 Its format file will contain this:
 
      __print_symbolic(REC->reason,
         { 0, "flush on task switch" },
         { 1, "remote shootdown" },
         { 2, "local shootdown" },
         { 3, "local mm shootdown" })
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVLBTuAAoJEEjnJuOKh9ldjHMIALdRS755TXCZGOf0r7O2akOR
 wMPeum7C+ae1mH+jCsJKUC0/jUfQKaMt/UxoHlipDgcGg8kD2jtGnGCw4Xlwvdsr
 y4rFmcTRSl1mo0zDSsg6ujoupHlVYN0+JPjrd7S3cv/llJoY49zcanNLF7S2XLeM
 dZCtWRLWYpBiWO68ai6AqJTnE/eGFIqBI048qb5Eg8dbK243SSeSIf9Ywhb+VsA+
 aq6F7cWI/H6j4tbeza8tAN19dcwenDro5EfCDY8ARQHJu1f6Y3+DLf2imjkd6Aiu
 JVAoGIjHIpI+djwCZC1u4gi4urjfOqYartrM3Q54tb3YWYqHeNqP2ASI2a4EpYk=
 =Ixwt
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Some clean ups and small fixes, but the biggest change is the addition
  of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints.

  Tracepoints have helper functions for the TP_printk() called
  __print_symbolic() and __print_flags() that lets a numeric number be
  displayed as a a human comprehensible text.  What is placed in the
  TP_printk() is also shown in the tracepoint format file such that user
  space tools like perf and trace-cmd can parse the binary data and
  express the values too.  Unfortunately, the way the TRACE_EVENT()
  macro works, anything placed in the TP_printk() will be shown pretty
  much exactly as is.  The problem arises when enums are used.  That's
  because unlike macros, enums will not be changed into their values by
  the C pre-processor.  Thus, the enum string is exported to the format
  file, and this makes it useless for user space tools.

  The TRACE_DEFINE_ENUM() solves this by converting the enum strings in
  the TP_printk() format into their number, and that is what is shown to
  user space.  For example, the tracepoint tlb_flush currently has this
  in its format file:

     __print_symbolic(REC->reason,
        { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
        { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
        { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
        { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

  After adding:

     TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
     TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
     TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
     TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

  Its format file will contain this:

     __print_symbolic(REC->reason,
        { 0, "flush on task switch" },
        { 1, "remote shootdown" },
        { 2, "local shootdown" },
        { 3, "local mm shootdown" })"

* tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
  tracing: Add enum_map file to show enums that have been mapped
  writeback: Export enums used by tracepoint to user space
  v4l: Export enums used by tracepoints to user space
  SUNRPC: Export enums in tracepoints to user space
  mm: tracing: Export enums in tracepoints to user space
  irq/tracing: Export enums in tracepoints to user space
  f2fs: Export the enums in the tracepoints to userspace
  net/9p/tracing: Export enums in tracepoints to userspace
  x86/tlb/trace: Export enums in used by tlb_flush tracepoint
  tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM()
  tracing: Allow for modules to convert their enums to values
  tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
  tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation
  tracing: Give system name a pointer
  brcmsmac: Move each system tracepoints to their own header
  iwlwifi: Move each system tracepoints to their own header
  mac80211: Move message tracepoints to their own header
  tracing: Add TRACE_SYSTEM_VAR to xhci-hcd
  tracing: Add TRACE_SYSTEM_VAR to kvm-s390
  tracing: Add TRACE_SYSTEM_VAR to intel-sst
  ...
2015-04-14 10:49:03 -07:00
Steven Rostedt (Red Hat)
3673b8e4ce tracing: Allow for modules to convert their enums to values
Update the infrastructure such that modules that declare TRACE_DEFINE_ENUM()
will have those enums converted into their values in the tracepoint
print fmt strings.

Link: http://lkml.kernel.org/r/87vbhjp74q.fsf@rustcorp.com.au

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-08 09:39:57 -04:00
Steven Rostedt (Red Hat)
0c564a538a tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
Several tracepoints use the helper functions __print_symbolic() or
__print_flags() and pass in enums that do the mapping between the
binary data stored and the value to print. This works well for reading
the ASCII trace files, but when the data is read via userspace tools
such as perf and trace-cmd, the conversion of the binary value to a
human string format is lost if an enum is used, as userspace does not
have access to what the ENUM is.

For example, the tracepoint trace_tlb_flush() has:

 __print_symbolic(REC->reason,
    { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
    { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
    { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
    { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

Which maps the enum values to the strings they represent. But perf and
trace-cmd do no know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
not be able to map it.

With TRACE_DEFINE_ENUM(), developers can place these in the event header
files and ftrace will convert the enums to their values:

By adding:

 TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
 TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
 TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
 TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

 $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
[...]
 __print_symbolic(REC->reason,
    { 0, "flush on task switch" },
    { 1, "remote shootdown" },
    { 2, "local shootdown" },
    { 3, "local mm shootdown" })

The above is what userspace expects to see, and tools do not need to
be modified to parse them.

Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org

Cc: Guilherme Cox <cox@computer.org>
Cc: Tony Luck <tony.luck@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-08 09:39:56 -04:00
Linus Torvalds
41cbc01f6e The updates included in this pull request for ftrace are:
o Several clean ups to the code
 
    One such clean up was to convert to 64 bit time keeping, in the
    ring buffer benchmark code.
 
  o Adding of __print_array() helper macro for TRACE_EVENT()
 
  o Updating the sample/trace_events/ to add samples of different ways to
    make trace events. Lots of features have been added since the sample
    code was made, and these features are mostly unknown. Developers
    have been making their own hacks to do things that are already available.
 
  o Performance improvements. Most notably, I found a performance bug where
    a waiter that is waiting for a full page from the ring buffer will
    see that a full page is not available, and go to sleep. The sched
    event caused by it going to sleep would cause it to wake up again.
    It would see that there was still not a full page, and go back to sleep
    again, and that would wake it up again, until finally it would see a
    full page. This change has been marked for stable.
 
    Other improvements include removing global locks from fast paths.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU3M+GAAoJEEjnJuOKh9ldpWQIAJTUzeVXlU0cf3bVn768VW7e
 XS41WHF34l1tNevmKTh6fCPiw8+U0UMGLQt5WKtyaaARsZn2MlefLVuvHPKFlK2w
 +qcI4OEVHH97Qgf9HWJSsYgnZaOnOE+TENqnokEgXMimRMuVcd/S4QaGxwJVDcjm
 iBF5j2TaG4aGbx4a3J7KueoZ3K+39r3ut15hIGi/IZBZldQ1pt26ytafD/KA3CU3
 BLRM2HLttAMsV1ds0EDLgZjSGICVetFcdOmI5Gwj7Qr3KrOTRPYJMNc8NdDL7Js9
 v8VhujhFGvcCrhO/IKpVvd9yluz3RCF+Z7ihc+D/+1B3Nsm0PTwN3Fl5J+f89AA=
 =u2Mm
 -----END PGP SIGNATURE-----

Merge tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The updates included in this pull request for ftrace are:

   o Several clean ups to the code

     One such clean up was to convert to 64 bit time keeping, in the
     ring buffer benchmark code.

   o Adding of __print_array() helper macro for TRACE_EVENT()

   o Updating the sample/trace_events/ to add samples of different ways
     to make trace events.  Lots of features have been added since the
     sample code was made, and these features are mostly unknown.
     Developers have been making their own hacks to do things that are
     already available.

   o Performance improvements.  Most notably, I found a performance bug
     where a waiter that is waiting for a full page from the ring buffer
     will see that a full page is not available, and go to sleep.  The
     sched event caused by it going to sleep would cause it to wake up
     again.  It would see that there was still not a full page, and go
     back to sleep again, and that would wake it up again, until finally
     it would see a full page.  This change has been marked for stable.

  Other improvements include removing global locks from fast paths"

* tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Do not wake up a splice waiter when page is not full
  tracing: Fix unmapping loop in tracing_mark_write
  tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT()
  tracing: Add TRACE_EVENT_FN example
  tracing: Add TRACE_EVENT_CONDITION sample
  tracing: Update the TRACE_EVENT fields available in the sample code
  tracing: Separate out initializing top level dir from instances
  tracing: Make tracing_init_dentry_tr() static
  trace: Use 64-bit timekeeping
  tracing: Add array printing helper
  tracing: Remove newline from trace_printk warning banner
  tracing: Use IS_ERR() check for return value of tracing_init_dentry()
  tracing: Remove unneeded includes of debugfs.h and fs.h
  tracing: Remove taking of trace_types_lock in pipe files
  tracing: Add ref count to tracer for when they are being read by pipe
2015-02-12 08:37:41 -08:00
Steven Rostedt (Red Hat)
8434dc9340 tracing: Convert the tracing facility over to use tracefs
debugfs was fine for the tracing facility as a quick way to get
an interface. Now that tracing has matured, it should separate itself
from debugfs such that it can be mounted separately without needing
to mount all of debugfs with it. That is, users resist using tracing
because it requires mounting debugfs. Having tracing have its own file
system lets users get the features of tracing without needing to bring
in the rest of the kernel's debug infrastructure.

Another reason for tracefs is that debubfs does not support mkdir.
Currently, to create instances, one does a mkdir in the tracing/instance
directory. This is implemented via a hack that forces debugfs to do
something it is not intended on doing. By converting over to tracefs, this
hack can be removed and mkdir can be properly implemented. This patch does
not address this yet, but it lays the ground work for that to be done.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-02-03 12:48:41 -05:00
Steven Rostedt (Red Hat)
dfbc1534ea Merge branch 'debugfs_automount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into trace/ftrace/tracefs
Pull in Al Viro's changes to debugfs that implement the new primitive:
debugfs_create_automount(), that creates a directory in debugfs that will
safely mount another file system automatically when debugfs is mounted.

This will let tracefs automount itself on top of debugfs/tracing directory.
2015-02-02 11:47:31 -05:00
Steven Rostedt (Red Hat)
14a5ae40f0 tracing: Use IS_ERR() check for return value of tracing_init_dentry()
tracing_init_dentry() will soon return NULL as a valid pointer for the
top level tracing directroy. NULL can not be used as an error value.
Instead, switch to ERR_PTR() and check the return status with
IS_ERR().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-22 11:19:49 -05:00
Steven Rostedt (Red Hat)
ce1039bd3a tracing: Fix enabling of syscall events on the command line
Commit 5f893b2639 "tracing: Move enabling tracepoints to just after
rcu_init()" broke the enabling of system call events from the command
line. The reason was that the enabling of command line trace events
was moved before PID 1 started, and the syscall tracepoints require
that all tasks have the TIF_SYSCALL_TRACEPOINT flag set. But the
swapper task (pid 0) is not part of that. Since the swapper task is the
only task that is running at this early in boot, no task gets the
flag set, and the tracepoint never gets reached.

Instead of setting the swapper task flag (there should be no reason to
do that), re-enabled trace events again after the init thread (PID 1)
has been started. It requires disabling all command line events and
re-enabling them, as just enabling them again will not reset the logic
to set the TIF_SYSCALL_TRACEPOINT flag, as the syscall tracepoint will
be fooled into thinking that it was already set, and wont try setting
it again. For this reason, we must first disable it and re-enable it.

Link: http://lkml.kernel.org/r/1421188517-18312-1-git-send-email-mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/20150115040506.216066449@goodmis.org

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-01-15 09:42:50 -05:00
Linus Torvalds
a7c180aa7e As the merge window is still open, and this code was not as complex
as I thought it might be. I'm pushing this in now.
 
 This will allow Thomas to debug his irq work for 3.20.
 
 This adds two new features:
 
 1) Allow traceopoints to be enabled right after mm_init(). By passing
 in the trace_event= kernel command line parameter, tracepoints can be
 enabled at boot up. For debugging things like the initialization of
 interrupts, it is needed to have tracepoints enabled very early. People
 have asked about this before and this has been on my todo list. As it
 can be helpful for Thomas to debug his upcoming 3.20 IRQ work, I'm
 pushing this now. This way he can add tracepoints into the IRQ set up
 and have users enable them when things go wrong.
 
 2) Have the tracepoints printed via printk() (the console) when they
 are triggered. If the irq code locks up or reboots the box, having the
 tracepoint output go into the kernel ring buffer is useless for
 debugging. But being able to add the tp_printk kernel command line
 option along with the trace_event= option will have these tracepoints
 printed as they occur, and that can be really useful for debugging
 early lock up or reboot problems.
 
 This code is not that intrusive and it passed all my tests. Thomas tried
 them out too and it works for his needs.
 
  Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUjv3kAAoJEEjnJuOKh9ldLNsIANAe5EmDCBw0WjR72n+G3qOH
 NC8calXfkjqHU0bv8Q3dRv20KH4MHOy6l4+EiV9/ovt71LOF3NEyUJ3HuShf9a8b
 sWcUhYbX3D1hViQe5sOzv9AWhBCFlKQGoNmQnydX9xa8ivRsBaTGJIGktWlHcwBE
 jF1i3fj3l3vRQSS8qZFXp3bzreunlGyPoSHcT6eWQeos+utj4sKwQWTLXTLQeM+6
 oQtFKRx7E5yX04qO1qFczS8qIEC6JH2C2jIRYEKUGepaELlnGkb8O7jQV/RaLF4/
 6P8VhZFG9YLS7fn7vWu0SnAN+Zwz5LzgjXAZt0FhGtIhLc18Oj8ouHH1UORsdQM=
 =Z4Un
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "As the merge window is still open, and this code was not as complex as
  I thought it might be.  I'm pushing this in now.

  This will allow Thomas to debug his irq work for 3.20.

  This adds two new features:

  1) Allow traceopoints to be enabled right after mm_init().

     By passing in the trace_event= kernel command line parameter,
     tracepoints can be enabled at boot up.  For debugging things like
     the initialization of interrupts, it is needed to have tracepoints
     enabled very early.  People have asked about this before and this
     has been on my todo list.  As it can be helpful for Thomas to debug
     his upcoming 3.20 IRQ work, I'm pushing this now.  This way he can
     add tracepoints into the IRQ set up and have users enable them when
     things go wrong.

  2) Have the tracepoints printed via printk() (the console) when they
     are triggered.

     If the irq code locks up or reboots the box, having the tracepoint
     output go into the kernel ring buffer is useless for debugging.
     But being able to add the tp_printk kernel command line option
     along with the trace_event= option will have these tracepoints
     printed as they occur, and that can be really useful for debugging
     early lock up or reboot problems.

  This code is not that intrusive and it passed all my tests.  Thomas
  tried them out too and it works for his needs.

   Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org"

* tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Add tp_printk cmdline to have tracepoints go to printk()
  tracing: Move enabling tracepoints to just after rcu_init()
2014-12-16 12:53:59 -08:00
Steven Rostedt (Red Hat)
0daa230296 tracing: Add tp_printk cmdline to have tracepoints go to printk()
Add the kernel command line tp_printk option that will have tracepoints
that are active sent to printk() as well as to the trace buffer.

Passing "tp_printk" will activate this. To turn it off, the sysctl
/proc/sys/kernel/tracepoint_printk can have '0' echoed into it. Note,
this only works if the cmdline option is used. Echoing 1 into the sysctl
file without the cmdline option will have no affect.

Note, this is a dangerous option. Having high frequency tracepoints send
their data to printk() can possibly cause a live lock. This is another
reason why this is only active if the command line option is used.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-15 10:17:38 -05:00
Steven Rostedt (Red Hat)
5f893b2639 tracing: Move enabling tracepoints to just after rcu_init()
Enabling tracepoints at boot up can be very useful. The tracepoint
can be initialized right after RCU has been. There's no need to
wait for the early_initcall() to be called. That's too late for some
things that can use tracepoints for debugging. Move the logic to
enable tracepoints out of the initcalls and into init/main.c to
right after rcu_init().

This also allows trace_printk() to be used early too.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-15 10:16:50 -05:00
Linus Torvalds
350e4f4985 This code is a fork from the trace-3.19 pull as it needed the trace_seq
clean ups from that branch.
 
 This code solves the issue of performing stack dumps from NMI context.
 The issue is that printk() is not safe from NMI context as if the NMI
 were to trigger when a printk() was being performed, the NMI could
 deadlock from the printk() internal locks. This has been seen in practice.
 
 With lots of review from Petr Mladek, this code went through several
 iterations, and we feel that it is now at a point of quality to be
 accepted into mainline.
 
 Here's what is contained in this patch set:
 
  o Creates a "seq_buf" generic buffer utility that allows a descriptor
    to be passed around where functions can write their own "printk()"
    formatted strings into it. The generic version was pulled out of
    the trace_seq() code that was made specifically for tracing.
 
  o The seq_buf code was change to model the seq_file code. I have
    a patch (not included for 3.19) that converts the seq_file.c code
    over to use seq_buf.c like the trace_seq.c code does. This was done
    to make sure that seq_buf.c is compatible with seq_file.c. I may
    try to get that patch in for 3.20.
 
  o The seq_buf.c file was moved to lib/ to remove it from being dependent
    on CONFIG_TRACING.
 
  o The printk() was updated to allow for a per_cpu "override" of
    the internal calls. That is, instead of writing to the console, a call
    to printk() may do something else. This made it easier to allow the
    NMI to change what printk() does in order to call dump_stack() without
    needing to update that code as well.
 
  o Finally, the dump_stack from all CPUs via NMI code was converted to
    use the seq_buf code. The caller to trigger the NMI code would wait
    till all the NMIs finished, and then it would print the seq_buf
    data to the console safely from a non NMI context.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUhbrnAAoJEEjnJuOKh9ldsCoIAJ3sKIJ5B3jxJJTCHPAx/lZD
 GVbV1J1mu4kTAZuhJZOAxW8D6PZGZMyEjg0y6ScDEnBGcjAZ9gTiWCdakPktf9EX
 GfaPPqwiL9dZ18J9Qc6uR+7M1Ffpzzwbcc6lJrpoTcjRgkoH9wCiLS9ozFQyYzWb
 /7m5UbUM/PIk9WAjLYXPW6UUVtPTPT0RdEQKofMGTeah+vgqj4TXCOROdlxsXXWF
 77vqBvPd5TUPWFH9ftzJGDtZS8SroXVKCu3fZIqHgzAU0yqwVtH/JzDTy9u2UYhX
 GzDEPeAIdp6m6Uyc406VuIf1QW0gfBgmA0ir80vFoP27uFMM6j5HlF7azgQfx34=
 =YBgA
 -----END PGP SIGNATURE-----

Merge tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull nmi-safe seq_buf printk update from Steven Rostedt:
 "This code is a fork from the trace-3.19 pull as it needed the
  trace_seq clean ups from that branch.

  This code solves the issue of performing stack dumps from NMI context.
  The issue is that printk() is not safe from NMI context as if the NMI
  were to trigger when a printk() was being performed, the NMI could
  deadlock from the printk() internal locks.  This has been seen in
  practice.

  With lots of review from Petr Mladek, this code went through several
  iterations, and we feel that it is now at a point of quality to be
  accepted into mainline.

  Here's what is contained in this patch set:

   - Creates a "seq_buf" generic buffer utility that allows a descriptor
     to be passed around where functions can write their own "printk()"
     formatted strings into it.  The generic version was pulled out of
     the trace_seq() code that was made specifically for tracing.

   - The seq_buf code was change to model the seq_file code.  I have a
     patch (not included for 3.19) that converts the seq_file.c code
     over to use seq_buf.c like the trace_seq.c code does.  This was
     done to make sure that seq_buf.c is compatible with seq_file.c.  I
     may try to get that patch in for 3.20.

   - The seq_buf.c file was moved to lib/ to remove it from being
     dependent on CONFIG_TRACING.

   - The printk() was updated to allow for a per_cpu "override" of the
     internal calls.  That is, instead of writing to the console, a call
     to printk() may do something else.  This made it easier to allow
     the NMI to change what printk() does in order to call dump_stack()
     without needing to update that code as well.

   - Finally, the dump_stack from all CPUs via NMI code was converted to
     use the seq_buf code.  The caller to trigger the NMI code would
     wait till all the NMIs finished, and then it would print the
     seq_buf data to the console safely from a non NMI context

  One added bonus is that this code also makes the NMI dump stack work
  on PREEMPT_RT kernels.  As printk() includes sleeping locks on
  PREEMPT_RT, printk() only writes to console if the console does not
  use any rt_mutex converted spin locks.  Which a lot do"

* tag 'trace-seq-buf-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  x86/nmi: Fix use of unallocated cpumask_var_t
  printk/percpu: Define printk_func when printk is not defined
  x86/nmi: Perform a safe NMI stack trace on all CPUs
  printk: Add per_cpu printk func to allow printk to be diverted
  seq_buf: Move the seq_buf code to lib/
  seq-buf: Make seq_buf_bprintf() conditional on CONFIG_BINARY_PRINTF
  tracing: Add seq_buf_get_buf() and seq_buf_commit() helper functions
  tracing: Have seq_buf use full buffer
  seq_buf: Add seq_buf_can_fit() helper function
  tracing: Add paranoid size check in trace_printk_seq()
  tracing: Use trace_seq_used() and seq_buf_used() instead of len
  tracing: Clean up tracing_fill_pipe_page()
  seq_buf: Create seq_buf_used() to find out how much was written
  tracing: Add a seq_buf_clear() helper and clear len and readpos in init
  tracing: Convert seq_buf fields to be like seq_file fields
  tracing: Convert seq_buf_path() to be like seq_path()
  tracing: Create seq_buf layer in trace_seq
2014-12-10 20:35:41 -08:00
Linus Torvalds
1dd7dcb6ea There was a lot of clean ups and minor fixes. One of those clean ups was
to the trace_seq code. It also removed the return values to the
 trace_seq_*() functions and use trace_seq_has_overflowed() to see if
 the buffer filled up or not. This is similar to work being done to the
 seq_file code as well in another tree.
 
 Some of the other goodies include:
 
  o Added some "!" (NOT) logic to the tracing filter.
 
  o Fixed the frame pointer logic to the x86_64 mcount trampolines
 
  o Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
    That is, the ftrace trampoline can be dynamically allocated
    and be called directly by functions that only have a single hook
    to them.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUhbLGAAoJEEjnJuOKh9ldRV4H/3NcLbgGB2iu96la1zdYE6pG
 Q7cDJMxXK80YIIL70h9G0IItcD4t62LMb72lfBnMGRj3msgFb3AgISW57EuI0Pxk
 xk24wuIPoTG2S7v9sc3SboNFwO8qbtIjxD2OBmqIUrGo2sZIiGjyj3gX7mCY3uzL
 WB2bUOSFz/22OgaANinR5EELHA3pZZCf54Vz1K9ndmtK0xp0j1a7xJShD6TrMdYv
 mZ3zH5ViIhW4A3mdcMceh6fy2JLQAiEKF0uPTvcMMz7NlVul0mxyL/+10P7AE/3R
 Ehw4fzmm4NDshPDtBOkKH0LsppgXzuItFuQUTpact3JlqTg++bV6onSsrkt1hlY=
 =Z7Cm
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "There was a lot of clean ups and minor fixes.  One of those clean ups
  was to the trace_seq code.  It also removed the return values to the
  trace_seq_*() functions and use trace_seq_has_overflowed() to see if
  the buffer filled up or not.  This is similar to work being done to
  the seq_file code as well in another tree.

  Some of the other goodies include:

   - Added some "!" (NOT) logic to the tracing filter.

   - Fixed the frame pointer logic to the x86_64 mcount trampolines

   - Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
     That is, the ftrace trampoline can be dynamically allocated and be
     called directly by functions that only have a single hook to them"

* tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits)
  tracing: Truncated output is better than nothing
  tracing: Add additional marks to signal very large time deltas
  Documentation: describe trace_buf_size parameter more accurately
  tracing: Allow NOT to filter AND and OR clauses
  tracing: Add NOT to filtering logic
  ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
  ftrace/x86: Get rid of ftrace_caller_setup
  ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
  ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
  ftrace/x86: Simplify save_mcount_regs on getting RIP
  ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
  ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
  ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
  ftrace/x86: Have static tracing also use ftrace_caller_setup
  ftrace/x86: Have static function tracing always test for function graph
  kprobes: Add IPMODIFY flag to kprobe_ftrace_ops
  ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
  kprobes/ftrace: Recover original IP if pre_handler doesn't change it
  tracing/trivial: Fix typos and make an int into a bool
  tracing: Deletion of an unnecessary check before iput()
  ...
2014-12-10 19:58:13 -08:00
Steven Rostedt (Red Hat)
5ac4837841 tracing: Use trace_seq_used() and seq_buf_used() instead of len
As the seq_buf->len will soon be +1 size when there's an overflow, we
must use trace_seq_used() or seq_buf_used() methods to get the real
length. This will prevent buffer overflow issues if just the len
of the seq_buf descriptor is used to copy memory.

Link: http://lkml.kernel.org/r/20141114121911.09ba3d38@gandalf.local.home

Reported-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:15 -05:00
Steven Rostedt (Red Hat)
3a161d99c4 tracing: Create seq_buf layer in trace_seq
Create a seq_buf layer that trace_seq sits on. The seq_buf will not
be limited to page size. This will allow other usages of seq_buf
instead of a hard set PAGE_SIZE one that trace_seq has.

Link: http://lkml.kernel.org/r/20141104160221.864997179@goodmis.org
Link: http://lkml.kernel.org/r/20141114011412.170377300@goodmis.org

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-19 22:01:09 -05:00
Rasmus Villemoes
fa6f0cc751 tracing: Replace seq_printf by simpler equivalents
Using seq_printf to print a simple string or a single character is a
lot more expensive than it needs to be, since seq_puts and seq_putc
exist.

These patches do

  seq_printf(m, s) -> seq_puts(m, s)
  seq_printf(m, "%s", s) -> seq_puts(m, s)
  seq_printf(m, "%c", c) -> seq_putc(m, c)

Subsequent patches will simplify further.

Link: http://lkml.kernel.org/r/1415479332-25944-2-git-send-email-linux@rasmusvillemoes.dk

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-13 21:32:19 -05:00
Al Viro
946e51f2bf move d_rcu from overlapping d_child to overlapping d_alias
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-03 15:20:29 -05:00
Peter Zijlstra
fe0e01c77d tracing: Robustify wait loop
The pending nested sleep debugging triggered on the potential stale
TASK_INTERRUPTIBLE in this code.

While there, fix the loop such that we won't revert to a while(1)
yield() 'spin' loop if we ever get a spurious wakeup.

And fix the actual issue by properly terminating the 'wait' loop by
setting TASK_RUNNING.

Link: http://lkml.kernel.org/p/20141008165110.GA14547@worktop.programming.kicks-ass.net

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-10-08 19:51:01 -04:00
Linus Torvalds
c9b88e9581 Oleg Nesterov did several clean ups with the tracing filter code.
As he found some small bugs that went into 3.16, and these changes were
 based on that, I had to apply his changes to a separate branch than
 my main development branch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT3533AAoJEKQekfcNnQGuaNkIALM9uf6gvIXgvyAlhK1qfjJL
 0pxAlgppukA7hZT9gcubGuStPxKnwtVSvd/GwGI/mc7Ls57vOvdA61TEHN48l2MW
 SUev1Jeu1Nx3Wh4B2ivRq+dJadrwU9NwG507yvuXn2L8Gqzwal133dg3AHnsTYgz
 yu7oe8baxMniFFYlAmSELFp2cA3kli/WrRuQB7weudbT12jZlpOMqz7OUN9o7dfo
 E4XfwkkDUuo9EiKD1QwVMk2cIkCaHFI8lJDYJbrM3L5XJKfnKzZNRMWfwDG2L7hS
 DGjMl6T0SjU+ZIoTidHdbyF/Q+I0o9kKTnUrErGu88+2Z1gfEKN5O8zovNMbPUM=
 =dT/x
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing filter cleanups from Steven Rostedt:
 "Oleg Nesterov did several clean ups with the tracing filter code.  As
  he found some small bugs that went into 3.16, and these changes were
  based on that, I had to apply his changes to a separate branch than my
  main development branch.

  This was based on work that was already pulled into 3.16, and is a
  separate pull request to keep from having local merges in my pull
  request"

* tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Kill "filter_string" arg of replace_preds()
  tracing: Change apply_subsystem_event_filter() paths to check file->system == dir
  tracing: Kill ftrace_event_call->files
  tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic
  tracing: Kill call_filter_disable()
  tracing: Kill destroy_call_preds()
  tracing: Kill destroy_preds() and destroy_file_preds()
2014-08-04 12:02:48 -07:00
Linus Torvalds
b8c0aa46b3 This pull request has a lot of work done. The main thing is the changes
to the ftrace function callback infrastructure. It's introducing a
 way to allow different functions to call directly different trampolines
 instead of all calling the same "mcount" one.
 
 The only user of this for now is the function graph tracer, which always
 had a different trampoline, but the function tracer trampoline was called
 and did basically nothing, and then the function graph tracer trampoline
 was called. The difference now, is that the function graph tracer
 trampoline can be called directly if a function is only being traced by
 the function graph trampoline. If function tracing is also happening on
 the same function, the old way is still done.
 
 The accounting for this takes up more memory when function graph tracing
 is activated, as it needs to keep track of which functions it uses.
 I have a new way that wont take as much memory, but it's not ready yet
 for this merge window, and will have to wait for the next one.
 
 Another big change was the removal of the ftrace_start/stop() calls that
 were used by the suspend/resume code that stopped function tracing when
 entering into suspend and resume paths. The stop of ftrace was done
 because there was some function that would crash the system if one called
 smp_processor_id()! The stop/start was a big hammer to solve the issue
 at the time, which was when ftrace was first introduced into Linux.
 Now ftrace has better infrastructure to debug such issues, and I found
 the problem function and labeled it with "notrace" and function tracing
 can now safely be activated all the way down into the guts of suspend
 and resume.
 
 Other changes include clean ups of uprobe code.
 Clean up of the trace_seq() code.
 And other various small fixes and clean ups to ftrace and tracing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJT35zXAAoJEKQekfcNnQGuOz0H/38zqM0nLFhrgvz3EPk2UOjn
 xqpX8qyb2V7TJZL+IqeXU2a5cQZl5ba0D4WtBGpxbTae3CJYiuQ87iKUNFoH0om5
 FDpn80igb368k8V3qRdRsziKVCCf0XBd/NkHJXc0ZkfXGyzB2Ga4bBxALxp2gj9y
 bnO+vKo6+tWYKG4hyQb4P3LRXUrK8/LWEsPr39cH2QH1Rdj69Lx9CgrCdUVJmwcb
 Bj8hEiLXL/RYCFNn79A3wNTUvW0rG/AOIf4SLqXtasSRZ0ToaU0ZyDnrNv+0Ol47
 rX8tSk+LfXchL9hpIvjCf1vlAYq3pO02favteR/jip3lx/dTjEDE4RJ9qtJzZ4Q=
 =fwQY
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "This pull request has a lot of work done.  The main thing is the
  changes to the ftrace function callback infrastructure.  It's
  introducing a way to allow different functions to call directly
  different trampolines instead of all calling the same "mcount" one.

  The only user of this for now is the function graph tracer, which
  always had a different trampoline, but the function tracer trampoline
  was called and did basically nothing, and then the function graph
  tracer trampoline was called.  The difference now, is that the
  function graph tracer trampoline can be called directly if a function
  is only being traced by the function graph trampoline.  If function
  tracing is also happening on the same function, the old way is still
  done.

  The accounting for this takes up more memory when function graph
  tracing is activated, as it needs to keep track of which functions it
  uses.  I have a new way that wont take as much memory, but it's not
  ready yet for this merge window, and will have to wait for the next
  one.

  Another big change was the removal of the ftrace_start/stop() calls
  that were used by the suspend/resume code that stopped function
  tracing when entering into suspend and resume paths.  The stop of
  ftrace was done because there was some function that would crash the
  system if one called smp_processor_id()! The stop/start was a big
  hammer to solve the issue at the time, which was when ftrace was first
  introduced into Linux.  Now ftrace has better infrastructure to debug
  such issues, and I found the problem function and labeled it with
  "notrace" and function tracing can now safely be activated all the way
  down into the guts of suspend and resume

  Other changes include clean ups of uprobe code, clean up of the
  trace_seq() code, and other various small fixes and clean ups to
  ftrace and tracing"

* tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
  ftrace: Add warning if tramp hash does not match nr_trampolines
  ftrace: Fix trampoline hash update check on rec->flags
  ring-buffer: Use rb_page_size() instead of open coded head_page size
  ftrace: Rename ftrace_ops field from trampolines to nr_trampolines
  tracing: Convert local function_graph functions to static
  ftrace: Do not copy old hash when resetting
  tracing: let user specify tracing_thresh after selecting function_graph
  ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()
  tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST
  s390/ftrace: remove check of obsolete variable function_trace_stop
  arm64, ftrace: Remove check of obsolete variable function_trace_stop
  Blackfin: ftrace: Remove check of obsolete variable function_trace_stop
  metag: ftrace: Remove check of obsolete variable function_trace_stop
  microblaze: ftrace: Remove check of obsolete variable function_trace_stop
  MIPS: ftrace: Remove check of obsolete variable function_trace_stop
  parisc: ftrace: Remove check of obsolete variable function_trace_stop
  sh: ftrace: Remove check of obsolete variable function_trace_stop
  sparc64,ftrace: Remove check of obsolete variable function_trace_stop
  tile: ftrace: Remove check of obsolete variable function_trace_stop
  ftrace: x86: Remove check of obsolete variable function_trace_stop
  ...
2014-08-04 11:50:00 -07:00
Oleg Nesterov
57375747b6 tracing: Kill destroy_call_preds()
Remove destroy_call_preds(). Its only caller, __trace_remove_event_call(),
can use free_event_filter() and nullify ->filter by hand.

Perhaps we could keep this trivial helper although imo it is pointless, but
then it should be static in trace_events.c.

Link: http://lkml.kernel.org/p/20140715184816.GA20495@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:22:32 -04:00
Oleg Nesterov
3e5454d656 tracing: Kill destroy_preds() and destroy_file_preds()
destroy_preds() makes no sense.

The only caller, event_remove(), actually wants destroy_file_preds().
__trace_remove_event_call() does destroy_call_preds() which takes care
of call->filter.

And after the previous change we can simply remove destroy_preds() from
event_remove(), we are going to call remove_event_from_tracers() which
in turn calls remove_event_file_dir()->free_event_filter().

Link: http://lkml.kernel.org/p/20140715184813.GA20488@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16 14:19:55 -04:00
Oleg Nesterov
2448e3493c tracing: instance_rmdir() leaks ftrace_event_file->filter
instance_rmdir() path destroys the event files but forgets to free
file->filter. Change remove_event_file_dir() to free_event_filter().

Link: http://lkml.kernel.org/p/20140711190638.GA19517@redhat.com

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
Cc: stable@vger.kernel.org # 3.11+
Fixes: f6a84bdc75 "tracing: Introduce remove_event_file_dir()"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-14 14:41:13 -04:00
Fabian Frederick
3448bac329 tracing: Convert pr_warning() to pr_warn() in trace_events.c
Convert pr_warning to standard pr_warn
Define pr_fmt(fmt) fmt to avoid any future default fmt definition

Link: http://lkml.kernel.org/p/1402141388-21144-1-git-send-email-fabf@skynet.be

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01 07:13:42 -04:00
Yoshihiro YUNOMAE
dc81e5e3ab tracing: Return error if ftrace_trace_arrays list is empty
ftrace_trace_arrays links global_trace.list. However, global_trace
is not added to ftrace_trace_arrays if trace_alloc_buffers() failed.
As the result, ftrace_trace_arrays becomes an empty list. If
ftrace_trace_arrays is an empty list, current top_trace_array() returns
an invalid pointer. As the result, the kernel can induce memory corruption
or panic.

Current implementation does not check whether ftrace_trace_arrays is empty
list or not. So, in this patch, if ftrace_trace_arrays is empty list,
top_trace_array() returns NULL. Moreover, this patch makes all functions
calling top_trace_array() handle it appropriately.

Link: http://lkml.kernel.org/p/20140605223517.32311.99233.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-06 04:47:46 -04:00
Mathieu Desnoyers
de7b297390 tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints
Register/unregister tracepoint probes with struct tracepoint pointer
rather than tracepoint name.

This change, which vastly simplifies tracepoint.c, has been proposed by
Steven Rostedt. It also removes 8.8kB (mostly of text) to the vmlinux
size.

From this point on, the tracers need to pass a struct tracepoint pointer
to probe register/unregister. A probe can now only be connected to a
tracepoint that exists. Moreover, tracers are responsible for
unregistering the probe before the module containing its associated
tracepoint is unloaded.

   text    data     bss     dec     hex filename
10443444        4282528 10391552        25117524        17f4354 vmlinux.orig
10434930        4282848 10391552        25109330        17f2352 vmlinux

Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com

CC: Ingo Molnar <mingo@kernel.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Frank Ch. Eigler <fche@redhat.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
[ SDR - fixed return val in void func in tracepoint_module_going() ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-04-08 20:43:28 -04:00
Linus Torvalds
68114e5eb8 Most of the changes were largely clean ups, and some documentation.
But there were a few features that were added.
 
 Uprobes now work with event triggers and multi buffers.
 Uprobes have support under ftrace and perf.
 
 The big feature is that the function tracer can now be used within the
 multi buffer instances. That is, you can now trace some functions
 in one buffer, others in another buffer, all functions in a third buffer
 and so on. They are basically agnostic from each other. This only
 works for the function tracer and not for the function graph trace,
 although you can have the function graph tracer running in the top level
 buffer (or any tracer for that matter) and have different function tracing
 going on in the sub buffers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTOthtAAoJEKQekfcNnQGu5c8H/Ana/U+0tmksp1dbHkRHsKSH
 +Fsv4Jeu8gf1NaFKHEhkUTcFtnzE6qAPV2VCrcJwXbhAhhwZm+LjrnWdoy3215S3
 cQW4LftLEonh2cM36Cos74TulMEYN6XmL6dQZV+CILKQkDrWU4qJjQ64okXEkqrd
 9iG3p/mSXyvJcmnyg61ALnMOhZDLsXY3djBhWBPhiTPGS6BRb9zh4Pmw6Zv0n2rJ
 U93Gt/3AQrv1ybu73dUxqP0abp60oXOiWoF/R2jcbKqIM+K9RPJX79unCV3jq3u9
 f+6jMlB9PgAMqQj6ihJdwxKDDuzwyrVdEPnsgvl4jarCBCtVVwhKedBaKN/KS8k=
 =HdXY
 -----END PGP SIGNATURE-----

Merge tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "Most of the changes were largely clean ups, and some documentation.
  But there were a few features that were added:

  Uprobes now work with event triggers and multi buffers and have
  support under ftrace and perf.

  The big feature is that the function tracer can now be used within the
  multi buffer instances.  That is, you can now trace some functions in
  one buffer, others in another buffer, all functions in a third buffer
  and so on.  They are basically agnostic from each other.  This only
  works for the function tracer and not for the function graph trace,
  although you can have the function graph tracer running in the top
  level buffer (or any tracer for that matter) and have different
  function tracing going on in the sub buffers"

* tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
  tracing: Add BUG_ON when stack end location is over written
  tracepoint: Remove unused API functions
  Revert "tracing: Move event storage for array from macro to standalone function"
  ftrace: Constify ftrace_text_reserved
  tracepoints: API doc update to tracepoint_probe_register() return value
  tracepoints: API doc update to data argument
  ftrace: Fix compilation warning about control_ops_free
  ftrace/x86: BUG when ftrace recovery fails
  ftrace: Warn on error when modifying ftrace function
  ftrace: Remove freelist from struct dyn_ftrace
  ftrace: Do not pass data to ftrace_dyn_arch_init
  ftrace: Pass retval through return in ftrace_dyn_arch_init()
  ftrace: Inline the code from ftrace_dyn_table_alloc()
  ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
  tracing: Evaluate len expression only once in __dynamic_array macro
  tracing: Correctly expand len expressions from __dynamic_array macro
  tracing/module: Replace include of tracepoint.h with jump_label.h in module.h
  tracing: Fix event header migrate.h to include tracepoint.h
  tracing: Fix event header writeback.h to include tracepoint.h
  tracing: Warn if a tracepoint is not set via debugfs
  ...
2014-04-03 10:26:31 -07:00
Steven Rostedt (Red Hat)
bc4c426ee2 Revert "tracing: Move event storage for array from macro to standalone function"
I originally wrote commit 35bb4399bd to shrink the size of the overhead of
tracepoints by several kilobytes. Later, I received a patch from Vaibhav
Nagarnaik that fixed a bug in the same code that this commit touches. Not
only did it fix a bug, it also removed code and shrunk the size of the
overhead of trace events even more than this commit did.

Since this commit is scheduled for 3.15 and Vaibhav's patch is already in
mainline, I need to revert this patch in order to keep it from conflicting
with Vaibhav's patch. Not to mention, Vaibhav's patch makes this patch
obsolete.

Link: http://lkml.kernel.org/r/20140320225637.0226041b@gandalf.local.home

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-21 13:11:41 -04:00
Vaibhav Nagarnaik
87291347c4 tracing: Fix array size mismatch in format string
In event format strings, the array size is reported in two locations.
One in array subscript and then via the "size:" attribute. The values
reported there have a mismatch.

For e.g., in sched:sched_switch the prev_comm and next_comm character
arrays have subscript values as [32] where as the actual field size is
16.

name: sched_switch
ID: 301
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:char prev_comm[32];       offset:8;       size:16;        signed:1;
        field:pid_t prev_pid;   offset:24;      size:4; signed:1;
        field:int prev_prio;    offset:28;      size:4; signed:1;
        field:long prev_state;  offset:32;      size:8; signed:1;
        field:char next_comm[32];       offset:40;      size:16;        signed:1;
        field:pid_t next_pid;   offset:56;      size:4; signed:1;
        field:int next_prio;    offset:60;      size:4; signed:1;

After bisection, the following commit was blamed:
92edca0 tracing: Use direct field, type and system names

This commit removes the duplication of strings for field->name and
field->type assuming that all the strings passed in
__trace_define_field() are immutable. This is not true for arrays, where
the type string is created in event_storage variable and field->type for
all array fields points to event_storage.

Use __stringify() to create a string constant for the type string.

Also, get rid of event_storage and event_storage_mutex that are not
needed anymore.

also, an added benefit is that this reduces the overhead of events a bit more:

   text    data     bss     dec     hex filename
8424787 2036472 1302528 11763787         b3804b vmlinux
8420814 2036408 1302528 11759750         b37086 vmlinux.patched

Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com

Cc: Laurent Chavey <chavey@google.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-20 13:21:05 -04:00
Steven Rostedt
3fd40d1ee6 tracing: Use helper functions in event assignment to shrink macro size
The functions that assign the contents for the ftrace events are
defined by the TRACE_EVENT() macros. Each event has its own unique
way to assign data to its buffer. When you have over 500 events,
that means there's 500 functions assigning data uniquely for each
event (not really that many, as DECLARE_EVENT_CLASS() and multiple
DEFINE_EVENT()s will only need a single function).

By making helper functions in the core kernel to do some of the work
instead, we can shrink the size of the kernel down a bit.

With a kernel configured with 502 events, the change in size was:

   text    data     bss     dec     hex filename
12987390        1913504 9785344 24686238        178ae9e /tmp/vmlinux
12959102        1913504 9785344 24657950        178401e /tmp/vmlinux.patched

That's a total of 28288 bytes, which comes down to 56 bytes per event.

Link: http://lkml.kernel.org/r/20120810034708.370808175@goodmis.org

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:07 -05:00
Steven Rostedt
35bb4399bd tracing: Move event storage for array from macro to standalone function
The code that shows array fields for events is defined for all events.
This can add up quite a bit when you have over 500 events.

By making helper functions in the core kernel to do the work
instead, we can shrink the size of the kernel down a bit.

With a kernel configured with 502 events, the change in size was:

   text    data     bss     dec     hex filename
12990946        1913568 9785344 24689858        178bcc2 /tmp/vmlinux
12987390        1913504 9785344 24686238        178ae9e /tmp/vmlinux.patched

That's a total of 3556 bytes, which comes down to 7 bytes per event.
Although it's not much, this code is just called at initialization of
the events.

Link: http://lkml.kernel.org/r/20120810034708.084036335@goodmis.org

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:06 -05:00
Steven Rostedt (Red Hat)
45ab2813d4 tracing: Do not add event files for modules that fail tracepoints
If a module fails to add its tracepoints due to module tainting, do not
create the module event infrastructure in the debugfs directory. As the events
will not work and worse yet, they will silently fail, making the user wonder
why the events they enable do not display anything.

Having a warning on module load and the events not visible to the users
will make the cause of the problem much clearer.

Link: http://lkml.kernel.org/r/20140227154923.265882695@goodmis.org

Fixes: 6d723736e4 "tracing/events: add support for modules to TRACE_EVENT"
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org # 2.6.31+
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-03 21:11:05 -05:00
Steven Rostedt (Red Hat)
2875a08b2d tracing: Move ftrace_event_file() out of DYNAMIC_FTRACE ifdef
Now that event triggers use ftrace_event_file(), it needs to be outside
the #ifdef CONFIG_DYNAMIC_FTRACE, as it can now be used when that is
not defined.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21 22:02:17 -05:00
Tom Zanussi
7862ad1846 tracing: Add 'enable_event' and 'disable_event' event trigger commands
Add 'enable_event' and 'disable_event' event_command commands.

enable_event and disable_event event triggers are added by the user
via these commands in a similar way and using practically the same
syntax as the analagous 'enable_event' and 'disable_event' ftrace
function commands, but instead of writing to the set_ftrace_filter
file, the enable_event and disable_event triggers are written to the
per-event 'trigger' files:

    echo 'enable_event:system:event' > .../othersys/otherevent/trigger
    echo 'disable_event:system:event' > .../othersys/otherevent/trigger

The above commands will enable or disable the 'system:event' trace
events whenever the othersys:otherevent events are hit.

This also adds a 'count' version that limits the number of times the
command will be invoked:

    echo 'enable_event:system:event:N' > .../othersys/otherevent/trigger
    echo 'disable_event:system:event:N' > .../othersys/otherevent/trigger

Where N is the number of times the command will be invoked.

The above commands will will enable or disable the 'system:event'
trace events whenever the othersys:otherevent events are hit, but only
N times.

This also makes the find_event_file() helper function extern, since
it's useful to use from other places, such as the event triggers code,
so make it accessible.

Link: http://lkml.kernel.org/r/f825f3048c3f6b026ee37ae5825f9fc373451828.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-21 22:02:16 -05:00
Tom Zanussi
85f2b08268 tracing: Add basic event trigger framework
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.

'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.

The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.

The event trigger functionality is built on top of SOFT_DISABLE
functionality.  It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires.  Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that.  Event trigger essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function.  Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.

The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.

The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.

The standard open, read, and release file operations are implemented
here.

The open() implementation sets up for the various open modes of the
'trigger' file.  It creates and attaches the trigger iterator and sets
up the command parser.  If opened for reading set up the trigger
seq_ops.

The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.

The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.

A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.

also added are a couple trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.

A couple structs consisting mostly of function meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations.  They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.

The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event.  It essentially coordinates adding a live trigger instance to
the event, and arming the triggering the event.

Every event_command func() implementation essentially does the
same thing for any command:

   - choose ops - use the value of param to choose either a number or
     count version of event_trigger_ops specific to the command
   - do the register or unregister of those ops
   - associate a filter, if specified, with the triggering event

The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized.  When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite.  The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.

Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.

The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions.  This allows func()
implementations to use command-specific blobs and supports code
re-use.

The event_trigger_ops.func() command corrsponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked.  The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.

This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.

Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-20 18:40:22 -05:00
Steven Rostedt
3ccb012392 tracing: Only run synchronize_sched() at instance deletion time
It has been reported that boot up with FTRACE_SELFTEST enabled can take a
very long time. There can be stalls of over a minute.

This was tracked down to the synchronize_sched() called when a system call
event is disabled. As the self tests enable and disable thousands of events,
this makes the synchronize_sched() get called thousands of times.

The synchornize_sched() was added with d562aff93b "tracing: Add support
for SOFT_DISABLE to syscall events" which caused this regression (added
in 3.13-rc1).

The synchronize_sched() is to protect against the events being accessed
when a tracer instance is being deleted. When an instance is being deleted
all the events associated to it are unregistered. The synchronize_sched()
makes sure that no more users are running when it finishes.

Instead of calling synchronize_sched() for all syscall events, we only
need to call it once, after the events are unregistered and before the
instance is deleted. The event_mutex is held during this action to
prevent new users from enabling events.

Link: http://lkml.kernel.org/r/20131203124120.427b9661@gandalf.local.home

Reported-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Petr Mladek <pmladek@suse.cz>
Tested-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-12-05 14:22:30 -05:00
Geyslan G. Bem
d6d3523caa tracing: Do not assign filp->private_data to freed memory
In system_tr_open(), the filp->private_data can be assigned the 'dir'
variable even if it was freed. This is on the error path, and is
harmless because the error return code will prevent filp->private_data
from being used. But for correctness, we should not assign it to
a recently freed variable, as that can cause static tools to give
false warnings.

Also have both subsystem_open() and system_tr_open() return -ENODEV
if tracing has been disabled.

Link: http://lkml.kernel.org/r/1383764571-7318-1-git-send-email-geyslan@gmail.com

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-06 15:26:54 -05:00
Tom Zanussi
f306cc82a9 tracing: Update event filters for multibuffer
The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:

Before:

  # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 2048
  # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  bytes_alloc > 2048

Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.

After:

  # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # mkdir /sys/kernel/debug/tracing/instances/test1
  # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
  bytes_alloc > 8192
  # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
  bytes_alloc > 2048

We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the
new associated call_filter_check_discard() function) can go away.

The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).

Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-11-05 16:50:20 -05:00
Oleg Nesterov
836d481ed7 tracing: Kill the !CONFIG_MODULES code in trace_events.c
Move trace_module_nb under CONFIG_MODULES and kill the dummy
trace_module_notify(). Imho it doesn't make sense to define
"struct notifier_block" and its .notifier_call just to avoid
"ifdef" in event_trace_init(), and all other !CONFIG_MODULES
code has already gone away.

Link: http://lkml.kernel.org/r/20130731173137.GA31043@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-21 23:28:03 -04:00
Oleg Nesterov
620a30e97f tracing: Don't pass file_operations array to event_create_dir()
Now that event_create_dir() and __trace_add_new_event() always
use the same file_operations we can kill these arguments and
simplify the code.

Link: http://lkml.kernel.org/r/20130731173135.GA31040@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-21 23:25:06 -04:00
Oleg Nesterov
779c5e3791 tracing: Kill trace_create_file_ops() and friends
trace_create_file_ops() allocates the copy of id/filter/format/enable
file_operations to set "f_op->owner = mod" for fops_get().

However after the recent changes there is no reason to prevent rmmod
even if one of these files is opened. A file operation can do nothing
but fail after remove_event_file_dir() clears ->i_private for every
file removed by trace_module_remove_events().

Kill "struct ftrace_module_file_ops" and fix the compilation errors.

Link: http://lkml.kernel.org/r/20130731173132.GA31033@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-21 22:31:23 -04:00
Steven Rostedt (Red Hat)
2ba64035d0 tracing: Add comment to describe special break case in probe_remove_event_call()
The "break" used in the do_for_each_event_file() is used as an optimization
as the loop is really a double loop. The loop searches all event files
for each trace_array. There's only one matching event file per trace_array
and after we find the event file for the trace_array, the break is used
to jump to the next trace_array and start the search there.

As this is not a standard way of using "break" in C code, it requires
a comment right before the break to let people know what is going on.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-31 13:16:22 -04:00
Oleg Nesterov
2816c551c7 tracing: trace_remove_event_call() should fail if call/file is in use
Change trace_remove_event_call(call) to return the error if this
call is active. This is what the callers assume but can't verify
outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c
need the additional changes, unregister_trace_probe() should abort
if trace_remove_event_call() fails.

The caller is going to free this call/file so we must ensure that
nobody can use them after trace_remove_event_call() succeeds.
debugfs should be fine after the previous changes and event_remove()
does TRACE_REG_UNREGISTER, but still there are 2 reasons why we need
the additional checks:

- There could be a perf_event(s) attached to this tp_event, so the
  patch checks ->perf_refcount.

- TRACE_REG_UNREGISTER can be suppressed by FTRACE_EVENT_FL_SOFT_MODE,
  so we simply check FTRACE_EVENT_FL_ENABLED protected by event_mutex.

Link: http://lkml.kernel.org/r/20130729175033.GB26284@redhat.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-31 13:12:48 -04:00
Oleg Nesterov
bf682c3159 tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private
Change remove_event_file_dir() to clear ->i_private for every
file we are going to remove.

We need to check file->dir != NULL because event_create_dir()
can fail. debugfs_remove_recursive(NULL) is fine but the patch
moves it under the same check anyway for readability.

spin_lock(d_lock) and "d_inode != NULL" check are not needed
afaics, but I do not understand this code enough.

tracing_open_generic_file() and tracing_release_generic_file()
can go away, ftrace_enable_fops and ftrace_event_filter_fops()
use tracing_open_generic() but only to check tracing_disabled.

This fixes all races with event_remove() or instance_delete().
f_op->read/write/whatever can never use the freed file/call,
all event/* files were changed to check and use ->i_private
under event_mutex.

Note: this doesn't not fix other problems, event_remove() can
destroy the active ftrace_event_call, we need more changes but
those changes are completely orthogonal.

Link: http://lkml.kernel.org/r/20130728183527.GB16723@redhat.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-29 22:57:11 -04:00
Oleg Nesterov
f6a84bdc75 tracing: Introduce remove_event_file_dir()
Preparation for the next patch. Extract the common code from
remove_event_from_tracers() and __trace_remove_event_dirs()
into the new helper, remove_event_file_dir().

The patch looks more complicated than it actually is, it also
moves remove_subsystem() up to avoid the forward declaration.

Link: http://lkml.kernel.org/r/20130726172547.GA3629@redhat.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-29 22:57:10 -04:00
Oleg Nesterov
c5a44a1200 tracing: Change f_start() to take event_mutex and verify i_private != NULL
trace_format_open() and trace_format_seq_ops are racy, nothing
protects ftrace_event_call from trace_remove_event_call().

Change f_start() to take event_mutex and verify i_private != NULL,
change f_stop() to drop this lock.

This fixes nothing, but now we can change debugfs_remove("format")
callers to nullify ->i_private and fix the the problem.

Note: the usage of event_mutex is sub-optimal but simple, we can
change this later.

Link: http://lkml.kernel.org/r/20130726172543.GA3622@redhat.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-29 22:57:10 -04:00
Oleg Nesterov
e2912b091c tracing: Change event_filter_read/write to verify i_private != NULL
event_filter_read/write() are racy, ftrace_event_call can be already
freed by trace_remove_event_call() callers.

1. Shift mutex_lock(event_mutex) from print/apply_event_filter to
   the callers.

2. Change the callers, event_filter_read() and event_filter_write()
   to read i_private under this mutex and abort if it is NULL.

This fixes nothing, but now we can change debugfs_remove("filter")
callers to nullify ->i_private and fix the the problem.

Link: http://lkml.kernel.org/r/20130726172540.GA3619@redhat.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-29 22:56:59 -04:00
Oleg Nesterov
bc6f6b08de tracing: Change event_enable/disable_read() to verify i_private != NULL
tracing_open_generic_file() is racy, ftrace_event_file can be
already freed by rmdir or trace_remove_event_call().

Change event_enable_read() and event_disable_read() to read and
verify "file = i_private" under event_mutex.

This fixes nothing, but now we can change debugfs_remove("enable")
callers to nullify ->i_private and fix the the problem.

Link: http://lkml.kernel.org/r/20130726172536.GA3612@redhat.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-29 22:05:40 -04:00
Oleg Nesterov
1a11126bcb tracing: Turn event/id->i_private into call->event.type
event_id_read() is racy, ftrace_event_call can be already freed
by trace_remove_event_call() callers.

Change event_create_dir() to pass "data = call->event.type", this
is all event_id_read() needs. ftrace_event_id_fops no longer needs
tracing_open_generic().

We add the new helper, event_file_data(), to read ->i_private, it
will have more users.

Note: currently ACCESS_ONCE() and "id != 0" check are not needed,
but we are going to change event_remove/rmdir to clear ->i_private.

Link: http://lkml.kernel.org/r/20130726172532.GA3605@redhat.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-29 22:04:30 -04:00
Oleg Nesterov
cd458ba9d5 tracing: Do not (ab)use trace_seq in event_id_read()
event_id_read() has no reason to kmalloc "struct trace_seq"
(more than PAGE_SIZE!), it can use a small buffer instead.

Note: "if (*ppos) return 0" looks strange and even wrong,
simple_read_from_buffer() handles ppos != 0 case corrrectly.

And it seems that almost every user of trace_seq in this file
should be converted too. Unless you use seq_open(), trace_seq
buys nothing compared to the raw buffer, but it needs a bit
more memory and code.

Link: http://lkml.kernel.org/r/20130718184712.GA4786@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18 21:31:33 -04:00
Oleg Nesterov
7710b63995 tracing: Simplify the iteration logic in f_start/f_next
f_next() looks overcomplicated, and it is not strictly correct
even if this doesn't matter.

Say, FORMAT_FIELD_SEPERATOR should not return NULL (means EOF)
if trace_get_fields() returns an empty list, we should simply
advance to FORMAT_PRINTFMT as we do when we find the end of list.

1. Change f_next() to return "struct list_head *" rather than
   "ftrace_event_field *", and change f_show() to do list_entry().

   This simplifies the code a bit, only f_show() needs to know
   about ftrace_event_field, and f_next() can play with ->prev
   directly

2. Change f_next() to not play with ->prev / return inside the
   switch() statement. It can simply set node = head/common_head,
   the prev-or-advance-to-the-next-magic below does all work.

While at it. f_start() looks overcomplicated too. I don't think
*pos == 0 makes sense as a separate case, just change this code
to do "while" instead of "do/while".

The patch also moves f_start() down, close to f_stop(). This is
purely cosmetic, just to make the locking added by the next patch
more clear/visible.

Link: http://lkml.kernel.org/r/20130718184710.GA4783@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18 21:31:32 -04:00
Alexander Z Lam
f77d09a384 tracing: Miscellaneous fixes for trace_array ref counting
Some error paths did not handle ref counting properly, and some trace files need
ref counting.

Link: http://lkml.kernel.org/r/1374171524-11948-1-git-send-email-azl@google.com

Cc: stable@vger.kernel.org # 3.10
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Signed-off-by: Alexander Z Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-18 21:31:30 -04:00
Steven Rostedt (Red Hat)
2a6c24afab tracing: Fix race between deleting buffer and setting events
While analyzing the code, I discovered that there's a potential race between
deleting a trace instance and setting events. There are a few races that can
occur if events are being traced as the buffer is being deleted. Mostly the
problem comes with freeing the descriptor used by the trace event callback.
To prevent problems like this, the events are disabled before the buffer is
deleted. The problem with the current solution is that the event_mutex is let
go between disabling the events and freeing the files, which means that the events
could be enabled again while the freeing takes place.

Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-02 20:42:25 -04:00
Steven Rostedt (Red Hat)
8e2e2fa471 tracing: Add trace_array_get/put() to event handling
Commit a695cb5816 "tracing: Prevent deleting instances when they are being read"
tried to fix a race between deleting a trace instance and reading contents
of a trace file. But it wasn't good enough. The following could crash the kernel:

 # cd /sys/kernel/debug/tracing/instances
 # ( while :; do mkdir foo; rmdir foo; done ) &
 # ( while :; do echo 1 > foo/events/sched/sched_switch 2> /dev/null; done ) &

Luckily this can only be done by root user, but it should be fixed regardless.

The problem is that a delete of the file can happen after the write to the event
is opened, but before the enabling happens.

The solution is to make sure the trace_array is available before succeeding in
opening for write, and incerment the ref counter while opened.

Now the instance can be deleted when the events are writing to the buffer,
but the deletion of the instance will disable all events before the instance
is actually deleted.

Cc: stable@vger.kernel.org # 3.10
Reported-by: Alexander Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-02 17:13:34 -04:00
Alexander Z Lam
a82274151a tracing: Protect ftrace_trace_arrays list in trace_events.c
There are multiple places where the ftrace_trace_arrays list is accessed in
trace_events.c without the trace_types_lock held.

Link: http://lkml.kernel.org/r/1372732674-22726-1-git-send-email-azl@google.com

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Alexander Z Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-01 23:30:08 -04:00
Tom Zanussi
3baa5e4cf2 tracing: Fix disabling of soft disable
The comment on the soft disable 'disable' case of
__ftrace_event_enable_disable() states that the soft disable bit
should be cleared in that case, but currently only the soft mode bit
is actually cleared.

This essentially leaves the standard non-soft-enable enable/disable
paths as the only way to clear the soft disable flag, but the soft
disable bit should also be cleared when removing a trigger with '!'.

Also, the SOFT_DISABLED bit should never be set if SOFT_MODE is
cleared.

This fixes the above discrepancies.

Link: http://lkml.kernel.org/r/b9c68dd50bc07019e6c67d3f9b29be4ef1b2badb.1372479499.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-01 20:34:26 -04:00
Tom Zanussi
a439059610 tracing: Simplify code for showing of soft disabled flag
Rather than enumerating each permutation, build the enable state
string up from the combination of states.  This also allows for the
simpler addition of more states.

Link: http://lkml.kernel.org/r/9aff5af6dee2f5a40ca30df41c39d5f33e998d7a.1372479499.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-01 20:34:25 -04:00
Steven Rostedt
6e94a78037 tracing: Failed to create system directory
Running the following:

 # cd /sys/kernel/debug/tracing
 # echo p:i do_sys_open > kprobe_events
 # echo p:j schedule >> kprobe_events
 # cat kprobe_events
p:kprobes/i do_sys_open
p:kprobes/j schedule
 # echo p:i do_sys_open >> kprobe_events
 # cat kprobe_events
p:kprobes/j schedule
p:kprobes/i do_sys_open
 # ls /sys/kernel/debug/tracing/events/kprobes/
enable  filter  j

Notice that the 'i' is missing from the kprobes directory.

The console produces:

"Failed to create system directory kprobes"

This is because kprobes passes in a allocated name for the system
and the ftrace event subsystem saves off that name instead of creating
a duplicate for it. But the kprobes may free the system name making
the pointer to it invalid.

This bug was introduced by 92edca073c "tracing: Use direct field, type
and system names" which switched from using kstrdup() on the system name
in favor of just keeping apointer to it, as the internal ftrace event
system names are static and exist for the life of the computer being booted.

Instead of reverting back to duplicating system names again, we can use
core_kernel_data() to determine if the passed in name was allocated or
static. Then use the MSB of the ref_count to be a flag to keep track if
the name was allocated or not. Then we can still save from having to duplicate
strings that will always exist, but still copy the ones that may be freed.

Cc: stable@vger.kernel.org # 3.10
Reported-by: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
Reported-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-01 20:34:22 -04:00
Namhyung Kim
aaf6ac0f08 tracing: Do not call kmem_cache_free() on allocation failure
There's no point calling it when _alloc() failed.

Link: http://lkml.kernel.org/r/1370585268-29169-1-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-11 18:38:52 -04:00
Harsh Prateek Bora
8092e808a3 tracing/trivial: Consolidate error return condition
Consolidate the checks for !enabled and !param to return -EINVAL
in event_enable_func().

Link: http://lkml.kernel.org/r/1369380137-12452-1-git-send-email-harsh@linux.vnet.ibm.com

Signed-off-by: Harsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-11 18:38:49 -04:00
Masami Hiramatsu
6ed0106667 tracing: Return -EBUSY when event_enable_func() fails to get module
Since try_module_get() returns false( = 0) when it fails to
pindown a module, event_enable_func() returns 0 which means
"succeed". This can cause a kernel panic when the entry
is removed, because the event is already released.

This fixes the bug by returning -EBUSY, because the reason
why it fails is that the module is being removed at that time.

Link: http://lkml.kernel.org/r/20130516114848.13508.97899.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-16 11:01:16 -04:00
Steven Rostedt (Red Hat)
da511bf33e tracing: Add helper function trace_create_new_event() to remove duplicate code
Both __trace_add_new_event() and __trace_early_add_new_event() do
basically the same thing, except that __trace_add_new_event() does
a little more.

Instead of having duplicate code between the two functions, add
a helper function trace_create_new_event() that both can use.
This will help against having bugs fixed in one function but not
the other.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 20:14:49 -04:00
Masami Hiramatsu
1cf4c0732d tracing: Modify soft-mode only if there's no other referrer
Modify soft-mode flag only if no other soft-mode referrer
(currently only the ftrace triggers) by using a reference
counter in each ftrace_event_file.

Without this fix, adding and removing several different
enable/disable_event triggers on the same event clear
soft-mode bit from the ftrace_event_file. This also
happens with a typo of glob on setting triggers.

e.g.

 # echo vfs_symlink:enable_event:net:netif_rx > set_ftrace_filter
 # cat events/net/netif_rx/enable
 0*
 # echo typo_func:enable_event:net:netif_rx > set_ftrace_filter
 # cat events/net/netif_rx/enable
 0
 # cat set_ftrace_filter
 #### all functions enabled ####
 vfs_symlink:enable_event:net:netif_rx:unlimited

As above, we still have a trigger, but soft-mode is gone.

Link: http://lkml.kernel.org/r/20130509054429.30398.7464.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 20:14:25 -04:00
Masami Hiramatsu
30052170dc tracing: Indicate enabled soft-mode in enable file
Indicate enabled soft-mode event as "1*" in "enable" file
for each event, because it can be soft-disabled when disable_event
trigger is hit.

Link: http://lkml.kernel.org/r/20130509054426.30398.28202.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 20:14:07 -04:00
Steven Rostedt (Red Hat)
ff305ded9f tracing: Return error if register_ftrace_function_probe() fails for event_enable_func()
register_ftrace_function_probe() returns the number of functions
it registered, which can be zero, it can also return a negative number
if something went wrong. But event_enable_func() only checks for
the case that it didn't register anything, it needs to also check
for the case that something went wrong and return that error code
as well.

Added some comments about the code as well, to make it more
understandable.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 11:30:26 -04:00
Masami Hiramatsu
a5b85bd155 tracing: Don't succeed if event_enable_func did not register anything
Return 0 instead of the number of activated ftrace function probes if
event_enable_func succeeded and return an error code if it failed or
did not register any functions. But it currently returns the number
of registered functions and if it didn't register anything, it returns 0,
but that is considered success.

This also fixes the return value. As if it succeeds, it returns the
number of functions that were enabled, which is returned back to
the user in ftrace_regex_write (the write() return code). If only
one function is enabled, then the return code of the write is one,
and this can confuse the user program in thinking it only wrote 1
byte.

Link: http://lkml.kernel.org/r/20130509054413.30398.55650.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ Rewrote change log to reflect that this fixes two bugs - SR ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 11:26:01 -04:00
zhangwei(Jovi)
52f6ad6dc3 tracing: Rename trace_event_mutex to trace_event_sem
trace_event_mutex is an rw semaphore now, not a mutex, change the name.

Link: http://lkml.kernel.org/r/513D843B.40109@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
[ Forward ported to my new code ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 13:22:10 -04:00
zhangwei(Jovi)
ad7067cebf tracing: Convert trace_destroy_fields() to static
trace_destroy_fields() is not used outside of the file. It can be
a static function.

Link: http://lkml.kernel.org/r/513D842A.2000907@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 13:22:08 -04:00
zhangwei(Jovi)
b3a8c6fd7b tracing: Move find_event_field() into trace_events.c
By moving find_event_field() and trace_find_field() into trace_events.c,
the ftrace_common_fields list and trace_get_fields() can become local to
the trace_events.c file.

find_event_field() is renamed to trace_find_event_field() to conform to
the tracing global function names.

Link: http://lkml.kernel.org/r/513D8426.9070109@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
[ rostedt: Modified trace_find_field() to trace_find_event_field() ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 13:22:07 -04:00
Steven Rostedt (Red Hat)
3cd715de26 tracing: Add function probe triggers to enable/disable events
Add triggers to function tracer that lets an event get enabled or
disabled when a function is called:

format is:

 <function>:enable_event:<system>:<event>[:<count>]
 <function>:disable_event:<system>:<event>[:<count>]

 echo 'schedule:enable_event:sched:sched_switch' > /debug/tracing/set_ftrace_filter

Every time schedule is called, it will enable the sched_switch event.

 echo 'schedule:disable_event:sched:sched_switch:2' > /debug/tracing/set_ftrace_filter

The first two times schedule is called while the sched_switch
event is enabled, it will disable it. It will not count for a time
that the event is already disabled (or enabled for enable_event).

[ fixed return without mutex_unlock() - thanks to Dan Carpenter and smatch ]

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:36:04 -04:00
Steven Rostedt (Red Hat)
417944c4c7 tracing: Add a way to soft disable trace events
In order to let triggers enable or disable events, we need a 'soft'
method for doing so. For example, if a function probe is added that
lets a user enable or disable events when a function is called, that
change must be done without taking locks or a mutex, and definitely
it can't sleep. But the full enabling of a tracepoint is expensive.

By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated
without the protection of a mutex (using set/clear_bit()), this soft
disable flag can be used to allow critical sections to enable or disable
events from being traced (after the event has been placed into "SOFT_MODE").

Some caveats though: The comm recorder (to map pids with a comm) can not
be soft disabled (yet). If you disable an event with with a "soft"
disable and wait a while before reading the trace, the comm cache may be
replaced and you'll get a bunch of <...> for comms in the trace.

Reading the "enable" file for an event that is disabled will now give
you "0*" where the '*' denotes that the tracepoint is still active but
the event itself is "disabled".

[ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ]

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:36:03 -04:00
Steven Rostedt (Red Hat)
55034cd6e6 tracing: Add alloc_snapshot kernel command line parameter
If debugging the kernel, and the developer wants to use
tracing_snapshot() in places where tracing_snapshot_alloc() may
be difficult (or more likely, the developer is lazy and doesn't
want to bother with tracing_snapshot_alloc() at all), then adding

  alloc_snapshot

to the kernel command line parameter will tell ftrace to allocate
the snapshot buffer (if configured) when it allocates the main
tracing buffer.

I also noticed that ring_buffer_expanded and tracing_selftest_disabled
had inconsistent use of boolean "true" and "false" with "0" and "1".
I cleaned that up too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:35:53 -04:00
Steven Rostedt (Red Hat)
873c642f59 tracing: Clear all trace buffers when unloaded module event was used
Currently we do not know what buffer a module event was enabled in.
On unload, it is safest to clear all buffer instances, not just the
top level buffer.

Todo: Clear only the buffer that the event was used in. The
infrastructure is there to do this, but it makes the code a bit
more complex. Lets get the current code vetted before we add that.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:57 -04:00
Steven Rostedt (Red Hat)
575380da8b tracing: Only clear trace buffer on module unload if event was traced
Currently, when a module with events is unloaded, the trace buffer is
cleared. This is just a safety net in case the module might have some
strange callback when its event is outputted. But there's no reason
to reset the buffer if the module didn't have any of its events traced.

Add a flag to the event "call" structure called WAS_ENABLED and gets set
when the event is ever enabled, and this flag never gets cleared. When a
module gets unloaded, if any of its events have this flag set, then the
trace buffer will get cleared.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:56 -04:00
Steven Rostedt (Red Hat)
315326c16a tracing: Fix trace events build without modules
The new multi-buffers added a descriptor that kept track of module
events, and the directories they use, with struct ftace_module_file_ops.
This is used to add a ref count to keep modules from unloading while
their files are being accessed.

As the descriptor is only needed when CONFIG_MODULES is enabled, it
is only declared when the config is enabled. But that struct is
dereferenced in a few areas outside the #ifdef CONFIG_MODULES.

By adding some helper routines and moving code around a little,
events can be compiled again without modules.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:53 -04:00
Steven Rostedt
92edca073c tracing: Use direct field, type and system names
The names used to display the field and type in the event format
files are copied, as well as the system name that is displayed.

All these names are created by constant values passed in.
If one of theses values were to be removed by a module, the module
would also be required to remove any event it created.

By using the strings directly, we can save over 100K of memory.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:47 -04:00
Steven Rostedt
d1a291437f tracing: Use kmem_cache_alloc instead of kmalloc in trace_events.c
The event structures used by the trace events are mostly persistent,
but they are also allocated by kmalloc, which is not the best at
allocating space for what is used. By converting these kmallocs
into kmem_cache_allocs, we can save over 50K of space that is
permanently allocated.

After boot we have:

 slab name          active allocated size
 ---------          ------ --------- ----
ftrace_event_file    979   1005     56   67    1
ftrace_event_field   2301   2310     48   77    1

The ftrace_event_file has at boot up 979 active objects out of
1005 allocated in the slabs. Each object is 56 bytes. In a normal
kmalloc, that would allocate 64 bytes for each object.

 1005 - 979  = 26 objects not used
 26 * 56 = 1456 bytes wasted

But if we used kmalloc:

 64 - 56 = 8 bytes unused per allocation
 8 * 979 = 7832 bytes wasted

 7832 - 1456 = 6376 bytes in savings

Doing the same for ftrace_event_field where there's 2301 objects
allocated in a slab that can hold 2310 with 48 bytes each we have:

 2310 - 2301 = 9 objects not used
 9 * 48 = 432 bytes wasted

A kmalloc would also use 64 bytes per object:

 64 - 48 = 16 bytes unused per allocation
 16 * 2301 = 36816 bytes wasted!

 36816 - 432 = 36384 bytes in savings

This change gives us a total of 42760 bytes in savings. At least
on my machine, but as there's a lot of these persistent objects
for all configurations that use trace points, this is a net win.

Thanks to Ezequiel Garcia for his trace_analyze presentation which
pointed out the wasted space in my code.

Cc: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:46 -04:00
Steven Rostedt
772482216f tracing: Get trace_events kernel command line working again
With the new descriptors used to allow multiple buffers in the
tracing directory added, the kernel command line parameter
trace_events=... no longer works. This is because the top level
(global) trace array now has a list of descriptors associated
with the events and the files in the debugfs directory. But in
early bootup, when the command line is processed and the events
enabled, the trace array list of events has not been set up yet.

Without the list of events in the trace array, the setting of
events to record will fail because it would not match any events.

The solution is to set up the top level array in two stages.
The first is to just add the ftrace file descriptors that just point
to the events. This will allow events to be enabled and start tracing.
The second stage is called after the filesystem is set up, and this
stage will create the debugfs event files and directories associated
with the trace array events.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:46 -04:00
Steven Rostedt
0c8916c342 tracing: Add rmdir to remove multibuffer instances
Add a method to the hijacked dentry descriptor of the
"instances" directory to allow for rmdir to remove an
instance of a multibuffer.

Example:

  cd /debug/tracing/instances
  mkdir hello
  ls
hello/
  rmdir hello
  ls

Like the mkdir method, the i_mutex is dropped for the instances
directory. The instances directory is created at boot up and can
not be renamed or removed. The trace_types_lock mutex is used to
synchronize adding and removing of instances.

I've run several stress tests with different threads trying to
create and delete directories of the same name, and it has stood
up fine.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:45 -04:00
Steven Rostedt
277ba04461 tracing: Add interface to allow multiple trace buffers
Add the interface ("instances" directory) to add multiple buffers
to ftrace. To create a new instance, simply do a mkdir in the
instances directory:

This will create a directory with the following:

 # cd instances
 # mkdir foo
 # ls foo
buffer_size_kb        free_buffer  trace_clock    trace_pipe
buffer_total_size_kb  set_event    trace_marker   tracing_enabled
events/               trace        trace_options  tracing_on

Currently only events are able to be set, and there isn't a way
to delete a buffer when one is created (yet).

Note, the i_mutex lock is dropped from the parent "instances"
directory during the mkdir operation. As the "instances" directory
can not be renamed or deleted (created on boot), I do not see
any harm in dropping the lock. The creation of the sub directories
is protected by trace_types_lock mutex, which only lets one
instance get into the code path at a time. If two tasks try to
create or delete directories of the same name, only one will occur
and the other will fail with -EEXIST.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:44 -04:00
Steven Rostedt
ae63b31e4d tracing: Separate out trace events from global variables
The trace events for ftrace are all defined via global variables.
The arrays of events and event systems are linked to a global list.
This prevents multiple users of the event system (what to enable and
what not to).

By adding descriptors to represent the event/file relation, as well
as to which trace_array descriptor they are associated with, allows
for more than one set of events to be defined. Once the trace events
files have a link between the trace event and the trace_array they
are associated with, we can create multiple trace_arrays that can
record separate events in separate buffers.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:40 -04:00
Steven Rostedt
b000c8065a tracing: Remove the extra 4 bytes of padding in events
Due to a userspace issue with PowerTop v2beta, which hardcoded
the offset of event fields that it was using, it broke when
we removed the Big Kernel Lock counter from the event header.

 (commit e6e1e2593 "tracing: Remove lock_depth from event entry")

Because this broke userspace, it was determined that we must
keep those 4 bytes around.

 (commit a3a4a5acd "Regression: partial revert "tracing: Remove lock_depth from event entry"")

This unfortunately wastes space in the ring buffer. 4 bytes per
event, where a lot of events are just 24 bytes. That's 16% of the
buffer wasted. A million events will add 4 megs of white space
into the buffer.

It was later noticed that PowerTop v2beta could not work on systems
where the kernel was 64 bit but the userspace was 32 bits.
The reason was because the offsets are different between the
two and the hard coded offset of one would not work with the other.

With PowerTop v2 final, it implemented the same interface that both
perf and trace-cmd use. That is, it reads the format file of
the event to find the offsets of the fields it needs. This fixes
the problem with running powertop on a 32 bit userspace running
on a 64 bit kernel. It also no longer requires the 4 byte padding.

As PowerTop v2 has been out for a while, and is included in all
major distributions, it is time that we can safely remove the
4 bytes of padding. Users of PowerTop v2beta should upgrade to
PowerTop v2 final.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 21:05:41 -05:00
Steven Rostedt
0d5c6e1c19 tracing: Use irq_work for wake ups and remove *_nowake_*() functions
Have the ring buffer commit function use the irq_work infrastructure to
wake up any waiters waiting on the ring buffer for new data. The irq_work
was created for such a purpose, where doing the actual wake up at the
time of adding data is too dangerous, as an event or function trace may
be in the midst of the work queue locks and cause deadlocks. The irq_work
will either delay the action to the next timer interrupt, or trigger an IPI
to itself forcing an interrupt to do the work (in a safe location).

With irq_work, all ring buffer commits can safely do wakeups, removing
the need for the ring buffer commit "nowake" variants, which were used
by events and function tracing. All commits can now safely use the
normal commit, and the "nowake" variants can be removed.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:52 -04:00
Steven Rostedt
15075cac42 tracing: Separate open function from set_event and available_events
The open function used by available_events is the same as set_event even
though it uses different seq functions. This causes a side effect of
writing into available_events clearing all events, even though
available_events is suppose to be read only.

There's no reason to keep a single function for just the open and have
both use different functions for everything else. It is a little
confusing and causes strange behavior. Just have each have their own
function.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:49 -04:00
Steven Rostedt
81698831bc tracing: Enable comm recording if trace_printk() is used
If comm recording is not enabled when trace_printk() is used then
you just get this type of output:

[ adding trace_printk("hello! %d", irq); in do_IRQ ]

           <...>-2843  [001] d.h.    80.812300: do_IRQ: hello! 14
           <...>-2734  [002] d.h2    80.824664: do_IRQ: hello! 14
           <...>-2713  [003] d.h.    80.829971: do_IRQ: hello! 14
           <...>-2814  [000] d.h.    80.833026: do_IRQ: hello! 14

By enabling the comm recorder when trace_printk is enabled:

       hackbench-6715  [001] d.h.   193.233776: do_IRQ: hello! 21
            sshd-2659  [001] d.h.   193.665862: do_IRQ: hello! 21
          <idle>-0     [001] d.h1   193.665996: do_IRQ: hello! 21

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:29 -04:00
Ezequiel Garcia
8781915ad2 trace: Move trace event enable from fs_initcall to core_initcall
This patch splits trace event initialization in two stages:
 * ftrace enable
 * sysfs event entry creation

This allows to capture trace events from an earlier point
by using 'trace_event' kernel parameter and is important
to trace boot-up allocations.

Note that, in order to enable events at core_initcall,
it's necessary to move init_ftrace_syscalls() from
core_initcall to early_initcall.

Link: http://lkml.kernel.org/r/1347461277-25302-1-git-send-email-elezegarcia@gmail.com

Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-24 14:13:02 -04:00
Yuanhan Liu
76bab1b78a tracing: Skip printing "OK" if failed to disable event
No acutal case found. But logically, we should skip "OK" in case any
error met.

Link: http://lkml.kernel.org/r/1346051625-25231-1-git-send-email-yuanhan.liu@linux.intel.com

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13 22:52:07 -04:00
Steven Rostedt
4740974a68 ftrace: Add default recursion protection for function tracing
As more users of the function tracer utility are being added, they do
not always add the necessary recursion protection. To protect from
function recursion due to tracing, if the callback ftrace_ops does not
specifically specify that it protects against recursion (by setting
the FTRACE_OPS_FL_RECURSION_SAFE flag), the list operation will be
called by the mcount trampoline which adds recursion protection.

If the flag is set, then the function will be called directly with no
extra protection.

Note, the list operation is called if more than one function callback
is registered, or if the arch does not support all of the function
tracer features.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-31 10:29:52 -04:00
Steven Rostedt
a1e2e31d17 ftrace: Return pt_regs to function trace callback
Return as the 4th paramater to the function tracer callback the pt_regs.

Later patches that implement regs passing for the architectures will require
having the ftrace_ops set the SAVE_REGS flag, which will tell the arch
to take the time to pass a full set of pt_regs to the ftrace_ops callback
function. If the arch does not support it then it should pass NULL.

If an arch can pass full regs, then it should define:
 ARCH_SUPPORTS_FTRACE_SAVE_REGS to 1

Link: http://lkml.kernel.org/r/20120702201821.019966811@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:18:49 -04:00
Steven Rostedt
2f5f6ad939 ftrace: Pass ftrace_ops as third parameter to function trace callback
Currently the function trace callback receives only the ip and parent_ip
of the function that it traced. It would be more powerful to also return
the ops that registered the function as well. This allows the same function
to act differently depending on what ftrace_ops registered it.

Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:17:35 -04:00
Steven Rostedt
9b63776fa3 tracing: Do not enable function event with enable
With the adding of function tracing event to perf, it caused a
side effect that produces the following warning when enabling all
events in ftrace:

 # echo 1 > /sys/kernel/debug/tracing/events/enable

[console]
event trace: Could not enable event function

This is because when enabling all events via the debugfs system
it ignores events that do not have a ->reg() function assigned.
This was to skip over the ftrace internal events (as they are
not TRACE_EVENTs). But as the ftrace function event now has
a ->reg() function attached to it for use with perf, it is no
longer ignored.

Worse yet, this ->reg() function is being called when it should
not be. It returns an error and causes the above warning to
be printed.

By adding a new event_call flag (TRACE_EVENT_FL_IGNORE_ENABLE)
and have all ftrace internel event structures have it set,
setting the events/enable will no longe try to incorrectly enable
the function event and does not warn.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-10 15:55:43 -04:00
Jiri Olsa
489c75c3b3 ftrace, perf: Add add/del tracepoint perf registration actions
Adding TRACE_REG_PERF_ADD and TRACE_REG_PERF_DEL to handle
perf event schedule in/out actions.

The add action is invoked for when the perf event is scheduled in,
while the del action is invoked when the event is scheduled out.

Link: http://lkml.kernel.org/r/1329317514-8131-4-git-send-email-jolsa@redhat.com

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21 11:08:25 -05:00
Jiri Olsa
ceec0b6fc7 ftrace, perf: Add open/close tracepoint perf registration actions
Adding TRACE_REG_PERF_OPEN and TRACE_REG_PERF_CLOSE to differentiate
register/unregister from open/close actions.

The register/unregister actions are invoked for the first/last
tracepoint user when opening/closing the event.

The open/close actions are invoked for each tracepoint user when
opening/closing the event.

Link: http://lkml.kernel.org/r/1329317514-8131-3-git-send-email-jolsa@redhat.com

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-02-21 11:08:24 -05:00
Ilya Dryomov
cb59974742 tracing: fix event_subsystem ref counting
Fix a bug introduced by e9dbfae5, which prevents event_subsystem from
ever being released.

Ref_count was added to keep track of subsystem users, not for counting
events.  Subsystem is created with ref_count = 1, so there is no need to
increment it for every event, we have nr_events for that.  Fix this by
touching ref_count only when we actually have a new user -
subsystem_open().

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Link: http://lkml.kernel.org/r/1320052062-7846-1-git-send-email-idryomov@gmail.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-12-05 13:28:44 -05:00
Ingo Molnar
492f73a303 Merge branch 'perf/urgent' into perf/core
Merge reason: pick up the latest fixes - they won't make v3.0.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-21 09:29:21 +02:00
Steven Rostedt
40ee4dffff tracing: Have "enable" file use refcounts like the "filter" file
The "enable" file for the event system can be removed when a module
is unloaded and the event system only has events from that module.
As the event system nr_events count goes to zero, it may be freed
if its ref_count is also set to zero.

Like the "filter" file, the "enable" file may be opened by a task and
referenced later, after a module has been unloaded and the events for
that event system have been removed.

Although the "filter" file referenced the event system structure,
the "enable" file only references a pointer to the event system
name. Since the name is freed when the event system is removed,
it is possible that an access to the "enable" file may reference
a freed pointer.

Update the "enable" file to use the subsystem_open() routine that
the "filter" file uses, to keep a reference to the event system
structure while the "enable" file is opened.

Cc: <stable@kernel.org>
Reported-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-07 11:22:29 -04:00
Steven Rostedt
e9dbfae53e tracing: Fix bug when reading system filters on module removal
The event system is freed when its nr_events is set to zero. This happens
when a module created an event system and then later the module is
removed. Modules may share systems, so the system is allocated when
it is created and freed when the modules are unloaded and all the
events under the system are removed (nr_events set to zero).

The problem arises when a task opened the "filter" file for the
system. If the module is unloaded and it removed the last event for
that system, the system structure is freed. If the task that opened
the filter file accesses the "filter" file after the system has
been freed, the system will access an invalid pointer.

By adding a ref_count, and using it to keep track of what
is using the event system, we can free it after all users
are finished with the event system.

Cc: <stable@kernel.org>
Reported-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-07 11:19:18 -04:00
Peter Huewe
22fe9b54d8 tracing: Convert to kstrtoul_from_user
This patch replaces the code for getting an unsigned long from a
userspace buffer by a simple call to kstroul_from_user.
This makes it easier to read and less error prone.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-14 22:48:50 -04:00
Steven Rostedt
17bb615ad4 tracing: Have event with function tracer check error return
The self tests for event tracer does not check if the function
tracing was successfully activated. It needs to before it continues
the tests, otherwise the wrong errors may be reported.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-05-25 22:13:39 -04:00
Arjan van de Ven
a3a4a5acd3 Regression: partial revert "tracing: Remove lock_depth from event entry"
This partially reverts commit e6e1e25935.

That commit changed the structure layout of the trace structure, which
in turn broke PowerTOP (1.9x generation) quite badly.

I appreciate not wanting to expose the variable in question, and
PowerTOP was not using it, so I've replaced the variable with just a
padding field - that way if in the future a new field is needed it can
just use this padding field.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-06 13:20:59 -07:00
Yuanhan Liu
56355b83e2 tracing: Export trace_set_clr_event()
Trace events belonging to a module only exists when the module is
loaded. Well, we can use trace_set_clr_event funtion to enable some
trace event at the module init routine, so that we will not miss
something while loading then module.

So, Export the trace_set_clr_event function so that module can use it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
LKML-Reference: <1289196312-25323-1-git-send-email-yuanhan.liu@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:34:51 -05:00
Steven Rostedt
e6e1e25935 tracing: Remove lock_depth from event entry
The lock_depth field in the event headers was added as a temporary
data point for help in removing the BKL. Now that the BKL is pretty
much been removed, we can remove this field.

This in turn changes the header from 12 bytes to 8 bytes,
removing the 4 byte buffer that gcc would insert if the first field
in the data load was 8 bytes in size.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-03-10 10:31:48 -05:00
Steven Rostedt
e4a9ea5ee7 tracing: Replace trace_event struct array with pointer array
Currently the trace_event structures are placed in the _ftrace_events
section, and at link time, the linker makes one large array of all
the trace_event structures. On boot up, this array is read (much like
the initcall sections) and the events are processed.

The problem is that there is no guarantee that gcc will place complex
structures nicely together in an array format. Two structures in the
same file may be placed awkwardly, because gcc has no clue that they
are suppose to be in an array.

A hack was used previous to force the alignment to 4, to pack the
structures together. But this caused alignment issues with other
architectures (sparc).

Instead of packing the structures into an array, the structures' addresses
are now put into the _ftrace_event section. As pointers are always the
natural alignment, gcc should always pack them tightly together
(otherwise initcall, extable, etc would also fail).

By having the pointers to the structures in the section, we can still
iterate the trace_events without causing unnecessary alignment problems
with other architectures, or depending on the current behaviour of
gcc that will likely change in the future just to tick us kernel developers
off a little more.

The _ftrace_event section is also moved into the .init.data section
as it is now only needed at boot up.

Suggested-by: David Miller <davem@davemloft.net>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-02 21:37:13 -05:00
Steven Rostedt
0429578016 tracing/events: Show real number in array fields
Currently we have in something like the sched_switch event:

  field:char prev_comm[TASK_COMM_LEN];	offset:12;	size:16;	signed:1;

When a userspace tool such as perf tries to parse this, the
TASK_COMM_LEN is meaningless. This is done because the TRACE_EVENT() macro
simply uses a #len to show the string of the length. When the length is
an enum, we get a string that means nothing for tools.

By adding a static buffer and a mutex to protect it, we can store the
string into that buffer with snprintf and show the actual number.
Now we get:

  field:char prev_comm[16];       offset:12;      size:16;        signed:1;

Something much more useful.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-11-19 10:18:47 -05:00
Linus Torvalds
092e0e7e52 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  vfs: make no_llseek the default
  vfs: don't use BKL in default_llseek
  llseek: automatically add .llseek fop
  libfs: use generic_file_llseek for simple_attr
  mac80211: disallow seeks in minstrel debug code
  lirc: make chardev nonseekable
  viotape: use noop_llseek
  raw: use explicit llseek file operations
  ibmasmfs: use generic_file_llseek
  spufs: use llseek in all file operations
  arm/omap: use generic_file_llseek in iommu_debug
  lkdtm: use generic_file_llseek in debugfs
  net/wireless: use generic_file_llseek in debugfs
  drm: use noop_llseek
2010-10-22 10:52:56 -07:00
Arnd Bergmann
6038f373a3 llseek: automatically add .llseek fop
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time.  Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
//   but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
   *off = E
|
   *off += E
|
   func(..., off, ...)
|
   E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
  *off = E
|
  *off += E
|
  func(..., off, ...)
|
  E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
 ...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
 .llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
 .read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
 .write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
 .open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
...  .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
...  .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
...  .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+	.llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
 .write = write_f,
 .read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
2010-10-15 15:53:27 +02:00
Li Zefan
86397dc3cc tracing: Clean up seqfile code for format file
Remove the nasty hack that marks a pointer's LSB to distinguish common
fields from event fields. Replace it with a more sane approach.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4C6A23C2.9020606@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-18 11:09:14 -04:00
Steven Rostedt
d244b6bd41 Merge branch 'tip/perf/urgent-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into trace/tip/perf/urgent-4
Conflicts:
	kernel/trace/trace_events.c

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-16 11:17:30 -04:00
Steven Rostedt
2a37a3df57 tracing/events: Convert format output to seq_file
Two new events were added that broke the current format output.

Both from the SCSI system: scsi_dispatch_cmd_done and scsi_dispatch_cmd_timeout

The reason is that their print_fmt exceeded a page size. Since the output
of the format used simple_read_from_buffer and trace_seq, it was limited
to a page size in output.

This patch converts the printing of the format of an event into seq_file,
which allows greater than a page size to be shown.

I diffed all event formats comparing the output with and without this
patch. All matched except for the above two, which showed just:

  FORMAT TOO BIG

without this patch, but now properly displays the output with this patch.

v2: Remove updating *pos in seq start function.
   [ Thanks to Li Zefan for pointing that out ]

Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-08-12 16:59:29 -04:00
Li Zefan
e870e9a124 tracing: Allow to disable cmdline recording
We found that even enabling a single trace event that will rarely be
triggered can add big overhead to context switch.

(lmbench context switch test)
 -------------------------------------------------
 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
 ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
------ ------ ------ ------ ------ ------- -------
  2.19   2.3   2.21   2.56   2.13     2.54    2.07
  2.39   2.51  2.35   2.75   2.27     2.81    2.24

The overhead is 6% ~ 11%.

It's because when a trace event is enabled 3 tracepoints (sched_switch,
sched_wakeup, sched_wakeup_new) will be activated to map pid to cmdname.

We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
to allow to disable cmdline recording.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-20 21:52:33 -04:00
Steven Rostedt
a1d0ce8213 tracing: Use class->reg() for all registering of events
Because kprobes and syscalls need special processing to register
events, the class->reg() method was created to handle the differences.

But instead of creating a default ->reg for perf and ftrace events,
the code was scattered with:

	if (class->reg)
		class->reg();
	else
		default_reg();

This is messy and can also lead to bugs.

This patch cleans up this code and creates a default reg() entry for
the events allowing for the code to directly call the class->reg()
without the condition.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-06-28 21:13:14 -04:00
Li Zefan
67ead0a6ce tracing: Remove open-coded __trace_add_event_call()
Let trace_module_add_events() and event_trace_init() call
__trace_add_event_call().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BFA37E9.1020106@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-06-28 17:12:55 -04:00
Li Zefan
c9d932cf8a tracing: Remove test of NULL define_fields callback
Every event (or event class) has it's define_fields callback,
so the test is redundant.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BFA37BC.8080707@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-06-28 17:12:52 -04:00
Li Zefan
8728fe501e tracing: Don't allocate common fields for every trace events
Every event has the same common fields, so it's a big waste of
memory to have a copy of those fields for every event.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BFA3759.30105@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-06-28 17:12:46 -04:00
Steven Rostedt
5168ae50a6 tracing: Remove ftrace_preempt_disable/enable
The ftrace_preempt_disable/enable functions were to address a
recursive race caused by the function tracer. The function tracer
traces all functions which makes it easily susceptible to recursion.
One area was preempt_enable(). This would call the scheduler and
the schedulre would call the function tracer and loop.
(So was it thought).

The ftrace_preempt_disable/enable was made to protect against recursion
inside the scheduler by storing the NEED_RESCHED flag. If it was
set before the ftrace_preempt_disable() it would not call schedule
on ftrace_preempt_enable(), thinking that if it was set before then
it would have already scheduled unless it was already in the scheduler.

This worked fine except in the case of SMP, where another task would set
the NEED_RESCHED flag for a task on another CPU, and then kick off an
IPI to trigger it. This could cause the NEED_RESCHED to be saved at
ftrace_preempt_disable() but the IPI to arrive in the the preempt
disabled section. The ftrace_preempt_enable() would not call the scheduler
because the flag was already set before entring the section.

This bug would cause a missed preemption check and cause lower latencies.

Investigating further, I found that the recusion caused by the function
tracer was not due to schedule(), but due to preempt_schedule(). Now
that preempt_schedule is completely annotated with notrace, the recusion
no longer is an issue.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-06-03 19:32:38 -04:00
Steven Rostedt
553552ce17 tracing: Combine event filter_active and enable into single flags field
The filter_active and enable both use an int (4 bytes each) to
set a single flag. We can save 4 bytes per event by combining the
two into a single integer.

   text	   data	    bss	    dec	    hex	filename
4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
4894944	1018052	 861512	6774508	 675eec	vmlinux.id
4894871	1012292	 861512	6768675	 674823	vmlinux.flags

This gives us another 5K in savings.

The modification of both the enable and filter fields are done
under the event_mutex, so it is still safe to combine the two.

Note: Although Mathieu gave his Acked-by, he would like it documented
 that the reads of flags are not protected by the mutex. The way the
 code works, these reads will not break anything, but will have a
 residual effect. Since this behavior is the same even before this
 patch, describing this situation is left to another patch, as this
 patch does not change the behavior, but just brought it to Mathieu's
 attention.

v2: Updated the event trace self test to for this change.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-14 14:33:22 -04:00