Some of the sizing logic through tracer_alloc_buffers() uses int
internally, causing unexpected behavior if the user passes a value that
does not fit in an int (on my x86 machine, the result is uselessly tiny
buffers).
Fix by plumbing the parameter's real type (unsigned long) through to the
ring buffer allocation functions, which already use unsigned long.
It has always been possible to create larger ring buffers via the sysfs
interface: this only affects the cmdline parameter.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/bff42a4288aada08bdf74da3f5b67a2c28b761f8.1772852067.git.calvin@wbinvd.org
Fixes: 73c5162aa3 ("tracing: keep ring buffer to minimum size till used")
Signed-off-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
The latency tracers (scheduler, irqsoff, etc) were created when tracing
was first added. These tracers required a "snapshot" buffer that was the
same size as the ring buffer being written to. When a new max latency was
hit, the main ring buffer would swap with the snapshot buffer so that the
trace leading up to the latency would be saved in the snapshot buffer (The
snapshot buffer is never written to directly and the data within it can be
viewed without fear of being overwritten).
Later, a new feature was added to allow snapshots to be taken by user
space or even event triggers. This created a "snapshot" file that allowed
users to trigger a snapshot from user space to save the current trace.
The config for this new feature (CONFIG_TRACER_SNAPSHOT) would select the
latency tracer config (CONFIG_TRACER_MAX_LATENCY) as it would need all the
functionality from it as it already existed. But this was incorrect. As
the snapshot feature is really what the latency tracers need and not the
other way around.
Have CONFIG_TRACER_MAX_TRACE select CONFIG_TRACER_SNAPSHOT where the
tracers that needs the max latency buffer selects the TRACE_MAX_TRACE
which will then select TRACER_SNAPSHOT.
Also, go through trace.c and trace.h and make the code that only needs the
TRACER_MAX_TRACE protected by that and the code that always requires the
snapshot to be protected by TRACER_SNAPSHOT.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260208183856.767870992@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When tracing was first added, there were latency tracers that would take a
snapshot of the current trace when a new max latency was hit. This
snapshot buffer was called "max_buffer". Since then, a snapshot feature
was added that allowed user space or event triggers to trigger a snapshot
of the current buffer using the same max_buffer of the trace_array.
As this snapshot buffer now has a more generic use case, calling it
"max_buffer" is confusing. Rename it to snapshot_buffer.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260208183856.428446729@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The function trace_printk_init_buffers() is used to expand tha
trace_printk buffers when trace_printk() is used within the kernel or in
modules. On kernel boot up, it holds off from starting the sched switch
cmdline recorder, but will start it immediately when it is added by a
module.
Currently it uses a trick to see if the global_trace buffer has been
allocated or not to know if it was called by module load or not. But this
is more of a hack, and can not be used when this code is moved out of
trace.c. Instead simply look at the system_state and if it is running then
it is know that it could only be called by module load.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260208032450.660237094@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The trace.c file has become a dumping ground for all tracing code and has
become quite large. In order to move the trace_printk functions out of it
these functions can not access global_trace directly, as that is something
that needs to stay static in trace.c.
Instead of testing the trace_array tr pointer to &global_trace, test the
tr->flags to see if TRACE_ARRAY_FL_GLOBAL set.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260208032450.491116245@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The trace.c file has become a dumping ground for all tracing code and has
become quite large. In order to move the trace_printk functions out of it
these functions can not access global_trace directly, as that is something
that needs to stay static in trace.c.
Have tracing_update_buffers() take NULL for its trace_array to denote it
should work on the global_trace top level trace_array allows that function
to be used outside of trace.c and still update the global_trace
trace_array.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260208032450.318864210@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The file trace.c has become a catchall for most things tracing. Start
making it smaller by breaking out various aspects into their own files.
Make ftrace_trace_stack() into a static inline that tests if stack tracing
is enabled and if so to call __ftrace_trace_stack() to do the stack trace.
This keeps the test inlined in the fast paths and only does the function
call if stack tracing is enabled.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260208032449.974218132@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The file trace.c has become a catchall for most things tracing. Start
making it smaller by breaking out various aspects into their own files.
Move the __always_inline functions __trace_buffer_lock_reserve(),
__trace_buffer_unlock_commit() and trace_event_setup() into trace.h.
The trace.c file will be split up and these functions will be used in more
than one of these files. As they are already __always_inline they can
easily be moved into the trace.h header file.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260208032449.813550600@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The tracing_disabled variable is set to one on boot up to prevent some
parts of tracing to access the tracing infrastructure before it is set up.
It also can be set after boot if an anomaly is discovered.
It is currently a static variable in trace.c and can be accessed via a
function call trace_is_disabled(). There's really no reason to use a
function call as the tracing subsystem should be able to access it
directly.
By making the variable accessed directly, code can be moved out of trace.c
without adding overhead of a function call to see if tracing is disabled
or not.
Make tracing_disabled global and remove the tracing_is_disabled() helper
function. Also add some "unlikely()"s around tracing_disabled where it's
checked in hot paths.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260208032449.483690153@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
In trace.c, the function trace_create_maxlat_file() is defined behind the
#ifdef CONFIG_TRACER_MAX_TRACE block. The #else part defines it as:
#define trace_create_maxlat_file(tr, d_tracer) \
trace_create_file("tracing_max_latency", TRACE_MODE_WRITE, \
d_tracer, tr, &tracing_max_lat_fops)
But the one place that it it used has:
#ifdef CONFIG_TRACER_MAX_TRACE
trace_create_maxlat_file(tr, d_tracer);
#endif
Which is pointless and also wrong!
It only gets created when both CONFIG_TRACE_MAX_TRACE and CONFIG_FS_NOTIFY
is defined, but the file itself should not be dependent on
CONFIG_FS_NOTIFY. Always create that file when TRACE_MAX_TRACE is defined
regardless if FS_NOTIFY is or is not.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260207191101.0e014abd@robin
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The eval_map_work_func() function, though queued in eval_map_wq,
holds the trace_event_sem read-write lock for a long time during
kernel boot. This causes blocking issues for other functions.
Rename eval_map_wq to trace_init_wq and make it global, thereby
allowing other parts of tracing to schedule work on this queue
asynchronously and avoiding blockage of the main boot thread.
Link: https://patch.msgid.link/20260204015344.162818-1-tianyaxiong@kylinos.cn
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When /proc/sys/kernel/traceoff_on_warning is set to 1, the top level
tracing buffer is disabled when a warning happens. This is very useful
when debugging and want the tracing buffer to stop taking new data when a
warning triggers keeping the events that lead up to the warning from being
overwritten.
Now that there is also a persistent ring buffer and an option to have
trace_printk go to that buffer, the same holds true for that buffer. A
warning could happen just before a crash but still write enough events to
lose the events that lead up to the first warning that was the reason for
the crash.
When /proc/sys/kernel/traceoff_on_warning is set to 1 and a warning is
triggered, not only disable the top level tracing buffer, but also disable
the buffer that trace_printk()s are written to.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://patch.msgid.link/20260121093858.5c5d7e7b@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The trace file will pause tracing if the tracing instance has the
"pause-on-trace" option is set. This happens when the file is opened, and
it is unpaused when the file is closed. When this was first added, there
was only one user that paused tracing. On open, the check to pause was:
if (!iter->snapshot && (tr->trace_flags & TRACE_ITER(PAUSE_ON_TRACE)))
Where if it is not the snapshot tracer and the "pause-on-trace" option is
set, then it increments a "stop_count" of the trace instance.
On close, the check is:
if (!iter->snapshot && tr->stop_count)
That is, if it is not the snapshot buffer and it was stopped, it will
re-enable tracing.
Now there's more places that stop tracing. This means, if something else
stops tracing the tr->stop_count will be non-zero, and that means if the
trace file is closed, it will decrement the stop_count even though it
never incremented it. This causes a warning because when the user that
stopped tracing enables it again, the stop_count goes below zero.
Instead of relying on the stop_count being set to know if the close of
the trace file should enable tracing again, add a new flag to the trace
iterator. The trace iterator is unique per open of the trace file, and if
the open stops tracing set the trace iterator PAUSE flag. On close, if the
PAUSE flag is set, then re-enable it again.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251202161751.24abaaf1@gandalf.local.home
Fixes: 06e0a548ba ("tracing: Do not disable tracing when reading the trace file")
Reported-by: syzbot+ccdec3bfe0beec58a38d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/692f44a5.a70a0220.2ea503.00c8.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull runtime verifier updates from Steven Rostedt:
- Adapt the ftracetest script to be run from a different folder
This uses the already existing OPT_TEST_DIR but extends it further to
run independent tests, then add an --rv flag to allow using the
script for testing RV (mostly) independently on ftrace.
- Add basic RV selftests in selftests/verification for more validations
Add more validations for available/enabled monitors and reactors.
This could have caught the bug introducing kernel panic solved above.
Tests use ftracetest.
- Convert react() function in reactor to use va_list directly
Use a central helper to handle the variadic arguments. Clean up
macros and mark functions as static.
- Add lockdep annotations to reactors to have lockdep complain of
errors
If the reactors are called from improper context. Useful to develop
new reactors. This highlights a warning in the panic reactor that is
related to the printk subsystem and not to RV.
- Convert core RV code to use lock guards and __free helpers
This completely removes goto statements.
- Fix compilation if !CONFIG_RV_REACTORS
Fix the warning by keeping LTL monitor variable as always static.
* tag 'trace-rv-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Fix compilation if !CONFIG_RV_REACTORS
rv: Convert to use __free
rv: Convert to use lock guard
rv: Add explicit lockdep context for reactors
rv: Make rv_reacting_on() static
rv: Pass va_list to reactors
selftests/verification: Add initial RV tests
selftest/ftrace: Generalise ftracetest to use with RV
Pull tracing updates from Steven Rostedt:
- Extend tracing option mask to 64 bits
The trace options were defined by a 32 bit variable. This limits the
tracing instances to have a total of 32 different options. As that
limit has been hit, and more options are being added, increase the
option mask to a 64 bit number, doubling the number of options
available.
As this is required for the kprobe topic branches as well as the
tracing topic branch, a separate branch was created and merged into
both.
- Make trace_user_fault_read() available for the rest of tracing
The function trace_user_fault_read() is used by trace_marker file
read to allow reading user space to be done fast and without locking
or allocations. Make this available so that the system call trace
events can use it too.
- Have system call trace events read user space values
Now that the system call trace events callbacks are called in a
faultable context, take advantage of this and read the user space
buffers for various system calls. For example, show the path name of
the openat system call instead of just showing the pointer to that
path name in user space. Also show the contents of the buffer of the
write system call. Several system call trace events are updated to
make tracing into a light weight strace tool for all applications in
the system.
- Update perf system call tracing to do the same
- And a config and syscall_user_buf_size file to control the size of
the buffer
Limit the amount of data that can be read from user space. The
default size is 63 bytes but that can be expanded to 165 bytes.
- Allow the persistent ring buffer to print system calls normally
The persistent ring buffer prints trace events by their type and
ignores the print_fmt. This is because the print_fmt may change from
kernel to kernel. As the system call output is fixed by the system
call ABI itself, there's no reason to limit that. This makes reading
the system call events in the persistent ring buffer much nicer and
easier to understand.
- Add options to show text offset to function profiler
The function profiler that counts the number of times a function is
hit currently lists all functions by its name and offset. But this
becomes ambiguous when there are several functions with the same
name.
Add a tracing option that changes the output to be that of
'_text+offset' instead. Now a user space tool can use this
information to map the '_text+offset' to the unique function it is
counting.
- Report bad dynamic event command
If a bad command is passed to the dynamic_events file, report it
properly in the error log.
- Clean up tracer options
Clean up the tracer option code a bit, by removing some useless code
and also using switch statements instead of a series of if
statements.
- Have tracing options be instance specific
Tracers can have their own options (function tracer, irqsoff tracer,
function graph tracer, etc). But now that the same tracer can be
enabled in multiple trace instances, their options are still global.
The API is per instance, thus changing one affects other instances.
This isn't even consistent, as the option take affect differently
depending on when an tracer started in an instance. Make the options
for instances only affect the instance it is changed under.
- Optimize pid_list lock contention
Whenever the pid_list is read, it uses a spin lock. This happens at
every sched switch. Taking the lock at sched switch can be removed by
instead using a seqlock counter.
- Clean up the trace trigger structures
The trigger code uses two different structures to implement a single
tigger. This was due to trying to reuse code for the two different
types of triggers (always on trigger, and count limited trigger). But
by adding a single field to one structure, the other structure could
be absorbed into the first structure making he code easier to
understand.
- Create a bulk garbage collector for trace triggers
If user space has triggers for several hundreds of events and then
removes them, it can take several seconds to complete. This is
because each removal calls tracepoint_synchronize_unregister() that
can take hundreds of milliseconds to complete.
Instead, create a helper thread that will do the clean up. When a
trigger is removed, it will create the kthread if it isn't already
created, and then add the trigger to a llist. The kthread will take
the items off the llist, call tracepoint_synchronize_unregister(),
and then remove the items it took off. It will then check if there's
more items to free before sleeping.
This makes user space removing all these triggers to finish in less
than a second.
- Allow function tracing of some of the tracing infrastructure code
Because the tracing code can cause recursion issues if it is traced
by the function tracer the entire tracing directory disables function
tracing. But not all of tracing causes issues if it is traced.
Namely, the event tracing code. Add a config that enables some of the
tracing code to be traced to help in debugging it. Note, when this is
enabled, it does add noise to general function tracing, especially if
events are enabled as well (which is a common case).
- Add boot-time backup instance for persistent buffer
The persistent ring buffer is used mostly for kernel crash analysis
in the field. One issue is that if there's a crash, the data in the
persistent ring buffer must be read before tracing can begin using
it. This slows down the boot process. Once tracing starts in the
persistent ring buffer, the old data must be freed and the addresses
no longer match and old events can't be in the buffer with new
events.
Create a way to create a backup buffer that copies the persistent
ring buffer at boot up. Then after a crash, the always on tracer can
begin immediately as well as the normal boot process while the crash
analysis tooling uses the backup buffer. After the backup buffer is
finished being read, it can be removed.
- Enable function graph args and return address options at the same
time
Currently the when reading of arguments in the function graph tracer
is enabled, the option to record the parent function in the entry
event can not be enabled. Update the code so that it can.
- Add new struct_offset() helper macro
Add a new macro that takes a pointer to a structure and a name of one
of its members and it will return the offset of that member. This
allows the ring buffer code to simplify the following:
From: size = struct_size(entry, buf, cnt - sizeof(entry->id));
To: size = struct_offset(entry, id) + cnt;
There should be other simplifications that this macro can help out
with as well
* tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (42 commits)
overflow: Introduce struct_offset() to get offset of member
function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously
tracing: Add boot-time backup of persistent ring buffer
ftrace: Allow tracing of some of the tracing code
tracing: Use strim() in trigger_process_regex() instead of skip_spaces()
tracing: Add bulk garbage collection of freeing event_trigger_data
tracing: Remove unneeded event_mutex lock in event_trigger_regex_release()
tracing: Merge struct event_trigger_ops into struct event_command
tracing: Remove get_trigger_ops() and add count_func() from trigger ops
tracing: Show the tracer options in boot-time created instance
ftrace: Avoid redundant initialization in register_ftrace_direct
tracing: Remove unused variable in tracing_trace_options_show()
fgraph: Make fgraph_no_sleep_time signed
tracing: Convert function graph set_flags() to use a switch() statement
tracing: Have function graph tracer option sleep-time be per instance
tracing: Move graph-time out of function graph options
tracing: Have function graph tracer option funcgraph-irqs be per instance
trace/pid_list: optimize pid_list->lock contention
tracing: Have function graph tracer define options per instance
tracing: Have function tracer define options per instance
...
The trace_marker_raw file in tracefs takes a buffer from user space that
contains an id as well as a raw data string which is usually a binary
structure. The structure used has the following:
struct raw_data_entry {
struct trace_entry ent;
unsigned int id;
char buf[];
};
Since the passed in "cnt" variable is both the size of buf as well as the
size of id, the code to allocate the location on the ring buffer had:
size = struct_size(entry, buf, cnt - sizeof(entry->id));
Which is quite ugly and hard to understand. Instead, add a helper macro
called struct_offset() which then changes the above to a simple and easy
to understand:
size = struct_offset(entry, id) + cnt;
This will likely come in handy for other use cases too.
Link: https://lore.kernel.org/all/CAHk-=whYZVoEdfO1PmtbirPdBMTV9Nxt9f09CK0k6S+HJD3Zmg@mail.gmail.com/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://patch.msgid.link/20251126145249.05b1770a@gandalf.local.home
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Currently, the persistent ring buffer instance needs to be read before
using it. This means we have to wait for boot up user space and dump
the persistent ring buffer. However, in that case we can not start
tracing on it from the kernel cmdline.
To solve this limitation, this adds an option which allows to create
a trace instance as a backup of the persistent ring buffer at boot.
If user specifies trace_instance=<BACKUP>=<PERSIST_RB> then the
<BACKUP> instance is made as a copy of the <PERSIST_RB> instance.
For example, the below kernel cmdline records all syscalls, scheduler
and interrupt events on the persistent ring buffer `boot_map` but
before starting the tracing, it makes a `backup` instance from the
`boot_map`. Thus, the `backup` instance has the previous boot events.
'reserve_mem=12M:4M:trace trace_instance=boot_map@trace,syscalls:*,sched:*,irq:* trace_instance=backup=boot_map'
As you can see, this just make a copy of entire reserved area and
make a backup instance on it. So you can release (or shrink) the
backup instance after use it to save the memory usage.
/sys/kernel/tracing/instances # free
total used free shared buff/cache available
Mem: 1999284 55704 1930520 10132 13060 1914628
Swap: 0 0 0
/sys/kernel/tracing/instances # rmdir backup/
/sys/kernel/tracing/instances # free
total used free shared buff/cache available
Mem: 1999284 40640 1945584 10132 13060 1929692
Swap: 0 0 0
Note: since there is no reason to make a copy of empty buffer, this
backup only accepts a persistent ring buffer as the original instance.
Also, since this backup is based on vmalloc(), it does not support
user-space mmap().
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/176377150002.219692.9425536150438129267.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The option "graph-time" affects the function profiler when it is using the
function graph infrastructure. It has nothing to do with the function
graph tracer itself. The option only affects the global function profiler
and does nothing to the function graph tracer.
Move it out of the function graph tracer options and make it a global
option that is only available at the top level instance.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251114192318.781711154@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tracers can add specify options to modify them. This logic was added
before instances were created and the tracer flags were global variables.
After instances were created where a tracer may exist in more than one
instance, the flags were not updated from being global into instance
specific. This causes confusion with these options. For example, the
function tracer has an option to enable function arguments:
# cd /sys/kernel/tracing
# mkdir instances/foo
# echo function > instances/foo/current_tracer
# echo 1 > options/func-args
# echo function > current_tracer
# cat trace
[..]
<idle>-0 [005] d..3. 1050.656187: rcu_needs_cpu() <-tick_nohz_next_event
<idle>-0 [005] d..3. 1050.656188: get_next_timer_interrupt(basej=0x10002dbad, basem=0xf45fd7d300) <-tick_nohz_next_event
<idle>-0 [005] d..3. 1050.656189: _raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt
<idle>-0 [005] d..4. 1050.656190: do_raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt
<idle>-0 [005] d..4. 1050.656191: _raw_spin_lock_nested(lock=0xffff8944bdf5f140, subclass=1) <-__get_next_timer_interrupt
# cat instances/foo/options/func-args
1
# cat instances/foo/trace
[..]
kworker/4:1-88 [004] ...1. 298.127735: next_zone <-refresh_cpu_vm_stats
kworker/4:1-88 [004] ...1. 298.127736: first_online_pgdat <-refresh_cpu_vm_stats
kworker/4:1-88 [004] ...1. 298.127738: next_online_pgdat <-refresh_cpu_vm_stats
kworker/4:1-88 [004] ...1. 298.127739: fold_diff <-refresh_cpu_vm_stats
kworker/4:1-88 [004] ...1. 298.127741: round_jiffies_relative <-vmstat_update
[..]
The above shows that setting "func-args" in the top level instance also
set it in the instance "foo", but since the interface of the trace flags
are per instance, the update didn't take affect in the "foo" instance.
Update the infrastructure to allow tracers to add a "default_flags" field
in the tracer structure that can be set instead of "flags" which will make
the flags per instance. If a tracer needs to keep the flags global (like
blktrace), keeping the "flags" field set will keep the old behavior.
This does not update function or the function graph tracers. That will be
handled later.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251111232429.305317942@kernel.org
Fixes: f20a580627 ("ftrace: Allow instances to use function tracing")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Updates to the function profiler adds new options to tracefs. The options
are currently defined by an enum as flags. The added options brings the
number of options over 32, which means they can no longer be held in a 32
bit enum. The TRACE_ITER_* flags are converted to a macro TRACE_ITER(*) to
allow the creation of options to still be done by macros.
This change is intrusive, as it affects all TRACE_ITER* options throughout
the trace code. Merge the branch that added these options and converted
the TRACE_ITER_* enum into a TRACE_ITER(*) macro, to allow the topic
branches to still be developed without conflict.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Function profiler shows the hit count of each function using its symbol
name. However, there are some same-name local symbols, which we can not
distinguish.
To solve this issue, this introduces an option to show the symbols
in "_text+OFFSET" format. This can avoid exposing the random shift of
KASLR. The functions in modules are shown as "MODNAME+OFFSET" where the
offset is from ".text".
E.g. for the kernel text symbols, specify vmlinux and the output to
addr2line, you can find the actual function and source info;
$ addr2line -fie vmlinux _text+3078208
__balance_callbacks
kernel/sched/core.c:5064
for modules, specify the module file and .text+OFFSET;
$ addr2line -fie samples/trace_events/trace-events-sample.ko .text+8224
do_simple_thread_func
samples/trace_events/trace-events-sample.c:23
Link: https://lore.kernel.org/all/176187878064.994619.8878296550240416558.stgit@devnote2/
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>