2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

474 Commits

Author SHA1 Message Date
Namhyung Kim
1ebb5e17ef perf annotate-data: Add get_global_var_type()
Accessing global variable is common when it tracks execution later.
Factor out the common code into a function for later use.

It adds thread and cpumode to struct data_loc_info to find (global)
symbols if needed.  Also remove var_name as it's retrieved in the
helper function.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240319055115.4063940-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:28 -03:00
Namhyung Kim
4f903455be perf annotate-data: Add update_insn_state()
The update_insn_state() function is to update the type state table after
processing each instruction.  For now, it handles MOV (on x86) insn
to transfer type info from the source location to the target.

The location can be a register or a stack slot.  Check carefully when
memory reference happens and fetch the type correctly.  It basically
ignores write to a memory since it doesn't change the type info.  One
exception is writes to (new) stack slots for register spilling.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240319055115.4063940-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:28 -03:00
Namhyung Kim
5cdd3fd799 perf annotate: Add annotate_get_basic_blocks()
The annotate_get_basic_blocks() is to find a list of basic blocks from
the source instruction to the destination instruction in a function.

It'll be used to find variables in a scope.  Use BFS (Breadth First
Search) to find a shortest path to carry the variable/register state
minimally.

Also change find_disasm_line() to be used in annotate_get_basic_blocks()
and add 'allow_update' argument to control if it can update the IP.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240319055115.4063940-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:28 -03:00
Namhyung Kim
a3f4d5b57d perf annotate-data: Introduce 'struct data_loc_info'
The find_data_type() needs many information to describe the location of
the data.  Add the new 'struct data_loc_info' to pass those information at
once.

No functional changes intended.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240319055115.4063940-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21 10:41:28 -03:00
Namhyung Kim
f59e3660cd perf annotate: Remove sym_hist.addr[] array
It's not used anymore and the code is coverted to use a hash map.  Now
sym_hist has a static size, so no need to have sizeof_sym_hist in the
struct annotated_source.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240304230815.1440583-4-namhyung@kernel.org
2024-03-06 20:25:36 -08:00
Namhyung Kim
8015457584 perf annotate: Calculate instruction overhead using hashmap
Use annotated_source.samples hashmap instead of addr array in the
struct sym_hist.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240304230815.1440583-3-namhyung@kernel.org
2024-03-06 20:25:20 -08:00
Namhyung Kim
d3e7cad6f3 perf annotate: Add a hashmap for symbol histogram
Now symbol histogram uses an array to save per-offset sample counts.
But it wastes a lot of memory if the symbol has a few samples only.
Add a hashmap to save values only for actual samples.

For now, it has duplicate histogram (one in the existing array and
another in the new hash map).  Once it can convert to use the hash
in all places, we can get rid of the array later.

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240304230815.1440583-2-namhyung@kernel.org
2024-03-06 20:24:55 -08:00
Namhyung Kim
5f7cdde843 perf annotate-data: Support global variables
Global variables are accessed using PC-relative address so it needs to
be handled separately.  The PC-rel addressing is detected by using
DWARF_REG_PC.  On x86, %rip register would be used.

The address can be calculated using the ip and offset in the
instruction.  But it should start from the next instruction so add
calculate_pcrel_addr() to do it properly.

But global variables defined in a different file would only have a
declaration which doesn't include a location list.  So it first tries
to get the type info using the address, and then looks up the variable
declarations using name.  The name of global variables should be get
from the symbol table.  The declaration would have the type info.

So extend find_var_type() to take both address and name for global
variables.

The stat is now looks like:

  Annotate data type stats:
  total 294, ok 153 (52.0%), bad 141 (48.0%)
  -----------------------------------------------------------
          30 : no_sym
          32 : no_mem_ops
          61 : no_var
          10 : no_typeinfo
           8 : bad_offset

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-7-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-01-22 12:08:20 -08:00
Namhyung Kim
7a54f1d83d perf annotate-data: Add stack operation pseudo type
A typical function prologue and epilogue include multiple stack
operations to save and restore the current value of registers.
On x86, it looks like below:

  push  r15
  push  r14
  push  r13
  push  r12

  ...

  pop   r12
  pop   r13
  pop   r14
  pop   r15
  ret

As these all touches the stack memory region, chances are high that they
appear in a memory profile data.  But these are not used for any real
purpose yet so it'd return no types.

One of my profile type shows that non neglible portion of data came from
the stack operations.  It also seems GCC generates more stack operations
than clang.

Annotate Instruction stats
total 264, ok 169 (64.0%), bad 95 (36.0%)

    Name      :  Good   Bad
  -----------------------------------------------------------
    movq      :    49    27
    movl      :    24     9
    popq      :     0    19   <-- here
    cmpl      :    17     2
    addq      :    14     1
    cmpq      :    12     2
    cmpxchgl  :     3     7

Instead of dealing them as unknown, let's create a seperate pseudo type
to represent those stack operations separately.

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-01-22 12:08:20 -08:00
Namhyung Kim
d3030191d3 perf annotate-data: Handle array style accesses
On x86, instructions for array access often looks like below.

  mov  0x1234(%rax,%rbx,8), %rcx

Usually the first register holds the type information and the second one
has the index.  And the current code only looks up a variable for the
first register.  But it's possible to be in the other way around so it
needs to check the second register if the first one failed.

The stat changed like this.

  Annotate data type stats:
  total 294, ok 148 (50.3%), bad 146 (49.7%)
  -----------------------------------------------------------
          30 : no_sym
          32 : no_mem_ops
          66 : no_var
          10 : no_typeinfo
           8 : bad_offset

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-01-22 12:08:19 -08:00
Namhyung Kim
1cf4df0373 perf annotate-data: Handle macro fusion on x86
When a sample was come from a conditional branch without a memory
operand, it could be due to a macro fusion with a previous instruction.
So it needs to check the memory operand in the previous one.

This improves the stat like below:

  Annotate data type stats:
  total 294, ok 147 (50.0%), bad 147 (50.0%)
  -----------------------------------------------------------
          30 : no_sym
          32 : no_mem_ops
          71 : no_var
           6 : no_typeinfo
           8 : bad_offset

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-01-22 12:08:19 -08:00
Namhyung Kim
a3397d69e4 perf annotate-data: Parse 'lock' prefix from llvm-objdump
For the performance reason, I prefer llvm-objdump over GNU's.  But I
found that llvm-objdump puts x86 lock prefix in a separate line like
below.

  ffffffff81000695: f0                    lock
  ffffffff81000696: ff 83 54 0b 00 00     incl    2900(%rbx)

This should be parsed properly, but I just changed to find the insn
with next offset for now.

This improves the statistics as it can process more instructions.

  Annotate data type stats:
  total 294, ok 144 (49.0%), bad 150 (51.0%)
  -----------------------------------------------------------
          30 : no_sym
          35 : no_mem_ops
          71 : no_var
           6 : no_typeinfo
           8 : bad_offset

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20240117062657.985479-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-01-22 12:08:19 -08:00
Namhyung Kim
58824fa008 perf annotate: Add --insn-stat option for debugging
This is for a debugging purpose.  It'd be useful to see per-instrucion
level success/failure stats.

  $ perf annotate --data-type --insn-stat
  Annotate Instruction stats
  total 264, ok 143 (54.2%), bad 121 (45.8%)

    Name      :  Good   Bad
  -----------------------------------------------------------
    movq      :    45    31
    movl      :    22    11
    popq      :     0    19
    cmpl      :    16     3
    addq      :     8     7
    cmpq      :    11     3
    cmpxchgl  :     3     7
    cmpxchgq  :     8     0
    incl      :     3     3
    movzbl    :     4     2
    incq      :     4     2
    decl      :     6     0
    ...

Committer notes:

So these are about being able to find the type for accesses from these
instructions, we should improve the naming, but it is for debugging, we
can improve this later:

  @@ -3726,6 +3759,10 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he)
                          continue;

                  mem_type = find_data_type(ms, ip, op_loc->reg, op_loc->offset);
  +               if (mem_type)
  +                       istat->good++;
  +               else
  +                       istat->bad++;

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-18-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23 22:40:17 -03:00
Namhyung Kim
61a9741e9f perf annotate: Add --type-stat option for debugging
The --type-stat option is to be used with --data-type and to print
detailed failure reasons for the data type annotation.

  $ perf annotate --data-type --type-stat
  Annotate data type stats:
  total 294, ok 116 (39.5%), bad 178 (60.5%)
  -----------------------------------------------------------
          30 : no_sym
          40 : no_insn_ops
          33 : no_mem_ops
          63 : no_var
           4 : no_typeinfo
           8 : bad_offset

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-17-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23 22:40:13 -03:00
Namhyung Kim
263925bf84 perf annotate: Add --data-type option
Support data type annotation with new --data-type option.  It internally
uses type sort key to collect sample histogram for the type and display
every members like below.

  $ perf annotate --data-type
  ...
  Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples):
  ============================================================================
      samples     offset       size  field
           13          0        640  struct cfs_rq         {
            2          0         16      struct load_weight       load {
            2          0          8          unsigned long        weight;
            0          8          4          u32  inv_weight;
                                         };
            0         16          8      unsigned long    runnable_weight;
            0         24          4      unsigned int     nr_running;
            1         28          4      unsigned int     h_nr_running;
  ...

For simplicity it prints the number of samples per field for now.
But it should be easy to show the overhead percentage instead.

The number at the outer struct is a sum of the numbers of the inner
members.  For example, struct cfs_rq got total 13 samples, and 2 came
from the load (struct load_weight) and 1 from h_nr_running.  Similarly,
the struct load_weight got total 2 samples and they all came from the
weight field.

I've added two new flags in the symbol_conf for this.  The
annotate_data_member is to get the members of the type.  This is also
needed for perf report with typeoff sort key.  The annotate_data_sample
is to update sample stats for each offset and used only in annotate.

Currently it only support stdio output mode, TUI support can be added
later.

Committer testing:

With the perf.data from the previous csets, a very simple, short
duration one:

  # perf annotate --data-type
  Annotate type: 'struct list_head' in [kernel.kallsyms] (1 samples):
  ============================================================================
      samples     offset       size  field
            1          0         16  struct list_head      {
            0          0          8      struct list_head*        next;
            1          8          8      struct list_head*        prev;
                                     };

  Annotate type: 'char' in [kernel.kallsyms] (1 samples):
  ============================================================================
      samples     offset       size  field
            1          0          1  char ;

  #

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-15-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23 22:39:43 -03:00
Namhyung Kim
871304a79f perf report: Add 'typeoff' sort key
The typeoff sort key shows the data type name, offset and the name of
the field.  This is useful to see which field in the struct is accessed
most frequently.

  $ perf report -s type,typeoff --hierarchy --stdio
  ...
  #     Overhead  Data Type / Data Type Offset
  # ............  ............................
  #
  ...
        1.23%     struct cfs_rq
           0.19%    struct cfs_rq +404 (throttle_count)
           0.19%    struct cfs_rq +0 (load.weight)
           0.19%    struct cfs_rq +336 (leaf_cfs_rq_list.next)
           0.09%    struct cfs_rq +272 (propagate)
           0.09%    struct cfs_rq +196 (removed.nr)
           0.09%    struct cfs_rq +80 (curr)
           0.09%    struct cfs_rq +544 (lt_b_children_throttled)
           0.06%    struct cfs_rq +320 (rq)

Committer testing:

Again with the perf.data from the previous csets:

  # perf report --stdio -s type,typeoff
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 4  of event 'cpu_atom/mem-loads,ldlat=30/P'
  # Event count (approx.): 7
  #
  # Overhead  Data Type  Data Type Offset
  # ........  .........  ................
  #
      42.86%  struct list_head  struct list_head +8 (prev)
      42.86%  (unknown)  (unknown) +0 (no field)
      14.29%  char       char +0 (no field)

  #
  # (Tip: To see callchains in a more compact form: perf report -g folded)
  #
  # perf report --stdio -s dso,type,typeoff
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 4  of event 'cpu_atom/mem-loads,ldlat=30/P'
  # Event count (approx.): 7
  #
  # Overhead  Shared Object         Data Type  Data Type Offset
  # ........  ....................  .........  ................
  #
      42.86%  [kernel.kallsyms]     struct list_head  struct list_head +8 (prev)
      28.57%  libc.so.6             (unknown)  (unknown) +0 (no field)
      14.29%  [kernel.kallsyms]     char       char +0 (no field)
      14.29%  ld-linux-x86-64.so.2  (unknown)  (unknown) +0 (no field)

  #
  # (Tip: If you have debuginfo enabled, try: perf report -s sym,srcline)
  #
  #

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-13-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23 22:39:42 -03:00
Namhyung Kim
9bd7ddd157 perf annotate-data: Update sample histogram for type
The annotated_data_type__update_samples() to get histogram for data type
access.

It'll be called by perf annotate to show which fields in the data type
are accessed frequently.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23 22:39:42 -03:00
Namhyung Kim
67bc54bbc5 perf annotate: Implement hist_entry__get_data_type()
It's the function to find out the type info from the given sample data
and will be called from the hist_entry sort logic when 'type' sort key
is used.

It first calls objdump to disassemble the instructions and figure out
information about memory access at the location.  Maybe we can do it
better by analyzing the instruction directly, but I'll leave it for
later work.

The memory access is determined by checking instruction operands to
have "(" and then extract register name and offset.  It'll return NULL
if no data type is found.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23 22:39:42 -03:00
Namhyung Kim
3a0c26edc3 perf annotate: Add annotate_get_insn_location()
The annotate_get_insn_location() is to get the detailed information of
instruction locations like registers and offset.  It has source and
target operands locations in an array.  Each operand can have a register
and an offset.  The offset is meaningful when mem_ref flag is set.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23 22:39:42 -03:00
Namhyung Kim
0669729eb0 perf annotate: Factor out evsel__get_arch()
The evsel__get_arch() is to get architecture info from the environment.

It'll be used by other places later so let's factor it out.

Also add arch__is() to check the arch info by name.

Committer notes:

"get" is usually associated with refcounting, so we better rename this
at some point to a better name.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231213001323.718046-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23 22:39:42 -03:00
Namhyung Kim
327f7533cc perf annotate: Get rid of local annotation options
It doesn't need the option in the struct annotation which is allocated
for each symbol.  It can directly use the global options and save 8
bytes per symbol.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231128175441.721579-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07 17:18:50 -03:00
Namhyung Kim
2fa21d694c perf annotate: Remove remaining usages of local annotation options
So that it can get rid of the unused data.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231128175441.721579-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07 17:18:37 -03:00
Namhyung Kim
7f929aea21 perf annotate: Ensure init/exit for global options
Now it only cares about the global options so it can just handle it
without the argument.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231128175441.721579-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07 17:18:24 -03:00
Namhyung Kim
41fd3cacd2 perf annotate: Use global annotation_options
Now it can directly use the global options and no need to pass it as an
argument.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231128175441.721579-5-namhyung@kernel.org
[ Fixup build with GTK2=1 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07 17:17:57 -03:00
Namhyung Kim
9d03194a36 perf annotate: Introduce global annotation_options
The annotation options are to control the behavior of objdump and the
output.  It's basically used by 'perf annotate' but 'perf report' and
'perf top' can call it on TUI dynamically.

But it doesn't need to have a copy of annotation options in many places.

As most of the work is done in the util/annotate.c file, add a global
variable and set/use it instead of having their own copies.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231128175441.721579-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07 16:45:54 -03:00
Namhyung Kim
08973307d2 perf annotate: Check if operand has multiple regs
It needs to check all possible information in an instruction.  Let's add
a field indicating if the operand has multiple registers.  I'll be used
to search type information like in an array access on x86 like:

  mov    0x10(%rax,%rbx,8), %rcx
             -------------
                 here

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231012035111.676789-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-27 16:05:19 -03:00
Namhyung Kim
fb7fd2a14a perf annotate: Move raw_comment and raw_func_start fields out of 'struct ins_operands'
Thoese two fields are used only for the jump_ops, so move them into the
union to save some bytes.  Also add jump__delete() callback not to free
the fields as they didn't allocate new strings.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: WANG Rui <wangrui@loongson.cn>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-10 09:03:48 -03:00
Namhyung Kim
ded8c48497 perf annotate: Pass "-l" option to objdump conditionally
The "-l" option is to print line numbers in the objdump output.  perf
annotate TUI only can show the line numbers later but it causes big slow
downs for the kernel binary.

Similarly, showing source code also takes a long time and it already has
an option to control it.

  $ time objdump ... -d -S -C vmlinux > /dev/null
  real	0m3.474s
  user	0m3.047s
  sys	0m0.428s

  $ time objdump ... -d -l -C vmlinux > /dev/null
  real	0m1.796s
  user	0m1.459s
  sys	0m0.338s

  $ time objdump ... -d -C vmlinux > /dev/null
  real	0m0.051s
  user	0m0.036s
  sys	0m0.016s

As it's not needed for data type profiling, let's make it conditional so
that it can skip the unnecessary work.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: linux-toolchains@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Link: https://lore.kernel.org/r/20231110000012.3538610-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-10 09:03:23 -03:00
Namhyung Kim
b753d48f53 perf annotate: Move offsets array from 'struct annotation' to 'struct annotated_source'
The offsets array keeps pointers to 'struct annotation_line' entries
which are available in the 'struct annotated_source'.  Let's move it to
there.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09 13:49:33 -03:00
Namhyung Kim
0aae4c99c5 perf annotate: Move some source code related fields from 'struct annotation' to 'struct annotated_source'
Some fields in the 'struct annotation' are only used with 'struct
annotated_source' so better to be moved there in order to reduce memory
consumption for other symbols.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09 13:49:33 -03:00
Namhyung Kim
2b215ec71b perf annotate: Move max_coverage from 'struct annotation' to 'struct annotated_branch'
The max_coverage field is only used when branch stack info is available
so it'd be natural to move to 'struct annotated_branch'.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09 13:49:33 -03:00
Namhyung Kim
b7f87e3259 perf annotate: Split branch stack cycles info from 'struct annotation'
The cycles info is only meaningful when sample has branch stacks.  To
save the memory for normal cases, move those fields to a new 'struct
annotated_branch' and dynamically allocate it when needed.  Also move
cycles_hist from annotated_source as it's related here.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09 13:49:33 -03:00
Namhyung Kim
de2c7eb59c perf annotate: Split branch stack cycles information out of 'struct annotation_line'
The cycles info is used only when branch stack is provided.  Separate
them from 'struct annotation_line' into a separate struct and lazy
allocate them to save some memory.

Committer notes:

Make annotation__compute_ipc() check if the lazy allocation works,
bailing out if so, its callers already do error checking and
propagation.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231103191907.54531-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09 13:49:33 -03:00
Arnaldo Carvalho de Melo
979e9c9fc9 perf annotate bpf: Don't enclose non-debug code with an assert()
In 616b14b47a ("perf build: Conditionally define NDEBUG") we
started using NDEBUG=1 when DEBUG=1 isn't present, so code that is
enclosed with assert() is not called.

In dd317df072 ("perf build: Make binutil libraries opt in") we
stopped linking against binutils-devel, for licensing reasons.

Recently people asked me why annotation of BPF programs wasn't working,
i.e. this:

  $ perf annotate bpf_prog_5280546344e3f45c_kfree_skb

was returning:

  case SYMBOL_ANNOTATE_ERRNO__NO_LIBOPCODES_FOR_BPF:
     scnprintf(buf, buflen, "Please link with binutils's libopcode to enable BPF annotation");

This was on a fedora rpm, so its new enough that I had to try to test by
rebuilding using BUILD_NONDISTRO=1, only to get it segfaulting on me.

This combination made this libopcode function not to be called:

        assert(bfd_check_format(bfdf, bfd_object));

Changing it to:

	if (!bfd_check_format(bfdf, bfd_object))
		abort();

Made it work, looking at this "check" function made me realize it
changes the 'bfdf' internal state, i.e. we better call it.

So stop using assert() on it, just call it and abort if it fails.

Probably it is better to propagate the error, etc, but it seems it is
unlikely to fail from the usage done so far and we really need to stop
using libopcodes, so do the quick fix above and move on.

With it we have BPF annotation back working when built with
BUILD_NONDISTRO=1:

  ⬢[acme@toolbox perf-tools-next]$ perf annotate --stdio2 bpf_prog_5280546344e3f45c_kfree_skb   | head
  No kallsyms or vmlinux with build-id 939bc71a1a51cdc434e60af93c7e734f7d5c0e7e was found
  Samples: 12  of event 'cpu-clock:ppp', 4000 Hz, Event count (approx.): 3000000, [percent: local period]
  bpf_prog_5280546344e3f45c_kfree_skb() bpf_prog_5280546344e3f45c_kfree_skb
  Percent      int kfree_skb(struct trace_event_raw_kfree_skb *args) {
                 nop
   33.33         xchg   %ax,%ax
                 push   %rbp
                 mov    %rsp,%rbp
                 sub    $0x180,%rsp
                 push   %rbx
                 push   %r13
  ⬢[acme@toolbox perf-tools-next]$

Fixes: 6987561c9e ("perf annotate: Enable annotation of BPF programs")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mohamed Mahmoud <mmahmoud@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Dave Tucker <datucker@redhat.com>
Cc: Derek Barbosa <debarbos@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/lkml/ZMrMzoQBe0yqMek1@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-03 17:01:28 -03:00
WANG Rui
4ca0d340ce perf annotate: Fix instruction association and parsing for LoongArch
In the perf annotate view for LoongArch, there is no arrowed line
pointing to the target from the branch instruction. This issue is
caused by incorrect instruction association and parsing.

$ perf record alloc-6276705c94ad1398 # rust benchmark
$ perf report

  0.28 │       ori        $a1, $zero, 0x63
       │       move       $a2, $zero
 10.55 │       addi.d     $a3, $a2, 1(0x1)
       │       sltu       $a4, $a3, $s7
  9.53 │       masknez    $a4, $s7, $a4
       │       sub.d      $a3, $a3, $a4
 12.12 │       st.d       $a1, $fp, 24(0x18)
       │       st.d       $a3, $fp, 16(0x10)
 16.29 │       slli.d     $a2, $a2, 0x2
       │       ldx.w      $a2, $s8, $a2
 12.77 │       st.w       $a2, $sp, 724(0x2d4)
       │       st.w       $s0, $sp, 720(0x2d0)
  7.03 │       addi.d     $a2, $sp, 720(0x2d0)
       │       addi.d     $a1, $a1, -1(0xfff)
 12.03 │       move       $a2, $a3
       │     → bne        $a1, $s3, -52(0x3ffcc)  # 82ce8 <test::bench::Bencher::iter+0x3f4>
  2.50 │       addi.d     $a0, $a0, 1(0x1)

This patch fixes instruction association issues, such as associating
branch instructions with jump_ops instead of call_ops, and corrects
false instruction matches. It also implements branch instruction parsing
specifically for LoongArch. With this patch, we will be able to see the
arrowed line.

  0.79 │3ec:   ori        $a1, $zero, 0x63
       │       move       $a2, $zero
 10.32 │3f4:┌─→addi.d     $a3, $a2, 1(0x1)
       │    │  sltu       $a4, $a3, $s7
 10.44 │    │  masknez    $a4, $s7, $a4
       │    │  sub.d      $a3, $a3, $a4
 14.17 │    │  st.d       $a1, $fp, 24(0x18)
       │    │  st.d       $a3, $fp, 16(0x10)
 13.15 │    │  slli.d     $a2, $a2, 0x2
       │    │  ldx.w      $a2, $s8, $a2
 11.00 │    │  st.w       $a2, $sp, 724(0x2d4)
       │    │  st.w       $s0, $sp, 720(0x2d0)
  8.00 │    │  addi.d     $a2, $sp, 720(0x2d0)
       │    │  addi.d     $a1, $a1, -1(0xfff)
 11.99 │    │  move       $a2, $a3
       │    └──bne        $a1, $s3, 3f4
  3.17 │       addi.d     $a0, $a0, 1(0x1)

Signed-off-by: WANG Rui <wangrui@loongson.cn>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: loongarch@lists.linux.dev
Cc: loongson-kernel@lists.loongnix.cn
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Link: https://lore.kernel.org/r/20230620132025.105563-1-wangrui@loongson.cn
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-06-20 21:55:06 -07:00
Ian Rogers
2e9f9d4a72 perf annotation: Switch lock from a mutex to a sharded_mutex
Remove the "struct mutex lock" variable from annotation that is
allocated per symbol. This removes in the region of 40 bytes per
symbol allocation. Use a sharded mutex where the number of shards is
set to the number of CPUs. Assuming good hashing of the annotation
(done based on the pointer), this means in order to contend there
needs to be more threads than CPUs, which is not currently true in any
perf command. Were contention an issue it is straightforward to
increase the number of shards in the mutex.

On my Debian/glibc based machine, this reduces the size of struct
annotation from 136 bytes to 96 bytes, or nearly 30%.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andres Freund <andres@anarazel.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yuan Can <yuancan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/r/20230615040715.2064350-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-06-20 17:04:21 -07:00
Vincent Whitchurch
66dc1920f6 perf annotate: Work with vmlinux outside symfs
It is currently possible to use --symfs along with a vmlinux which lies
outside of the symfs by passing an absolute path to --vmlinux, thanks to
the check in dso__load_vmlinux() which handles this explicitly.

However, the annotate code lacks this check and thus 'perf annotate'
does not work ("Internal error: Invalid -1 error code") for kernel
functions with this combination.  Add the missing handling.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel@axis.com
Link: https://lore.kernel.org/r/20221125114210.2353820-1-vincent.whitchurch@axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-16 10:04:28 -03:00
Namhyung Kim
103b3d2f94 perf annotate: Allow whitespace between insn operands
The llvm-objdump adds a space between the operands while GNU objdump
does not.  Allow a space to handle the both.

In GNU objdump:

  Disassembly of section .text:                                      here
                                                                      |
  ffffffff81000000 <_stext>:                                          v
  ffffffff81000000:	48 8d 25 51 1f 40 01 	lea    0x1401f51(%rip),%rsp
  ffffffff81000007:	e8 d4 00 00 00       	call   ffffffff810000e0 <verify_cpu>
  ffffffff8100000c:	48 8d 3d ed ff ff ff 	lea    -0x13(%rip),%rdi

In llvm-objdump:

  Disassembly of section .text:                                      here
                                                                       |
  ffffffff81000000 <startup_64>:                                       v
  ffffffff81000000: 48 8d 25 51 1f 40 01 	leaq	20979537(%rip), %rsp
  ffffffff81000007: e8 d4 00 00 00       	callq	0xffffffff810000e0 <verify_cpu>
  ffffffff8100000c: 48 8d 3d ed ff ff ff 	leaq	-19(%rip), %rdi

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230612230026.3887586-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-12 21:53:18 -03:00
Ian Rogers
625db36e6c perf srcline: Change free_srcline to zfree_srcline
Make use after free more unlikely.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Brian Robbins <brianrob@linux.microsoft.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Ye Xingchen <ye.xingchen@zte.com.cn>
Cc: Yuan Can <yuancan@huawei.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230608232823.4027869-26-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-12 15:57:54 -03:00
Ian Rogers
d3d53b2e96 perf annotate: Fix parse_objdump_line memory leak
fileloc is used to hold a previous line, before overwriting it ensure
the previous contents is freed. Free the storage once done in
symbol__disassemble.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Brian Robbins <brianrob@linux.microsoft.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Ye Xingchen <ye.xingchen@zte.com.cn>
Cc: Yuan Can <yuancan@huawei.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230608232823.4027869-22-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-12 15:57:54 -03:00
Namhyung Kim
d0b3597998 perf annotate: Handle x86 instruction suffix generally
In AT&T asm syntax, most of x86 instructions can have size suffix like
b, w, l or q.  Instead of adding all these instructions in the table,
we can handle them in a general way.

For example, it can try to find an instruction as is.  If not found,
assuming it has a suffix and it'd try again without the suffix if it's
one of the allowed suffixes.  This way, we can reduce the instruction
table size for duplicated entries of the same instructions with a
different suffix.

If an instruction xyz and others like xyz<suffix> are completely
different ones, then they both need to be listed in the table so that
they can be found before the second attempt (without the suffix).

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230524205054.3087004-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-09 10:56:05 -03:00
Namhyung Kim
94f0705eee perf annotate: Parse x86 SIB addressing properly
When the source argument of the "mov" instruction looks like below, it
didn't parse the whole operand and just stopped at the first comma.

  mov    (%rbx,%rax,1),%rcx

Fix it by checking the parentheses and move it to the closing one.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230511062725.514752-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-15 17:51:27 -03:00
Linus Torvalds
f085df1be6 Disable building BPF based features by default for v6.4.
We need to better polish building with BPF skels, so revert back to
 making it an experimental feature that has to be explicitely enabled
 using BUILD_BPF_SKEL=1.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZFbCXwAKCRCyPKLppCJ+
 J7cHAP97erKY4hBXArjpfzcvpFmboh/oqhbTLntyIpS6TEnOyQEAyervAPGIjQYC
 DCo4foyXmOWn3dhNtK9M+YiRl3o2SgQ=
 =7G78
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "Third version of perf tool updates, with the build problems with with
  using a 'vmlinux.h' generated from the main build fixed, and the bpf
  skeleton build disabled by default.

  Build:

   - Require libtraceevent to build, one can disable it using
     NO_LIBTRACEEVENT=1.

     It is required for tools like 'perf sched', 'perf kvm', 'perf
     trace', etc.

     libtraceevent is available in most distros so installing
     'libtraceevent-devel' should be a one-time event to continue
     building perf as usual.

     Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
     sufficient for lots of users not interested in those libtraceevent
     dependent features.

   - Allow Python support in 'perf script' when libtraceevent isn't
     linked, as not all features requires it, for instance Intel PT does
     not use tracepoints.

   - Error if the python interpreter needed for jevents to work isn't
     available and NO_JEVENTS=1 isn't set, preventing a build without
     support for JSON vendor events, which is a rare but possible
     condition. The two check error messages:

        $(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
        $(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)

   - Make libbpf 1.0 the minimum required when building with out of
     tree, distro provided libbpf.

   - Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
     demangler, add 'perf test' entry for it.

   - Make binutils libraries opt in, as distros disable building with it
     due to licensing, they were used for C++ demangling, for instance.

   - Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
     equivalent) isn't installed, we'll just have a build warning:

       Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev

   - Add a feature test for scandirat(), that is not implemented so far
     in musl and uclibc, disabling features that need it, such as
     scanning for tracepoints in /sys/kernel/tracing/events.

  perf BPF filters:

   - New feature where BPF can be used to filter samples, for instance:

      $ sudo ./perf record -e cycles --filter 'period > 1000' true
      $ sudo ./perf script
           perf-exec 2273949 546850.708501:       5029 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
           perf-exec 2273949 546850.708508:      32409 cycles:  ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
           perf-exec 2273949 546850.708526:     143369 cycles:  ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
           perf-exec 2273949 546850.708600:     372650 cycles:  ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
           perf-exec 2273949 546850.708791:     482953 cycles:  ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
                true 2273949 546850.709036:     501985 cycles:  ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
                true 2273949 546850.709292:     503065 cycles:      7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

   - In addition to 'period' (PERF_SAMPLE_PERIOD), the other
     PERF_SAMPLE_ can be used for filtering, and also some other sample
     accessible values, from tools/perf/Documentation/perf-record.txt:

        Essentially the BPF filter expression is:

        <term> <operator> <value> (("," | "||") <term> <operator> <value>)*

     The <term> can be one of:
        ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
        code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
        p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
        mem_dtlb, mem_blk, mem_hops

     The <operator> can be one of:
        ==, !=, >, >=, <, <=, &

     The <value> can be one of:
        <number> (for any term)
        na, load, store, pfetch, exec (for mem_op)
        l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
        na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
        remote (for mem_remote)
        na, locked (for mem_locked)
        na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
        na, by_data, by_addr (for mem_blk)
        hops0, hops1, hops2, hops3 (for mem_hops)

  perf lock contention:

   - Show lock type with address.

   - Track and show mmap_lock, siglock and per-cpu rq_lock with address.
     This is done for mmap_lock by following the current->mm pointer:

      $ sudo ./perf lock con -abl -- sleep 10
       contended   total wait     max wait     avg wait            address   symbol
       ...
           16344    312.30 ms      2.22 ms     19.11 us   ffff8cc702595640
           17686    310.08 ms      1.49 ms     17.53 us   ffff8cc7025952c0
               3     84.14 ms     45.79 ms     28.05 ms   ffff8cc78114c478   mmap_lock
            3557     76.80 ms     68.75 us     21.59 us   ffff8cc77ca3af58
               1     68.27 ms     68.27 ms     68.27 ms   ffff8cda745dfd70
               9     54.53 ms      7.96 ms      6.06 ms   ffff8cc7642a48b8   mmap_lock
           14629     44.01 ms     60.00 us      3.01 us   ffff8cc7625f9ca0
            3481     42.63 ms    140.71 us     12.24 us   ffffffff937906ac   vmap_area_lock
           16194     38.73 ms     42.15 us      2.39 us   ffff8cd397cbc560
              11     38.44 ms     10.39 ms      3.49 ms   ffff8ccd6d12fbb8   mmap_lock
               1      5.43 ms      5.43 ms      5.43 ms   ffff8cd70018f0d8
            1674      5.38 ms    422.93 us      3.21 us   ffffffff92e06080   tasklist_lock
             581      4.51 ms    130.68 us      7.75 us   ffff8cc9b1259058
               5      3.52 ms      1.27 ms    703.23 us   ffff8cc754510070
             112      3.47 ms     56.47 us     31.02 us   ffff8ccee38b3120
             381      3.31 ms     73.44 us      8.69 us   ffffffff93790690   purge_vmap_area_lock
             255      3.19 ms     36.35 us     12.49 us   ffff8d053ce30c80

   - Update default map size to 16384.

   - Allocate single letter option -M for --map-nr-entries, as it is
     proving being frequently used.

   - Fix struct rq lock access for older kernels with BPF's CO-RE
     (Compile once, run everywhere).

   - Fix problems found with MSAn.

  perf report/top:

   - Add inline information when using --call-graph=fp or lbr, as was
     already done to the --call-graph=dwarf callchain mode.

   - Improve the 'srcfile' sort key performance by really using an
     optimization introduced in 6.2 for the 'srcline' sort key that
     avoids calling addr2line for comparision with each sample.

  perf sched:

   - Make 'perf sched latency/map/replay' to use "sched:sched_waking"
     instead of "sched:sched_waking", consistent with 'perf record'
     since d566a9c2d4 ("perf sched: Prefer sched_waking event when it
     exists").

  perf ftrace:

   - Make system wide the default target for latency subcommand, run the
     following command then generate some network traffic and press
     control+C:

       # perf ftrace latency -T __kfree_skb
     ^C
         DURATION     |      COUNT | GRAPH                                          |
          0 - 1    us |         27 | #############                                  |
          1 - 2    us |         22 | ###########                                    |
          2 - 4    us |          8 | ####                                           |
          4 - 8    us |          5 | ##                                             |
          8 - 16   us |         24 | ############                                   |
         16 - 32   us |          2 | #                                              |
         32 - 64   us |          1 |                                                |
         64 - 128  us |          0 |                                                |
        128 - 256  us |          0 |                                                |
        256 - 512  us |          0 |                                                |
        512 - 1024 us |          0 |                                                |
          1 - 2    ms |          0 |                                                |
          2 - 4    ms |          0 |                                                |
          4 - 8    ms |          0 |                                                |
          8 - 16   ms |          0 |                                                |
         16 - 32   ms |          0 |                                                |
         32 - 64   ms |          0 |                                                |
         64 - 128  ms |          0 |                                                |
        128 - 256  ms |          0 |                                                |
        256 - 512  ms |          0 |                                                |
        512 - 1024 ms |          0 |                                                |
          1 - ...   s |          0 |                                                |
       #

  perf top:

   - Add --branch-history (LBR: Last Branch Record) option, just like
     already available for 'perf record'.

   - Fix segfault in thread__comm_len() where thread->comm was being
     used outside thread->comm_lock.

  perf annotate:

   - Allow configuring objdump and addr2line in ~/.perfconfig., so that
     you can use alternative binaries, such as llvm's.

  perf kvm:

   - Add TUI mode for 'perf kvm stat report'.

  Reference counting:

   - Add reference count checking infrastructure to check for use after
     free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
     more to come.

     To build with it use -DREFCNT_CHECKING=1 in the make command line
     to build tools/perf. Documented at:

       https://perf.wiki.kernel.org/index.php/Reference_Count_Checking

   - The above caught, for instance, fix, present in this series:

        - Fix maps use after put in 'perf test "Share thread maps"':

          'maps' is copied from leader, but the leader is put on line 79
          and then 'maps' is used to read the reference count below - so
          a use after put, with the put of maps happening within
          thread__put.

     Fixed by reversing the order of puts so that the leader is put
     last.

   - Also several fixes were made to places where reference counts were
     not being held.

   - Make this one of the tests in 'make -C tools/perf build-test' to
     regularly build test it and to make sure no direct access to the
     reference counted structs are made, doing that via accessors to
     check the validity of the struct pointer.

  ARM64:

   - Fix 'perf report' segfault when filtering coresight traces by
     sparse lists of CPUs.

   - Add support for 'simd' as a sort field for 'perf report', to show
     ARM's NEON SIMD's predicate flags: "partial" and "empty".

  arm64 vendor events:

   - Add N1 metrics.

  Intel vendor events:

   - Add graniterapids, grandridge and sierraforrest events.

   - Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
     broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
     jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
     silvermont, skylake, tigerlake and westmereep-dp

   - Refresh metrics for alderlake-n, broadwell, broadwellde,
     broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
     skylakex.

  perf stat:

   - Implement --topdown using JSON metrics.

   - Add TopdownL1 JSON metric as a default if present, but disable it
     for now for some Intel hybrid architectures, a series of patches
     addressing this is being reviewed and will be submitted for v6.5.

   - Use metrics for --smi-cost.

   - Update topdown documentation.

  Vendor events (JSON) infrastructure:

   - Add support for computing and printing metric threshold values. For
     instance, here is one found in thesapphirerapids json file:

       {
           "BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
           "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
           "MetricGroup": "smi",
           "MetricName": "smi_cycles",
           "MetricThreshold": "smi_cycles > 0.1",
           "ScaleUnit": "100%"
       },

   - Test parsing metric thresholds with the fake PMU in 'perf test
     pmu-events'.

   - Support for printing metric thresholds in 'perf list'.

   - Add --metric-no-threshold option to 'perf stat'.

   - Add rand (reverse and) and has_pmem (optane memory) support to
     metrics.

   - Sort list of input files to avoid depending on the order from
     readdir() helping in obtaining reproducible builds.

  S/390:

   - Add common metrics: - CPI (cycles per instruction), prbstate (ratio
     of instructions executed in problem state compared to total number
     of instructions), l1mp (Level one instruction and data cache misses
     per 100 instructions).

   - Add cache metrics for z13, z14, z15 and z16.

   - Add metric for TLB and cache.

  ARM:

   - Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
     (Memory Tagging Extension) and MOPS (Memory Operations) load/store.

  Intel PT hardware tracing:

   - Add event type names UINTR (User interrupt delivered) and UIRET
     (Exiting from user interrupt routine), documented in table 32-50
     "CFE Packet Type and Vector Fields Details" in the Intel Processor
     Trace chapter of The Intel SDM Volume 3 version 078.

   - Add support for new branch instructions ERETS and ERETU.

   - Fix CYC timestamps after standalone CBR

  ARM CoreSight hardware tracing:

   - Allow user to override timestamp and contextid settings.

   - Fix segfault in dso lookup.

   - Fix timeless decode mode detection.

   - Add separate decode paths for timeless and per-thread modes.

  auxtrace:

   - Fix address filter entire kernel size.

  Miscellaneous:

   - Fix use-after-free and unaligned bugs in the PLT handling routines.

   - Use zfree() to reduce chances of use after free.

   - Add missing 0x prefix for addresses printed in hexadecimal in 'perf
     probe'.

   - Suppress massive unsupported target platform errors in the unwind
     code.

   - Fix return incorrect build_id size in elf_read_build_id().

   - Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .

   - Add missing new parameter in kfree_skb tracepoint to the python
     scripts using it.

   - Add 'perf bench syscall fork' benchmark.

   - Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
     'perf mem'.

   - Fix wrong size expectation for perf test 'Setup struct
     perf_event_attr' caused by the patch adding
     perf_event_attr::config3.

   - Fix some spelling mistakes"

* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
  Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
  Revert "perf build: Warn for BPF skeletons if endian mismatches"
  perf metrics: Fix SEGV with --for-each-cgroup
  perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
  perf stat: Separate bperf from bpf_profiler
  perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
  perf test record+probe_libc_inet_pton: Fix call chain match on s390
  perf tracepoint: Fix memory leak in is_valid_tracepoint()
  perf cs-etm: Add fix for coresight trace for any range of CPUs
  perf build: Fix unescaped # in perf build-test
  perf unwind: Suppress massive unsupported target platform errors
  perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
  perf script: Print raw ip instead of binary offset for callchain
  perf symbols: Fix return incorrect build_id size in elf_read_build_id()
  perf list: Modify the warning message about scandirat(3)
  perf list: Fix memory leaks in print_tracepoint_events()
  perf lock contention: Rework offset calculation with BPF CO-RE
  perf lock contention: Fix struct rq lock access
  perf stat: Disable TopdownL1 on hybrid
  perf stat: Avoid SEGV on counter->name
  ...
2023-05-07 11:32:18 -07:00
Huacai Chen
2fa5ebe3bc tools/perf: Add basic support for LoongArch
Add basic support for LoongArch, which is very similar to the MIPS
version.

Signed-off-by: Ming Wang <wangming01@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-05-01 17:19:59 +08:00
Arnaldo Carvalho de Melo
7031edac9d perf dso: Add dso__filename_with_chroot() to reduce number of accesses to dso->nsinfo members
We'll need to reference count dso->nsinfo, so reduce the number of
direct accesses by having a shorter form of obtaining a filename with
a chroot (namespace one).

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/ZD26ZlqSbgSyH5lX@kernel.org
[ Used nsinfo__pid(dso->nsinfo), as it was already present ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-17 18:47:55 -03:00
Arnaldo Carvalho de Melo
c77ceb2eb0 perf annotate: Use zfree() to reduce chances of use after free
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.

Also include the missing linux/zalloc.h header directive.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-12 10:08:30 -03:00
Ian Rogers
78a1f7cd90 perf map: Add helper for ->map_ip() and ->unmap_ip()
Later changes will add reference count checking for struct map, add a
helper function to invoke the map_ip and unmap_ip function pointers. The
helper allows the reference count check to be in fewer places.

Committer notes:

Add missing conversions to:

  tools/perf/util/map.c
  tools/perf/util/cs-etm.c
  tools/perf/util/annotate.c
  tools/perf/arch/powerpc/util/sym-handling.c
  tools/perf/arch/s390/annotate/instructions.c

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06 22:10:17 -03:00
Ian Rogers
e5116f46d4 perf map: Add accessor for start and end
Later changes will add reference count checking for struct map, start
and end are frequently accessed variables. Add an accessor so that the
reference count check is only necessary in one place.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 16:54:11 -03:00
Ian Rogers
63df0e4bc3 perf map: Add accessor for dso
Later changes will add reference count checking for struct map, with
dso being the most frequently accessed variable. Add an accessor so
that the reference count check is only necessary in one place.

Additional changes:
 - add a dso variable to avoid repeated map__dso calls.
 - in builtin-mem.c dump_raw_samples, code only partially tested for
   dso == NULL. Make the possibility of NULL consistent.
 - in thread.c thread__memcpy fix use of spaces and use tabs.

Committer notes:

Did missing conversions on these files:

   tools/perf/arch/powerpc/util/skip-callchain-idx.c
   tools/perf/arch/powerpc/util/sym-handling.c
   tools/perf/ui/browsers/hists.c
   tools/perf/ui/gtk/annotate.c
   tools/perf/util/cs-etm.c
   tools/perf/util/thread.c
   tools/perf/util/unwind-libunwind-local.c
   tools/perf/util/unwind-libunwind.c

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 16:41:57 -03:00
Ian Rogers
57594454ce perf symbol: Add command line support for addr2line path
Allow addr2line to be set either on the command line or via the
perfconfig file. This doesn't currently work with llvm-addr2line as
the addr2line code emits two things:
1) the address to decode,
2) a bogus ',' value.
The expectation is the bogus value will generate:
??
??:0
that terminates the addr2line reading. However, the output from
llvm-addr2line is a single line with just the input ',' locking up the
addr2line reading that is expecting a second line.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:56 -03:00
Ian Rogers
0b02b47e71 perf annotate: Allow objdump to be set in perfconfig
Allow the setting of the objdump command in the perfconfig. Update man
page for this new option.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:56 -03:00
Ian Rogers
56d9117c50 perf annotate: Own objdump_path and disassembler_style strings
Make struct annotation_options own the strings objdump_path and
disassembler_style, freeing them on exit. Add missing strdup for
disassembler_style when read from a config file.

Committer notes:

Converted free(obj->member) to zfree(&obj->member) in
annotation_options__exit()

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:56 -03:00
Ian Rogers
217b7d41ea perf annotate: Add init/exit to annotation_options remove default
The annotation__default_options global variable was used to initialize
annotation_options.  Switch to the init/exit pattern as later changes
will give ownership over strings and this will be necessary to avoid
memory leaks.

Committer note:

Fix the GTK2=1 build, hist_entry__gtk_annotate() needs to receive a
'struct annotation_options' pointer.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 09:39:56 -03:00
Namhyung Kim
7d18a824b5 perf annotate: Toggle full address <-> offset display
Handle 'f' key to toggle the display offset and full address.  Obviously
it only works when users set to see disassembler output ('o' key).  It'd
be useful when users want to see the full virtual address in the TUI
annotate browser.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220923173142.805896-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04 08:55:22 -03:00
Ian Rogers
e54dea69cd perf dso: Hold lock when accessing nsinfo
There may be threads racing to update dso->nsinfo:

  https://lore.kernel.org/linux-perf-users/CAP-5=fWZH20L4kv-BwVtGLwR=Em3AOOT+Q4QGivvQuYn5AsPRg@mail.gmail.com/

Holding the dso->lock avoids use-after-free, memory leaks and other such
bugs. Apply the fix in:

  https://lore.kernel.org/linux-perf-users/20211118193714.2293728-1-irogers@google.com/

of there being a missing nsinfo__put now that the accesses are data race
free. Fixes test "Lookup mmap thread" when compiled with address
sanitizer.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dario Petrillo <dario.pk1@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fangrui Song <maskray@google.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pavithra Gurushankar <gpavithrasha@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Weiguo Li <liwg06@foxmail.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Zechuan Chen <chenzechuan1@huawei.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Cc: yaowenbin <yaowenbin1@huawei.com>
Link: https://lore.kernel.org/r/20220826164242.43412-15-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04 08:55:20 -03:00
Ian Rogers
9b3726ef83 perf annotate: Update use of pthread mutex
Switch to the use of mutex wrappers that provide better error checking.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dario Petrillo <dario.pk1@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fangrui Song <maskray@google.com>
Cc: Hewenliang <hewenliang4@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pavithra Gurushankar <gpavithrasha@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Weiguo Li <liwg06@foxmail.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: William Cohen <wcohen@redhat.com>
Cc: Zechuan Chen <chenzechuan1@huawei.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Cc: yaowenbin <yaowenbin1@huawei.com>
Link: https://lore.kernel.org/r/20220826164242.43412-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04 08:55:20 -03:00
Andres Freund
83aa012048 tools perf: Fix compilation error with new binutils
binutils changed the signature of init_disassemble_info(), which now causes
compilation failures for tools/perf/util/annotate.c, e.g. on debian
unstable.

Relevant binutils commit:

  https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=60a3da00bd5407f07

Wire up the feature test and switch to init_disassemble_info_compat(),
which were introduced in prior commits, fixing the compilation failure.

I verified that perf can still disassemble bpf programs by using bpftrace
under load, recording a perf trace, and then annotating the bpf "function"
with and without the changes. With old binutils there's no change in output
before/after this patch. When comparing the output from old binutils (2.35)
to new bintuils with the patch (upstream snapshot) there are a few output
differences, but they are unrelated to this patch. An example hunk is:

       1.15 :   55:mov    %rbp,%rdx
       0.00 :   58:add    $0xfffffffffffffff8,%rdx
       0.00 :   5c:xor    %ecx,%ecx
  -    1.03 :   5e:callq  0xffffffffe12aca3c
  +    1.03 :   5e:call   0xffffffffe12aca3c
       0.00 :   63:xor    %eax,%eax
  -    2.18 :   65:leaveq
  -    2.82 :   66:retq
  +    2.18 :   65:leave
  +    2.82 :   66:ret

Signed-off-by: Andres Freund <andres@anarazel.de>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ben Hutchings <benh@debian.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220622181918.ykrs5rsnmx3og4sv@alap3.anarazel.de
Link: https://lore.kernel.org/r/20220801013834.156015-5-andres@anarazel.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-01 15:30:22 -03:00
Ian Rogers
940a445a90 perf annotate: Drop objdump stderr to avoid getting stuck waiting for stdout output
If objdump writes to stderr it can block waiting for it to be read. As
perf doesn't read stderr then progress stops with perf waiting for
stdout output.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Lexi Shao <shaolexi@huawei.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20220407230503.1265036-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-09 14:21:00 -03:00
Namhyung Kim
40b0c5fd24 perf annotate: Try chroot filename for objdump
Likewise, it should use a proper name in case the task runs under
chroot.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220202070828.143303-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-10 15:34:13 -03:00
Namhyung Kim
a663520fcc perf annotate: Set error stream of objdump process for TUI
The stderr should be set to a pipe when using TUI.  Otherwise it'd
print to stdout and break TUI windows with an error message.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220202070828.143303-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-06 09:03:06 -03:00
Ian Rogers
4f74f18789 perf symbols: Factor out annotation init/exit
The exit function fixes a memory leak with the src field as detected by
leak sanitizer. An example of which is:

Indirect leak of 25133184 byte(s) in 207 object(s) allocated from:
    #0 0x7f199ecfe987 in __interceptor_calloc libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55defe638224 in annotated_source__alloc_histograms util/annotate.c:803
    #2 0x55defe6397e4 in symbol__hists util/annotate.c:952
    #3 0x55defe639908 in symbol__inc_addr_samples util/annotate.c:968
    #4 0x55defe63aa29 in hist_entry__inc_addr_samples util/annotate.c:1119
    #5 0x55defe499a79 in hist_iter__report_callback tools/perf/builtin-report.c:182
    #6 0x55defe7a859d in hist_entry_iter__add util/hist.c:1236
    #7 0x55defe49aa63 in process_sample_event tools/perf/builtin-report.c:315
    #8 0x55defe731bc8 in evlist__deliver_sample util/session.c:1473
    #9 0x55defe731e38 in machines__deliver_event util/session.c:1510
    #10 0x55defe732a23 in perf_session__deliver_event util/session.c:1590
    #11 0x55defe72951e in ordered_events__deliver_event util/session.c:183
    #12 0x55defe740082 in do_flush util/ordered-events.c:244
    #13 0x55defe7407cb in __ordered_events__flush util/ordered-events.c:323
    #14 0x55defe740a61 in ordered_events__flush util/ordered-events.c:341
    #15 0x55defe73837f in __perf_session__process_events util/session.c:2390
    #16 0x55defe7385ff in perf_session__process_events util/session.c:2420
    ...

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211112035124.94327-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Ian Rogers
4270456704 perf symbols: Bit pack to save a byte
Use a bit field alongside the earlier bit fields.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211112035124.94327-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Dave Marchevsky
6ac22d036f perf bpf: Pull in bpf_program__get_prog_info_linear()
To prepare for impending deprecation of libbpf's bpf_program__get_prog_info_linear(),
pull in the function and associated helpers into the perf codebase and migrate
existing uses to the perf copy.

Since libbpf's deprecated definitions will still be visible to perf, it is necessary
to rename perf's definitions.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211011082031.4148337-4-davemarchevsky@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-01 18:16:40 -03:00
William Cohen
0ba37e05c2 perf annotate: Add riscv64 support
This patch adds basic arch initialization and instruction associate
support for the riscv64 CPU architecture.

Example output:

  $ perf annotate --stdio2
  Samples: 122K of event 'task-clock:u', 4000 Hz, Event count (approx.): 30637250000, [percent: local period]
  strcmp() /usr/lib64/libc-2.32.so
  Percent

	      Disassembly of section .text:

	      0000000000069a30 <strcmp>:
	      __GI_strcmp():
	      const unsigned char *s2 = (const unsigned char *) p2;
	      unsigned char c1, c2;

	      do
	      {
	      c1 = (unsigned char) *s1++;
   37.30        lbu  a5,0(a0)
	      c2 = (unsigned char) *s2++;
    1.23        addi a1,a1,1
	      c1 = (unsigned char) *s1++;
   18.68        addi a0,a0,1
	      c2 = (unsigned char) *s2++;
    1.37        lbu  a4,-1(a1)
	      if (c1 == '\0')
   18.71      ↓ beqz a5,18
	       return c1 - c2;
	       }

Signed-off-by: William Cohen <wcohen@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-riscv@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210927005115.610264-1-wcohen@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-27 09:33:44 -03:00
Ravi Bangoria
3149733584 perf annotate: Add fusion logic for AMD microarchs
AMD family 15h and above microarchs fuse a subset of cmp/test/ALU
instructions with branch instructions[1][2]. Add perf annotate
fused instruction support for these microarchs.

Before:
         │       testb  $0x80,0x51(%rax)
         │    ┌──jne    5b3
    0.78 │    │  mov    %r13,%rdi
         │    │→ callq  mark_page_accessed
    1.08 │5b3:└─→mov    0x8(%r13),%rax

After:
         │    ┌──testb  $0x80,0x51(%rax)
         │    ├──jne    5b3
    0.78 │    │  mov    %r13,%rdi
         │    │→ callq  mark_page_accessed
    1.08 │5b3:└─→mov    0x8(%r13),%rax

[1] https://bugzilla.kernel.org/attachment.cgi?id=298553
[2] https://bugzilla.kernel.org/attachment.cgi?id=298555

Committer testing:

On a:

  $ grep -m1 "model name" /proc/cpuinfo
  model name	: AMD Ryzen 9 3900X 12-Core Processor
  $

  Samples: 44K of event 'cycles', 4000 Hz, Event count (approx.): 7533249650
  _int_malloc  /usr/lib64/libc-2.33.so [Percent: local period]
  Percent│    ┌──test   %eax,%eax
         │    ├──jne    884
         │    │↓ jmpq   943
         │    │  nop
         │878:│  add    $0x10,%rdx
    0.64 │    │  add    %eax,%eax
    0.57 │    │↓ je     cc9
    0.77 │884:└─→test   %esi,%eax
         │     ↑ je     878
         │       mov    0x18(%rdx),%r15

Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https //lore.kernel.org/r/20210911043854.8373-2-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-15 17:54:52 -03:00
Ian Rogers
298105b78b perf bpf: Fix memory leaks relating to BTF.
BTF needs to be freed with btf__free().

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210826184833.408563-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-31 15:12:00 -03:00
James Clark
243c3a3eb4 perf annotate: Add disassembly warnings for annotate --stdio
Currently 'perf annotate --stdio' (and --stdio2) will exit without
printing anything if there are disassembly errors. Apply the same error
handler that's used for TUI and GTK modes. This makes comparing
disassembly across the different modes more consistent.

Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210729155805.2830-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-03 17:04:06 -03:00
Li Huafei
c4db54be9b perf annotate: Add error log in symbol__annotate()
When users use 'perf annotate' on unsupported machines, error logs
should be printed for user feedback.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Zhang Jinhao <zhangjinhao2@huawei.com>
Link: http://lore.kernel.org/lkml/20210726123854.13463-2-lihuafei1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-02 09:56:18 -03:00
Jiri Olsa
38fe0e0156 libperf: Move 'idx' from tools/perf to perf_evsel::idx
Move evsel::idx to perf_evsel::idx, so we can move the group interface
to libperf.

Committer notes:

Fixup evsel->idx usage in tools/perf/util/bpf_counter_cgroup.c, that
appeared in my tree in my local tree.

Also fixed up these:

$ find tools/perf/ -name "*.[ch]" | xargs grep 'evsel->idx'
tools/perf/ui/gtk/annotate.c:                      evsel->idx + i);
tools/perf/ui/gtk/annotate.c:                   evsel->idx);
$

That running 'make -C tools/perf build-test' caught.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:28 -03:00
Martin Liška
f89a82a82b perf annotate: Add line number like in TUI and source location at EOL
The patch changes the output format in 2 ways:
- line number is displayed for all source lines (matching TUI mode)
- source locations for the hottest lines are printed
   at the line end in order to preserve layout

Before:

     0.00 :   405ef1: inc    %r15
          :            tmpsd * (TD + tmpsd * TDD)));
     0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3        # 4318b0 <_IO_stdin_used+0x8b0>
          :            tmpsd * (TC +
  eff.c:1811    0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3        # 4318b8 <_IO_stdin_used+0x8b8>
          :            TA + tmpsd * (TB +
     0.35 :   405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3        # 4318c0 <_IO_stdin_used+0x8c0>
          :            dumbo =
  eff.c:1809    1.41 :   405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3        # 4318c8 <_IO_stdin_used+0x8c8>
          :            sumi -= sj * tmpsd * dij2i * dumbo;
  eff.c:1813    2.58 :   405f18: vmulsd %xmm3,%xmm0,%xmm0
     2.81 :   405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
     3.78 :   405f23: vmovsd %xmm0,0x30(%rsp)
          :            for (k = 0; k < lpears[i] + upears[i]; k++) {
  eff.c:1761    0.90 :   405f29: cmp    %r15d,%r12d

After:

     0.00 :   405ef1: inc    %r15
          : 1812   tmpsd * (TD + tmpsd * TDD)));
     0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3        # 4318b0 <_IO_stdin_used+0x8b0>
          : 1811   tmpsd * (TC +
     0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3        # 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811
          : 1810   TA + tmpsd * (TB +
     0.35 :   405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3        # 4318c0 <_IO_stdin_used+0x8c0>
          : 1809   dumbo =
     1.41 :   405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3        # 4318c8 <_IO_stdin_used+0x8c8> // eff.c:1809
          : 1813   sumi -= sj * tmpsd * dij2i * dumbo;
     2.58 :   405f18: vmulsd %xmm3,%xmm0,%xmm0 // eff.c:1813
     2.81 :   405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
     3.78 :   405f23: vmovsd %xmm0,0x30(%rsp)
          : 1761   for (k = 0; k < lpears[i] + upears[i]; k++) {

Where e.g. '// eff.c:1811' shares the same color as the percentantage
at the line beginning.

Signed-off-by: Martin Liška <mliska@suse.cz>
Link: http://lore.kernel.org/lkml/a0d53f31-f633-5013-c386-a4452391b081@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Martin Liska
2777b81b37 perf annotate: Show full source location with 'l' hotkey
Right now, when Line numbers are displayed, one can't easily find a
source file that the line corresponds to.

When a source line is selected and 'l' is pressed, full source file
location is displayed in perf UI footer line. The hotkey works only for
source code lines.

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/25a6384f-d862-5dda-4fec-8f0555599c75@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:42:31 -03:00
Martin Liska
44e176501c perf config: Add annotate.demangle{,_kernel}
Committer notes:

This allows setting this in from the command line:

  $ perf config annotate.demangle
  $ perf config annotate.demangle=yes
  $ perf config annotate.demangle
  annotate.demangle=yes
  $ cat ~/.perfconfig
  # this file is auto-generated.
  [report]
  	sort-order = srcline
  [annotate]
  	demangle = yes
  $
  $
  $ perf config annotate.demangle_kernel
  $ perf config annotate.demangle_kernel=yes
  $ perf config annotate.demangle_kernel
  annotate.demangle_kernel=yes
  $ cat ~/.perfconfig
  # this file is auto-generated.
  [report]
  	sort-order = srcline
  [annotate]
  	demangle = yes
  	demangle_kernel = yes
  $

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/c96aabe7-791f-9503-295f-3147a9d19b60@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:42:31 -03:00
Martin Liška
1f0e6edcd9 perf annotate: Fix jump parsing for C++ code.
Considering the following testcase:

  int
  foo(int a, int b)
  {
     for (unsigned i = 0; i < 1000000000; i++)
       a += b;
     return a;
  }

  int main()
  {
     foo (3, 4);
     return 0;
  }

'perf annotate' displays:

  86.52 │40055e: → ja   40056c <foo(int, int)+0x26>
  13.37 │400560:   mov  -0x18(%rbp),%eax
        │400563:   add  %eax,-0x14(%rbp)
        │400566:   addl $0x1,-0x4(%rbp)
   0.11 │40056a: → jmp  400557 <foo(int, int)+0x11>
        │40056c:   mov  -0x14(%rbp),%eax
        │40056f:   pop  %rbp

and the 'ja 40056c' does not link to the location in the function.  It's
caused by fact that comma is wrongly parsed, it's part of function
signature.

With my patch I see:

  86.52 │   ┌──ja   26
  13.37 │   │  mov  -0x18(%rbp),%eax
        │   │  add  %eax,-0x14(%rbp)
        │   │  addl $0x1,-0x4(%rbp)
   0.11 │   │↑ jmp  11
        │26:└─→mov  -0x14(%rbp),%eax

and 'o' output prints:

  86.52 │4005┌── ↓ ja   40056c <foo(int, int)+0x26>
  13.37 │4005│0:   mov  -0x18(%rbp),%eax
        │4005│3:   add  %eax,-0x14(%rbp)
        │4005│6:   addl $0x1,-0x4(%rbp)
   0.11 │4005│a: ↑ jmp  400557 <foo(int, int)+0x11>
        │4005└─→   mov  -0x14(%rbp),%eax

On the contrary, compiling the very same file with gcc -x c, the parsing
is fine because function arguments are not displayed:

  jmp  400543 <foo+0x1d>

Committer testing:

Before:

  $ cat cpp_args_annotate.c
  int
  foo(int a, int b)
  {
     for (unsigned i = 0; i < 1000000000; i++)
       a += b;
     return a;
  }

  int main()
  {
     foo (3, 4);
     return 0;
  }
  $ gcc --version |& head -1
  gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)
  $ gcc -g cpp_args_annotate.c -o cpp_args_annotate
  $ perf record ./cpp_args_annotate
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.275 MB perf.data (7188 samples) ]
  $ perf annotate --stdio2 foo
  Samples: 7K of event 'cycles:u', 4000 Hz, Event count (approx.): 7468429289, [percent: local period]
  foo() /home/acme/c/cpp_args_annotate
  Percent
              0000000000401106 <foo>:
              foo():
              int
              foo(int a, int b)
              {
                push %rbp
                mov  %rsp,%rbp
                mov  %edi,-0x14(%rbp)
                mov  %esi,-0x18(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                movl $0x0,-0x4(%rbp)
              ↓ jmp  1d
              a += b;
   13.45  13:   mov  -0x18(%rbp),%eax
                add  %eax,-0x14(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                addl $0x1,-0x4(%rbp)
    0.09  1d:   cmpl $0x3b9ac9ff,-0x4(%rbp)
   86.46      ↑ jbe  13
              return a;
                mov  -0x14(%rbp),%eax
              }
                pop  %rbp
              ← retq
  $

I.e. works for C, now lets switch to C++:

  $ g++ -g cpp_args_annotate.c -o cpp_args_annotate
  $ perf record ./cpp_args_annotate
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.268 MB perf.data (6976 samples) ]
  $ perf annotate --stdio2 foo
  Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period]
  foo() /home/acme/c/cpp_args_annotate
  Percent
              0000000000401106 <foo(int, int)>:
              foo(int, int):
              int
              foo(int a, int b)
              {
                push %rbp
                mov  %rsp,%rbp
                mov  %edi,-0x14(%rbp)
                mov  %esi,-0x18(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                movl $0x0,-0x4(%rbp)
                cmpl $0x3b9ac9ff,-0x4(%rbp)
   86.53      → ja   40112c <foo(int, int)+0x26>
              a += b;
   13.32        mov  -0x18(%rbp),%eax
    0.00        add  %eax,-0x14(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                addl $0x1,-0x4(%rbp)
    0.15      → jmp  401117 <foo(int, int)+0x11>
              return a;
                mov  -0x14(%rbp),%eax
              }
                pop  %rbp
              ← retq
  $

Reproduced.

Now with this patch:

Reusing the C++ built binary, as we can see here:

  $ readelf -wi cpp_args_annotate | grep producer
    <c>   DW_AT_producer    : (indirect string, offset: 0x2e): GNU C++14 10.2.1 20201125 (Red Hat 10.2.1-9) -mtune=generic -march=x86-64 -g
  $

And furthermore:

  $ file cpp_args_annotate
  cpp_args_annotate: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4fe3cab260204765605ec630d0dc7a7e93c361a9, for GNU/Linux 3.2.0, with debug_info, not stripped
  $ perf buildid-list -i cpp_args_annotate
  4fe3cab260204765605ec630d0dc7a7e93c361a9
  $ perf buildid-list | grep cpp_args_annotate
  4fe3cab260204765605ec630d0dc7a7e93c361a9 /home/acme/c/cpp_args_annotate
  $

It now works:

  $ perf annotate --stdio2 foo
  Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period]
  foo() /home/acme/c/cpp_args_annotate
  Percent
              0000000000401106 <foo(int, int)>:
              foo(int, int):
              int
              foo(int a, int b)
              {
                push %rbp
                mov  %rsp,%rbp
                mov  %edi,-0x14(%rbp)
                mov  %esi,-0x18(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                movl $0x0,-0x4(%rbp)
          11:   cmpl $0x3b9ac9ff,-0x4(%rbp)
   86.53      ↓ ja   26
              a += b;
   13.32        mov  -0x18(%rbp),%eax
    0.00        add  %eax,-0x14(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                addl $0x1,-0x4(%rbp)
    0.15      ↑ jmp  11
              return a;
          26:   mov  -0x14(%rbp),%eax
              }
                pop  %rbp
              ← retq
  $

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Link: http://lore.kernel.org/lkml/13e1a405-edf9-e4c2-4327-a9b454353730@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 18:23:53 -03:00
Arnaldo Carvalho de Melo
20e88c6076 perf annotate: Move bpf header inclusion to inside HAVE_LIBBPF_SUPPORT
No need to include it otherwise.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Dengcheng Zhu
a701d28e2d perf annotate mips: Add perf arch instructions annotate handlers
Support the MIPS architecture using the ins_ops association method. With
this patch, perf-annotate can work well on MIPS.

Testing it with a perf.data file collected on a mips machine:

$./perf annotate -i perf.data

         :           Disassembly of section .text:
         :
         :           00000000000be6a0 <get_next_seq>:
         :           get_next_seq():
    0.00 :   be6a0:       lw      v0,0(a0)
    0.00 :   be6a4:       daddiu  sp,sp,-128
    0.00 :   be6a8:       ld      a7,72(a0)
    0.00 :   be6ac:       gssq    s5,s4,80(sp)
    0.00 :   be6b0:       gssq    s1,s0,48(sp)
    0.00 :   be6b4:       gssq    s8,gp,112(sp)
    0.00 :   be6b8:       gssq    s7,s6,96(sp)
    0.00 :   be6bc:       gssq    s3,s2,64(sp)
    0.00 :   be6c0:       sd      a3,0(sp)
    0.00 :   be6c4:       move    s0,a0
    0.00 :   be6c8:       sd      v0,32(sp)
    0.00 :   be6cc:       sd      a5,8(sp)
    0.00 :   be6d0:       sd      zero,8(a0)
    0.00 :   be6d4:       sd      a6,16(sp)
    0.00 :   be6d8:       ld      s2,48(a0)
    8.53 :   be6dc:       ld      s1,40(a0)
    9.42 :   be6e0:       ld      v1,32(a0)
    0.00 :   be6e4:       nop
    0.00 :   be6e8:       ld      s4,24(a0)
    0.00 :   be6ec:       ld      s5,16(a0)
    0.00 :   be6f0:       sd      a7,40(sp)
   10.11 :   be6f4:       ld      s6,64(a0)

...

The original patch link:
https://lore.kernel.org/patchwork/patch/1180480/

Signed-off-by: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: linux-mips@vger.kernel.org
[ fanpeng@loongson.cn: Add missing "bgtzl", "bltzl", "bgezl", "blezl", "beql" and "bnel" for pre-R6processors ]
Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:40 -03:00
Jiri Olsa
bf5411695a perf tools: Pass build_id object to build_id__sprintf()
Passing build_id object to build_id__sprintf function, so it can operate
with the proper size of build id.

This will create proper md5 build id readable names,
like following:

  a50e350e97c43b4708d09bcd85ebfff7

instead of:

  a50e350e97c43b4708d09bcd85ebfff700000000

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 08:46:22 -03:00
Jiri Olsa
0aba7f036a perf tools: Use build_id object in dso
Replace build_id byte array with struct build_id object and all the code
that references it.

The objective is to carry size together with build id array, so it's
better to keep both together.

This is preparatory change for following patches, and there's no
functional change.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 08:44:47 -03:00
Arnaldo Carvalho de Melo
bbe544682e perf annotate: Allow configuring the 'disassembler_style' knob via 'perf config'
# perf annotate --stdio2 acpi_processor_ffh_cstate_enter > default
  # perf config annotate.disassembler_style=intel
  # perf config annotate.disassembler_style
  annotate.disassembler_style=intel
  # perf annotate --stdio2 acpi_processor_ffh_cstate_enter > intel
  # diff -u default intel
  --- default	2020-09-04 13:09:26.019205732 -0300
  +++ intel	2020-09-04 13:09:52.823795081 -0300
  @@ -1,42 +1,42 @@
   Samples: 1K of event 'cycles', 4000 Hz, Event count (approx.): 990065316, [percent: local period]
   acpi_processor_ffh_cstate_enter() /lib/modules/5.9.0-rc3/build/vmlinux
  -Percent     → callq   __fentry__
  -              mov     cpu_number,%edx
  -              mov     %edx,%edx
  -              mov     cpu_cstate_entry,%rax
  -              add     -0x7dbe9700(,%rdx,8),%rax
  -              movzbl  0x9(%rdi),%edx
  -              mov     0x4(%rax,%rdx,8),%edi
  -              mov     (%rax,%rdx,8),%esi
  -            → jmpq    137ccc6
  -        2d: → jmpq    137ccd8
  +Percent     → call    __fentry__
  +              mov     edx,DWORD PTR gs:[rip+0x7e541d74]
  +              mov     edx,edx
  +              mov     rax,QWORD PTR [rip+0x152b8fb]
  +              add     rax,QWORD PTR [rdx*8-0x7dbe9700]
  +              movzx   edx,BYTE PTR [rdi+0x9]
  +              mov     edi,DWORD PTR [rax+rdx*8+0x4]
  +              mov     esi,DWORD PTR [rax+rdx*8]
  +            → jmp     137ccc6
  +        2d: → jmp     137ccd8
                 mfence
  -              mov     %gs:0x17bc0,%rax
  -              clflush (%rax)
  +              mov     rax,QWORD PTR gs:0x17bc0
  +              clflush BYTE PTR [rax]
                 mfence
  -              xor     %edx,%edx
  -              mov     %rdx,%rcx
  -              mov     %gs:0x17bc0,%rax
  -  0.00        monitor %rax,%ecx,%edx
  -              mov     (%rax),%rax
  -              test    $0x8,%al
  +              xor     edx,edx
  +              mov     rcx,rdx
  +              mov     rax,QWORD PTR gs:0x17bc0
  +  0.00        monitor
  +              mov     rax,QWORD PTR [rax]
  +              test    al,0x8
               ↓ jne     71
  -            ↓ jmpq    68
  -              verw    0x538b08(%rip)        # ffffffff82008150 <ds.0>
  -        68:   mov     %rsi,%rax
  -              mov     %rdi,%rcx
  -100.00        mwait   %eax,%ecx
  -        71:   mov     %gs:0x17bc0,%rax
  -              lock    andb    $0xdf,0x2(%rax)
  -              lock    addl    $0x0,-0x4(%rsp)
  -              mov     (%rax),%rax
  -              test    $0x8,%al
  +            ↓ jmp     68
  +              verw    WORD PTR [rip+0x538b08]        # ffffffff82008150 <ds.0>
  +        68:   mov     rax,rsi
  +              mov     rcx,rdi
  +100.00        mwait
  +        71:   mov     rax,QWORD PTR gs:0x17bc0
  +              lock    and     BYTE PTR [rax+0x2],0xdf
  +              lock    add     DWORD PTR [rsp-0x4],0x0
  +              mov     rax,QWORD PTR [rax]
  +              test    al,0x8
               ↓ je      97
  -              andl    $0x7fffffff,__preempt_count
  -        97: ← retq
  -              mov     %gs:0x17bc0,%rax
  -              lock    orb     $0x20,0x2(%rax)
  -              mov     (%rax),%rax
  -              test    $0x8,%al
  +              and     DWORD PTR gs:[rip+0x7e548509],0x7fffffff
  +        97:   ret
  +              mov     rax,QWORD PTR gs:0x17bc0
  +              lock    or      BYTE PTR [rax+0x2],0x20
  +              mov     rax,QWORD PTR [rax]
  +              test    al,0x8
               ↑ jne     71
  -            ↑ jmpq    2d
  +            ↑ jmp     2d
  #

Requested-by: Matt P. Dziubinski <matdzb@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04 16:07:23 -03:00
Numfor Mbiziwo-Tiapo
b39730a663 perf annotate: Fix non-null terminated buffer returned by readlink()
Our local MSAN (Memory Sanitizer) build of perf throws a warning that
comes from the "dso__disassemble_filename" function in
"tools/perf/util/annotate.c" when running perf record.

The warning stems from the call to readlink, in which "build_id_path"
was being read into "linkname". Since readlink does not null terminate,
an uninitialized memory access would later occur when "linkname" is
passed into the strstr function. This is simply fixed by
null-terminating "linkname" after the call to readlink.

To reproduce this warning, build perf by running:

  $ make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-fsanitize=memory -fsanitize-memory-track-origins"

(Additionally, llvm might have to be installed and clang might have to
be specified as the compiler - export CC=/usr/bin/clang)

Then running:

  tools/perf/perf record -o - ls / | tools/perf/perf --no-pager annotate -i - --stdio

Please see the cover letter for why false positive warnings may be
generated.

Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Drayton <mbd@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20190729205750.193289-1-nums@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-07-09 12:36:50 -03:00
Tiezhu Yang
3e9b26dc22 perf tools: Remove some duplicated includes
There exists some duplicated includes in tools/perf, remove them.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xuefeng li <lixuefeng@loongson.cn>
Link: http://lore.kernel.org/lkml/1591071304-19338-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-02 11:09:41 -03:00
Arnaldo Carvalho de Melo
6e6d1d654e perf evsel: Rename perf_evsel__env() to evsel__env()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
c754c382c9 perf evsel: Rename perf_evsel__is_*() to evsel__is*()
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
347c751a64 perf evsel: Rename perf_evsel__group_desc() to evsel__group_desc()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
8ab2e96d8f perf evsel: Rename *perf_evsel__*name() to *evsel__*name()
As they are 'struct evsel' methods or related routines, not part of
tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Jiri Olsa
3c29d4483e perf annotate: Add basic support for bpf_image
Add the DSO_BINARY_TYPE__BPF_IMAGE dso binary type to recognize BPF
images that carry trampoline or dispatcher.

Upcoming patches will add support to read the image data, store it
within the BPF feature in perf.data and display it for annotation
purposes.

Currently we only display following message:

  # ./perf annotate bpf_trampoline_24456 --stdio
   Percent |      Source code & Disassembly of . for cycles (504  ...
  --------------------------------------------------------------- ...
           :       to be implemented

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200312195610.346362-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-16 12:19:06 -03:00
Ravi Bangoria
dabce16bd2 perf annotate: Get rid of annotation->nr_jumps
The 'nr_jumps' field in 'struct annotation' is not used since it's
inception in commit 2402e4a936 ("perf annotate browser: Show 'jumpy'
functions").  Get rid of it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-7-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-04 10:34:10 -03:00
Ravi Bangoria
e0560ba6d9 perf annotate: Fix segfault with source toggle
While rendering annotate browser from perf report tui, we keep track
of total number of lines(asm + source) in annotation->nr_entries and
total number of asm lines in annotation->nr_asm_entries. But we don't
reset them before starting. Thus if user annotates same function
multiple times, we restart incrementing these fields with old values.

This causes a segfault when user tries to toggle source code after
annotating same function multiple times. Fix it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-5-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 11:47:23 -03:00
Ravi Bangoria
d3c03147bf perf annotate: Align struct annotate_args
Align fields of struct annotate_args.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-4-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 11:47:23 -03:00
Ravi Bangoria
2316f861ae perf annotate: Simplify disasm_line allocation and freeing code
We are allocating disasm_line object in annotation_line__new() instead
of disasm_line__new(). Similarly annotation_line__delete() is actually
freeing disasm_line object as well. This complexity is because of
privsize.  But we don't need privsize anymore so get rid of privsize and
simplify disasm_line allocation and freeing code.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-3-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 11:07:13 -03:00
Ravi Bangoria
e0ad4d6854 perf annotate: Remove privsize from symbol__annotate() args
privsize is passed as 0 from all the symbol__annotate() callers.
Remove it from argument list.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-2-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 11:06:14 -03:00
Ravi Bangoria
7384083ba6 perf annotate: Make perf config effective
perf default config set by user in [annotate] section is totally ignored
by annotate code. Fix it.

Before:

  $ ./perf config
  annotate.hide_src_code=true
  annotate.show_nr_jumps=true
  annotate.show_nr_samples=true

  $ ./perf annotate shash
         │    unsigned h = 0;
         │      movl   $0x0,-0xc(%rbp)
         │    while (*s)
         │    ↓ jmp    44
         │    h = 65599 * h + *s++;
   11.33 │24:   mov    -0xc(%rbp),%eax
   43.50 │      imul   $0x1003f,%eax,%ecx
         │      mov    -0x18(%rbp),%rax

After:

         │        movl   $0x0,-0xc(%rbp)
         │      ↓ jmp    44
       1 │1 24:   mov    -0xc(%rbp),%eax
       4 │        imul   $0x1003f,%eax,%ecx
         │        mov    -0x18(%rbp),%rax

Note that we have removed show_nr_samples and show_total_period from
annotation_options because they are not used. Instead of them we use
symbol_conf.show_nr_samples and symbol_conf.show_total_period.

Committer testing:

Using 'perf annotate --stdio2' to use the TUI rendering but emitting the output to stdio:

  # perf config
  #
  # perf config annotate.hide_src_code=true
  # perf config
  annotate.hide_src_code=true
  #
  # perf config annotate.show_nr_jumps=true
  # perf config annotate.show_nr_samples=true
  # perf config
  annotate.hide_src_code=true
  annotate.show_nr_jumps=true
  annotate.show_nr_samples=true
  #
  #

Before:

  # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized
  Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
  ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
  Percent
              00000000000609f0 <ObjectInstance::weak_pointer_was_finalized()@@Base>:
                endbr64
                cmpq    $0x0,0x20(%rdi)
              ↓ je      10
                xor     %eax,%eax
              ← retq
                xchg    %ax,%ax
  100.00  10:   push    %rbp
                cmpq    $0x0,0x18(%rdi)
                mov     %rdi,%rbp
              ↓ jne     20
          1b:   xor     %eax,%eax
                pop     %rbp
              ← retq
                nop
          20:   lea     0x18(%rdi),%rdi
              → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                cmpq    $0x0,0x18(%rbp)
              ↑ jne     1b
                mov     %rbp,%rdi
              → callq   ObjectBase::jsobj_addr() const@plt
                mov     $0x1,%eax
                pop     %rbp
              ← retq
  #

After:

  # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
  Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
  ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
  Samples       endbr64
                cmpq    $0x0,0x20(%rdi)
              ↓ je      10
                xor     %eax,%eax
              ← retq
                xchg    %ax,%ax
     1  1 10:   push    %rbp
                cmpq    $0x0,0x18(%rdi)
                mov     %rdi,%rbp
              ↓ jne     20
        1 1b:   xor     %eax,%eax
                pop     %rbp
              ← retq
                nop
        1 20:   lea     0x18(%rdi),%rdi
              → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                cmpq    $0x0,0x18(%rbp)
              ↑ jne     1b
                mov     %rbp,%rdi
              → callq   ObjectBase::jsobj_addr() const@plt
                mov     $0x1,%eax
                pop     %rbp
              ← retq
  #
  # perf config annotate.show_nr_jumps
  annotate.show_nr_jumps=true
  # perf config annotate.show_nr_jumps=false
  # perf config annotate.show_nr_jumps
  annotate.show_nr_jumps=false
  #
  # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
  Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
  ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
  Samples       endbr64
                cmpq    $0x0,0x20(%rdi)
              ↓ je      10
                xor     %eax,%eax
              ← retq
                xchg    %ax,%ax
       1  10:   push    %rbp
                cmpq    $0x0,0x18(%rdi)
                mov     %rdi,%rbp
              ↓ jne     20
          1b:   xor     %eax,%eax
                pop     %rbp
              ← retq
                nop
          20:   lea     0x18(%rdi),%rdi
              → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                cmpq    $0x0,0x18(%rbp)
              ↑ jne     1b
                mov     %rbp,%rdi
              → callq   ObjectBase::jsobj_addr() const@plt
                mov     $0x1,%eax
                pop     %rbp
              ← retq
  #

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Link: http://lore.kernel.org/lkml/20200213064306.160480-6-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 10:44:59 -03:00
Ravi Bangoria
46ccb44269 perf annotate: Fix --show-nr-samples for tui/stdio2
perf annotate --show-nr-samples does not really show number of samples.

The reason is we have two separate variables for the same purpose.

One is in symbol_conf.show_nr_samples and another is
annotation_options.show_nr_samples.

We save command line option in symbol_conf.show_nr_samples but uses
annotation_option.show_nr_samples while rendering tui/stdio2 browser.

Though, we copy symbol_conf.show_nr_samples to
annotation__default_options.show_nr_samples but that is not really
effective as we don't use annotation__default_options once we copy
default options to dynamic variable annotate.opts in cmd_annotate().

Instead of all these complication, keep only one variable and use it all
over. symbol_conf.show_nr_samples is used by perf report/top as well. So
let's kill annotation_options.show_nr_samples.

On a side note, I've kept annotation_options.show_nr_samples definition
because it's still used by perf-config code. Follow up patch to fix
perf-config for annotate will remove annotation_options.show_nr_samples.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Link: http://lore.kernel.org/lkml/20200213064306.160480-4-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 10:44:48 -03:00
Ravi Bangoria
68aac855b6 perf annotate: Fix --show-total-period for tui/stdio2
perf annotate --show-total-period does not really show total period.

The reason is we have two separate variables for the same purpose.

One is in symbol_conf.show_total_period and another is
annotation_options.show_total_period.

We save command line option in symbol_conf.show_total_period but uses
annotation_option.show_total_period while rendering tui/stdio2 browser.

Though, we copy symbol_conf.show_total_period to
annotation__default_options.show_total_period but that is not really
effective as we don't use annotation__default_options once we copy
default options to dynamic variable annotate.opts in cmd_annotate().

Instead of all these complication, keep only one variable and use it all
over. symbol_conf.show_total_period is used by perf report/top as well.
So let's kill annotation_options.show_total_period.

On a side note, I've kept annotation_options.show_total_period
definition because it's still used by perf-config code. Follow up patch
to fix perf-config for annotate will remove
annotation_options.show_total_period.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Link: http://lore.kernel.org/lkml/20200213064306.160480-3-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 10:44:40 -03:00
Andi Kleen
3b0b16bf8c perf tools: Support --prefix/--prefix-strip
The objdump utility has useful --prefix / --prefix-strip options to
allow changing source code file names hardcoded into executables' debug
info. Add options to 'perf report', 'perf top' and 'perf annotate',
which are then passed to objdump.

  $ mkdir foo
  $ echo 'main() { for (;;); }' > foo/foo.c
  $ gcc -g foo/foo.c
  foo/foo.c:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
      1 | main() { for (;;); }
        | ^~~~
  $ perf record ./a.out
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.230 MB perf.data (5721 samples) ]
  $ mv foo bar
  $ perf annotate
  <does not show source code>
  $ perf annotate --prefix=/home/ak/lsrc/git/bar --prefix-strip=5
  <does show source code>

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
LPU-Reference: 20200107210444.214071-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-01-14 12:02:19 -03:00
Arnaldo Carvalho de Melo
c54d241b35 perf maps: Rename map_groups.h to maps.h
One more step in the merge of 'struct maps' with 'struct map_groups'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-9ibtn3vua76f934t7woyf26w@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo
f2eaea09d6 perf map_symbol: Rename ms->mg to ms->maps
One more step on the merge of 'struct maps' with 'struct map_groups'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-61rra2wg392rhvdgw421wzpt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo
79b6bb73f8 perf maps: Merge 'struct maps' with 'struct map_groups'
And pick the shortest name: 'struct maps'.

The split existed because we used to have two groups of maps, one for
functions and one for variables, but that only complicated things,
sometimes we needed to figure out what was at some address and then had
to first try it on the functions group and if that failed, fall back to
the variables one.

That split is long gone, so for quite a while we had only one struct
maps per struct map_groups, simplify things by combining those structs.

First patch is the minimum needed to merge both, follow up patches will
rename 'thread->mg' to 'thread->maps', etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-hom6639ro7020o708trhxh59@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo
94e44b9ca5 perf annotate: Stop using map->groups, use map_symbol->mg instead
These were the last uses of map->groups, next cset will nuke it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-n3g0foos7l7uxq9nar0zo0vj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-12 08:20:53 -03:00
Arnaldo Carvalho de Melo
d46a4cdf49 pref tools: Make 'struct addr_map_symbol' contain 'struct map_symbol'
So that we pass that substructure around and with it consolidate lots of
functions that receive a (map, symbol) pair and now can receive just a
'struct map_symbol' pointer.

This further paves the way to add 'struct map_groups' to 'struct
map_symbol' so that we can have all we need for annotation so that we
can ditch 'struct map'->groups, i.e. have the map_groups pointer in a
more central place, avoiding the pointer in the 'struct map' that have
tons of instances.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-fs90ttd9q12l7989fo7pw81q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-12 08:20:53 -03:00
Arnaldo Carvalho de Melo
2975489458 perf annotate: Pass a 'map_symbol' in places receiving a pair of 'map' and 'symbol' pointers
We are already passing things like:

  symbol__annotate(ms->sym, ms->map, ...)

So shorten the signature of such functions to receive the 'map_symbol'
pointer.

This also paves the way to having the 'struct map_groups' pointer in the
'struct map_symbol' so that we can get rid of 'struct map'->groups.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-23yx8v1t41nzpkpi7rdrozww@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-12 08:20:53 -03:00