2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

464 Commits

Author SHA1 Message Date
Ian Rogers
4f74f18789 perf symbols: Factor out annotation init/exit
The exit function fixes a memory leak with the src field as detected by
leak sanitizer. An example of which is:

Indirect leak of 25133184 byte(s) in 207 object(s) allocated from:
    #0 0x7f199ecfe987 in __interceptor_calloc libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55defe638224 in annotated_source__alloc_histograms util/annotate.c:803
    #2 0x55defe6397e4 in symbol__hists util/annotate.c:952
    #3 0x55defe639908 in symbol__inc_addr_samples util/annotate.c:968
    #4 0x55defe63aa29 in hist_entry__inc_addr_samples util/annotate.c:1119
    #5 0x55defe499a79 in hist_iter__report_callback tools/perf/builtin-report.c:182
    #6 0x55defe7a859d in hist_entry_iter__add util/hist.c:1236
    #7 0x55defe49aa63 in process_sample_event tools/perf/builtin-report.c:315
    #8 0x55defe731bc8 in evlist__deliver_sample util/session.c:1473
    #9 0x55defe731e38 in machines__deliver_event util/session.c:1510
    #10 0x55defe732a23 in perf_session__deliver_event util/session.c:1590
    #11 0x55defe72951e in ordered_events__deliver_event util/session.c:183
    #12 0x55defe740082 in do_flush util/ordered-events.c:244
    #13 0x55defe7407cb in __ordered_events__flush util/ordered-events.c:323
    #14 0x55defe740a61 in ordered_events__flush util/ordered-events.c:341
    #15 0x55defe73837f in __perf_session__process_events util/session.c:2390
    #16 0x55defe7385ff in perf_session__process_events util/session.c:2420
    ...

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211112035124.94327-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Ian Rogers
4270456704 perf symbols: Bit pack to save a byte
Use a bit field alongside the earlier bit fields.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211112035124.94327-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-13 18:11:51 -03:00
Dave Marchevsky
6ac22d036f perf bpf: Pull in bpf_program__get_prog_info_linear()
To prepare for impending deprecation of libbpf's bpf_program__get_prog_info_linear(),
pull in the function and associated helpers into the perf codebase and migrate
existing uses to the perf copy.

Since libbpf's deprecated definitions will still be visible to perf, it is necessary
to rename perf's definitions.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211011082031.4148337-4-davemarchevsky@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-01 18:16:40 -03:00
William Cohen
0ba37e05c2 perf annotate: Add riscv64 support
This patch adds basic arch initialization and instruction associate
support for the riscv64 CPU architecture.

Example output:

  $ perf annotate --stdio2
  Samples: 122K of event 'task-clock:u', 4000 Hz, Event count (approx.): 30637250000, [percent: local period]
  strcmp() /usr/lib64/libc-2.32.so
  Percent

	      Disassembly of section .text:

	      0000000000069a30 <strcmp>:
	      __GI_strcmp():
	      const unsigned char *s2 = (const unsigned char *) p2;
	      unsigned char c1, c2;

	      do
	      {
	      c1 = (unsigned char) *s1++;
   37.30        lbu  a5,0(a0)
	      c2 = (unsigned char) *s2++;
    1.23        addi a1,a1,1
	      c1 = (unsigned char) *s1++;
   18.68        addi a0,a0,1
	      c2 = (unsigned char) *s2++;
    1.37        lbu  a4,-1(a1)
	      if (c1 == '\0')
   18.71      ↓ beqz a5,18
	       return c1 - c2;
	       }

Signed-off-by: William Cohen <wcohen@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-riscv@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210927005115.610264-1-wcohen@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-27 09:33:44 -03:00
Ravi Bangoria
3149733584 perf annotate: Add fusion logic for AMD microarchs
AMD family 15h and above microarchs fuse a subset of cmp/test/ALU
instructions with branch instructions[1][2]. Add perf annotate
fused instruction support for these microarchs.

Before:
         │       testb  $0x80,0x51(%rax)
         │    ┌──jne    5b3
    0.78 │    │  mov    %r13,%rdi
         │    │→ callq  mark_page_accessed
    1.08 │5b3:└─→mov    0x8(%r13),%rax

After:
         │    ┌──testb  $0x80,0x51(%rax)
         │    ├──jne    5b3
    0.78 │    │  mov    %r13,%rdi
         │    │→ callq  mark_page_accessed
    1.08 │5b3:└─→mov    0x8(%r13),%rax

[1] https://bugzilla.kernel.org/attachment.cgi?id=298553
[2] https://bugzilla.kernel.org/attachment.cgi?id=298555

Committer testing:

On a:

  $ grep -m1 "model name" /proc/cpuinfo
  model name	: AMD Ryzen 9 3900X 12-Core Processor
  $

  Samples: 44K of event 'cycles', 4000 Hz, Event count (approx.): 7533249650
  _int_malloc  /usr/lib64/libc-2.33.so [Percent: local period]
  Percent│    ┌──test   %eax,%eax
         │    ├──jne    884
         │    │↓ jmpq   943
         │    │  nop
         │878:│  add    $0x10,%rdx
    0.64 │    │  add    %eax,%eax
    0.57 │    │↓ je     cc9
    0.77 │884:└─→test   %esi,%eax
         │     ↑ je     878
         │       mov    0x18(%rdx),%r15

Reported-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https //lore.kernel.org/r/20210911043854.8373-2-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-15 17:54:52 -03:00
Ian Rogers
298105b78b perf bpf: Fix memory leaks relating to BTF.
BTF needs to be freed with btf__free().

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210826184833.408563-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-31 15:12:00 -03:00
James Clark
243c3a3eb4 perf annotate: Add disassembly warnings for annotate --stdio
Currently 'perf annotate --stdio' (and --stdio2) will exit without
printing anything if there are disassembly errors. Apply the same error
handler that's used for TUI and GTK modes. This makes comparing
disassembly across the different modes more consistent.

Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210729155805.2830-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-03 17:04:06 -03:00
Li Huafei
c4db54be9b perf annotate: Add error log in symbol__annotate()
When users use 'perf annotate' on unsupported machines, error logs
should be printed for user feedback.

Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Zhang Jinhao <zhangjinhao2@huawei.com>
Link: http://lore.kernel.org/lkml/20210726123854.13463-2-lihuafei1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-02 09:56:18 -03:00
Jiri Olsa
38fe0e0156 libperf: Move 'idx' from tools/perf to perf_evsel::idx
Move evsel::idx to perf_evsel::idx, so we can move the group interface
to libperf.

Committer notes:

Fixup evsel->idx usage in tools/perf/util/bpf_counter_cgroup.c, that
appeared in my tree in my local tree.

Also fixed up these:

$ find tools/perf/ -name "*.[ch]" | xargs grep 'evsel->idx'
tools/perf/ui/gtk/annotate.c:                      evsel->idx + i);
tools/perf/ui/gtk/annotate.c:                   evsel->idx);
$

That running 'make -C tools/perf build-test' caught.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-07-09 14:04:28 -03:00
Martin Liška
f89a82a82b perf annotate: Add line number like in TUI and source location at EOL
The patch changes the output format in 2 ways:
- line number is displayed for all source lines (matching TUI mode)
- source locations for the hottest lines are printed
   at the line end in order to preserve layout

Before:

     0.00 :   405ef1: inc    %r15
          :            tmpsd * (TD + tmpsd * TDD)));
     0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3        # 4318b0 <_IO_stdin_used+0x8b0>
          :            tmpsd * (TC +
  eff.c:1811    0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3        # 4318b8 <_IO_stdin_used+0x8b8>
          :            TA + tmpsd * (TB +
     0.35 :   405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3        # 4318c0 <_IO_stdin_used+0x8c0>
          :            dumbo =
  eff.c:1809    1.41 :   405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3        # 4318c8 <_IO_stdin_used+0x8c8>
          :            sumi -= sj * tmpsd * dij2i * dumbo;
  eff.c:1813    2.58 :   405f18: vmulsd %xmm3,%xmm0,%xmm0
     2.81 :   405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
     3.78 :   405f23: vmovsd %xmm0,0x30(%rsp)
          :            for (k = 0; k < lpears[i] + upears[i]; k++) {
  eff.c:1761    0.90 :   405f29: cmp    %r15d,%r12d

After:

     0.00 :   405ef1: inc    %r15
          : 1812   tmpsd * (TD + tmpsd * TDD)));
     0.01 :   405ef4: vfmadd213sd 0x2b9b3(%rip),%xmm0,%xmm3        # 4318b0 <_IO_stdin_used+0x8b0>
          : 1811   tmpsd * (TC +
     0.67 :   405efd: vfmadd213sd 0x2b9b2(%rip),%xmm0,%xmm3        # 4318b8 <_IO_stdin_used+0x8b8> // eff.c:1811
          : 1810   TA + tmpsd * (TB +
     0.35 :   405f06: vfmadd213sd 0x2b9b1(%rip),%xmm0,%xmm3        # 4318c0 <_IO_stdin_used+0x8c0>
          : 1809   dumbo =
     1.41 :   405f0f: vfmadd213sd 0x2b9b0(%rip),%xmm0,%xmm3        # 4318c8 <_IO_stdin_used+0x8c8> // eff.c:1809
          : 1813   sumi -= sj * tmpsd * dij2i * dumbo;
     2.58 :   405f18: vmulsd %xmm3,%xmm0,%xmm0 // eff.c:1813
     2.81 :   405f1c: vfnmadd213sd 0x30(%rsp),%xmm1,%xmm0
     3.78 :   405f23: vmovsd %xmm0,0x30(%rsp)
          : 1761   for (k = 0; k < lpears[i] + upears[i]; k++) {

Where e.g. '// eff.c:1811' shares the same color as the percentantage
at the line beginning.

Signed-off-by: Martin Liška <mliska@suse.cz>
Link: http://lore.kernel.org/lkml/a0d53f31-f633-5013-c386-a4452391b081@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-04-20 08:40:20 -03:00
Martin Liska
2777b81b37 perf annotate: Show full source location with 'l' hotkey
Right now, when Line numbers are displayed, one can't easily find a
source file that the line corresponds to.

When a source line is selected and 'l' is pressed, full source file
location is displayed in perf UI footer line. The hotkey works only for
source code lines.

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lore.kernel.org/lkml/25a6384f-d862-5dda-4fec-8f0555599c75@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:42:31 -03:00
Martin Liska
44e176501c perf config: Add annotate.demangle{,_kernel}
Committer notes:

This allows setting this in from the command line:

  $ perf config annotate.demangle
  $ perf config annotate.demangle=yes
  $ perf config annotate.demangle
  annotate.demangle=yes
  $ cat ~/.perfconfig
  # this file is auto-generated.
  [report]
  	sort-order = srcline
  [annotate]
  	demangle = yes
  $
  $
  $ perf config annotate.demangle_kernel
  $ perf config annotate.demangle_kernel=yes
  $ perf config annotate.demangle_kernel
  annotate.demangle_kernel=yes
  $ cat ~/.perfconfig
  # this file is auto-generated.
  [report]
  	sort-order = srcline
  [annotate]
  	demangle = yes
  	demangle_kernel = yes
  $

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/c96aabe7-791f-9503-295f-3147a9d19b60@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-06 16:42:31 -03:00
Martin Liška
1f0e6edcd9 perf annotate: Fix jump parsing for C++ code.
Considering the following testcase:

  int
  foo(int a, int b)
  {
     for (unsigned i = 0; i < 1000000000; i++)
       a += b;
     return a;
  }

  int main()
  {
     foo (3, 4);
     return 0;
  }

'perf annotate' displays:

  86.52 │40055e: → ja   40056c <foo(int, int)+0x26>
  13.37 │400560:   mov  -0x18(%rbp),%eax
        │400563:   add  %eax,-0x14(%rbp)
        │400566:   addl $0x1,-0x4(%rbp)
   0.11 │40056a: → jmp  400557 <foo(int, int)+0x11>
        │40056c:   mov  -0x14(%rbp),%eax
        │40056f:   pop  %rbp

and the 'ja 40056c' does not link to the location in the function.  It's
caused by fact that comma is wrongly parsed, it's part of function
signature.

With my patch I see:

  86.52 │   ┌──ja   26
  13.37 │   │  mov  -0x18(%rbp),%eax
        │   │  add  %eax,-0x14(%rbp)
        │   │  addl $0x1,-0x4(%rbp)
   0.11 │   │↑ jmp  11
        │26:└─→mov  -0x14(%rbp),%eax

and 'o' output prints:

  86.52 │4005┌── ↓ ja   40056c <foo(int, int)+0x26>
  13.37 │4005│0:   mov  -0x18(%rbp),%eax
        │4005│3:   add  %eax,-0x14(%rbp)
        │4005│6:   addl $0x1,-0x4(%rbp)
   0.11 │4005│a: ↑ jmp  400557 <foo(int, int)+0x11>
        │4005└─→   mov  -0x14(%rbp),%eax

On the contrary, compiling the very same file with gcc -x c, the parsing
is fine because function arguments are not displayed:

  jmp  400543 <foo+0x1d>

Committer testing:

Before:

  $ cat cpp_args_annotate.c
  int
  foo(int a, int b)
  {
     for (unsigned i = 0; i < 1000000000; i++)
       a += b;
     return a;
  }

  int main()
  {
     foo (3, 4);
     return 0;
  }
  $ gcc --version |& head -1
  gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)
  $ gcc -g cpp_args_annotate.c -o cpp_args_annotate
  $ perf record ./cpp_args_annotate
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.275 MB perf.data (7188 samples) ]
  $ perf annotate --stdio2 foo
  Samples: 7K of event 'cycles:u', 4000 Hz, Event count (approx.): 7468429289, [percent: local period]
  foo() /home/acme/c/cpp_args_annotate
  Percent
              0000000000401106 <foo>:
              foo():
              int
              foo(int a, int b)
              {
                push %rbp
                mov  %rsp,%rbp
                mov  %edi,-0x14(%rbp)
                mov  %esi,-0x18(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                movl $0x0,-0x4(%rbp)
              ↓ jmp  1d
              a += b;
   13.45  13:   mov  -0x18(%rbp),%eax
                add  %eax,-0x14(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                addl $0x1,-0x4(%rbp)
    0.09  1d:   cmpl $0x3b9ac9ff,-0x4(%rbp)
   86.46      ↑ jbe  13
              return a;
                mov  -0x14(%rbp),%eax
              }
                pop  %rbp
              ← retq
  $

I.e. works for C, now lets switch to C++:

  $ g++ -g cpp_args_annotate.c -o cpp_args_annotate
  $ perf record ./cpp_args_annotate
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.268 MB perf.data (6976 samples) ]
  $ perf annotate --stdio2 foo
  Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period]
  foo() /home/acme/c/cpp_args_annotate
  Percent
              0000000000401106 <foo(int, int)>:
              foo(int, int):
              int
              foo(int a, int b)
              {
                push %rbp
                mov  %rsp,%rbp
                mov  %edi,-0x14(%rbp)
                mov  %esi,-0x18(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                movl $0x0,-0x4(%rbp)
                cmpl $0x3b9ac9ff,-0x4(%rbp)
   86.53      → ja   40112c <foo(int, int)+0x26>
              a += b;
   13.32        mov  -0x18(%rbp),%eax
    0.00        add  %eax,-0x14(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                addl $0x1,-0x4(%rbp)
    0.15      → jmp  401117 <foo(int, int)+0x11>
              return a;
                mov  -0x14(%rbp),%eax
              }
                pop  %rbp
              ← retq
  $

Reproduced.

Now with this patch:

Reusing the C++ built binary, as we can see here:

  $ readelf -wi cpp_args_annotate | grep producer
    <c>   DW_AT_producer    : (indirect string, offset: 0x2e): GNU C++14 10.2.1 20201125 (Red Hat 10.2.1-9) -mtune=generic -march=x86-64 -g
  $

And furthermore:

  $ file cpp_args_annotate
  cpp_args_annotate: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4fe3cab260204765605ec630d0dc7a7e93c361a9, for GNU/Linux 3.2.0, with debug_info, not stripped
  $ perf buildid-list -i cpp_args_annotate
  4fe3cab260204765605ec630d0dc7a7e93c361a9
  $ perf buildid-list | grep cpp_args_annotate
  4fe3cab260204765605ec630d0dc7a7e93c361a9 /home/acme/c/cpp_args_annotate
  $

It now works:

  $ perf annotate --stdio2 foo
  Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period]
  foo() /home/acme/c/cpp_args_annotate
  Percent
              0000000000401106 <foo(int, int)>:
              foo(int, int):
              int
              foo(int a, int b)
              {
                push %rbp
                mov  %rsp,%rbp
                mov  %edi,-0x14(%rbp)
                mov  %esi,-0x18(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                movl $0x0,-0x4(%rbp)
          11:   cmpl $0x3b9ac9ff,-0x4(%rbp)
   86.53      ↓ ja   26
              a += b;
   13.32        mov  -0x18(%rbp),%eax
    0.00        add  %eax,-0x14(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                addl $0x1,-0x4(%rbp)
    0.15      ↑ jmp  11
              return a;
          26:   mov  -0x14(%rbp),%eax
              }
                pop  %rbp
              ← retq
  $

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Link: http://lore.kernel.org/lkml/13e1a405-edf9-e4c2-4327-a9b454353730@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-11 18:23:53 -03:00
Arnaldo Carvalho de Melo
20e88c6076 perf annotate: Move bpf header inclusion to inside HAVE_LIBBPF_SUPPORT
No need to include it otherwise.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:41 -03:00
Dengcheng Zhu
a701d28e2d perf annotate mips: Add perf arch instructions annotate handlers
Support the MIPS architecture using the ins_ops association method. With
this patch, perf-annotate can work well on MIPS.

Testing it with a perf.data file collected on a mips machine:

$./perf annotate -i perf.data

         :           Disassembly of section .text:
         :
         :           00000000000be6a0 <get_next_seq>:
         :           get_next_seq():
    0.00 :   be6a0:       lw      v0,0(a0)
    0.00 :   be6a4:       daddiu  sp,sp,-128
    0.00 :   be6a8:       ld      a7,72(a0)
    0.00 :   be6ac:       gssq    s5,s4,80(sp)
    0.00 :   be6b0:       gssq    s1,s0,48(sp)
    0.00 :   be6b4:       gssq    s8,gp,112(sp)
    0.00 :   be6b8:       gssq    s7,s6,96(sp)
    0.00 :   be6bc:       gssq    s3,s2,64(sp)
    0.00 :   be6c0:       sd      a3,0(sp)
    0.00 :   be6c4:       move    s0,a0
    0.00 :   be6c8:       sd      v0,32(sp)
    0.00 :   be6cc:       sd      a5,8(sp)
    0.00 :   be6d0:       sd      zero,8(a0)
    0.00 :   be6d4:       sd      a6,16(sp)
    0.00 :   be6d8:       ld      s2,48(a0)
    8.53 :   be6dc:       ld      s1,40(a0)
    9.42 :   be6e0:       ld      v1,32(a0)
    0.00 :   be6e4:       nop
    0.00 :   be6e8:       ld      s4,24(a0)
    0.00 :   be6ec:       ld      s5,16(a0)
    0.00 :   be6f0:       sd      a7,40(sp)
   10.11 :   be6f4:       ld      s6,64(a0)

...

The original patch link:
https://lore.kernel.org/patchwork/patch/1180480/

Signed-off-by: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Dengcheng Zhu <dzhu@wavecomp.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Cc: linux-mips@vger.kernel.org
[ fanpeng@loongson.cn: Add missing "bgtzl", "bltzl", "bgezl", "blezl", "beql" and "bnel" for pre-R6processors ]
Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-04 09:42:40 -03:00
Jiri Olsa
bf5411695a perf tools: Pass build_id object to build_id__sprintf()
Passing build_id object to build_id__sprintf function, so it can operate
with the proper size of build id.

This will create proper md5 build id readable names,
like following:

  a50e350e97c43b4708d09bcd85ebfff7

instead of:

  a50e350e97c43b4708d09bcd85ebfff700000000

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 08:46:22 -03:00
Jiri Olsa
0aba7f036a perf tools: Use build_id object in dso
Replace build_id byte array with struct build_id object and all the code
that references it.

The objective is to carry size together with build id array, so it's
better to keep both together.

This is preparatory change for following patches, and there's no
functional change.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 08:44:47 -03:00
Arnaldo Carvalho de Melo
bbe544682e perf annotate: Allow configuring the 'disassembler_style' knob via 'perf config'
# perf annotate --stdio2 acpi_processor_ffh_cstate_enter > default
  # perf config annotate.disassembler_style=intel
  # perf config annotate.disassembler_style
  annotate.disassembler_style=intel
  # perf annotate --stdio2 acpi_processor_ffh_cstate_enter > intel
  # diff -u default intel
  --- default	2020-09-04 13:09:26.019205732 -0300
  +++ intel	2020-09-04 13:09:52.823795081 -0300
  @@ -1,42 +1,42 @@
   Samples: 1K of event 'cycles', 4000 Hz, Event count (approx.): 990065316, [percent: local period]
   acpi_processor_ffh_cstate_enter() /lib/modules/5.9.0-rc3/build/vmlinux
  -Percent     → callq   __fentry__
  -              mov     cpu_number,%edx
  -              mov     %edx,%edx
  -              mov     cpu_cstate_entry,%rax
  -              add     -0x7dbe9700(,%rdx,8),%rax
  -              movzbl  0x9(%rdi),%edx
  -              mov     0x4(%rax,%rdx,8),%edi
  -              mov     (%rax,%rdx,8),%esi
  -            → jmpq    137ccc6
  -        2d: → jmpq    137ccd8
  +Percent     → call    __fentry__
  +              mov     edx,DWORD PTR gs:[rip+0x7e541d74]
  +              mov     edx,edx
  +              mov     rax,QWORD PTR [rip+0x152b8fb]
  +              add     rax,QWORD PTR [rdx*8-0x7dbe9700]
  +              movzx   edx,BYTE PTR [rdi+0x9]
  +              mov     edi,DWORD PTR [rax+rdx*8+0x4]
  +              mov     esi,DWORD PTR [rax+rdx*8]
  +            → jmp     137ccc6
  +        2d: → jmp     137ccd8
                 mfence
  -              mov     %gs:0x17bc0,%rax
  -              clflush (%rax)
  +              mov     rax,QWORD PTR gs:0x17bc0
  +              clflush BYTE PTR [rax]
                 mfence
  -              xor     %edx,%edx
  -              mov     %rdx,%rcx
  -              mov     %gs:0x17bc0,%rax
  -  0.00        monitor %rax,%ecx,%edx
  -              mov     (%rax),%rax
  -              test    $0x8,%al
  +              xor     edx,edx
  +              mov     rcx,rdx
  +              mov     rax,QWORD PTR gs:0x17bc0
  +  0.00        monitor
  +              mov     rax,QWORD PTR [rax]
  +              test    al,0x8
               ↓ jne     71
  -            ↓ jmpq    68
  -              verw    0x538b08(%rip)        # ffffffff82008150 <ds.0>
  -        68:   mov     %rsi,%rax
  -              mov     %rdi,%rcx
  -100.00        mwait   %eax,%ecx
  -        71:   mov     %gs:0x17bc0,%rax
  -              lock    andb    $0xdf,0x2(%rax)
  -              lock    addl    $0x0,-0x4(%rsp)
  -              mov     (%rax),%rax
  -              test    $0x8,%al
  +            ↓ jmp     68
  +              verw    WORD PTR [rip+0x538b08]        # ffffffff82008150 <ds.0>
  +        68:   mov     rax,rsi
  +              mov     rcx,rdi
  +100.00        mwait
  +        71:   mov     rax,QWORD PTR gs:0x17bc0
  +              lock    and     BYTE PTR [rax+0x2],0xdf
  +              lock    add     DWORD PTR [rsp-0x4],0x0
  +              mov     rax,QWORD PTR [rax]
  +              test    al,0x8
               ↓ je      97
  -              andl    $0x7fffffff,__preempt_count
  -        97: ← retq
  -              mov     %gs:0x17bc0,%rax
  -              lock    orb     $0x20,0x2(%rax)
  -              mov     (%rax),%rax
  -              test    $0x8,%al
  +              and     DWORD PTR gs:[rip+0x7e548509],0x7fffffff
  +        97:   ret
  +              mov     rax,QWORD PTR gs:0x17bc0
  +              lock    or      BYTE PTR [rax+0x2],0x20
  +              mov     rax,QWORD PTR [rax]
  +              test    al,0x8
               ↑ jne     71
  -            ↑ jmpq    2d
  +            ↑ jmp     2d
  #

Requested-by: Matt P. Dziubinski <matdzb@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-04 16:07:23 -03:00
Numfor Mbiziwo-Tiapo
b39730a663 perf annotate: Fix non-null terminated buffer returned by readlink()
Our local MSAN (Memory Sanitizer) build of perf throws a warning that
comes from the "dso__disassemble_filename" function in
"tools/perf/util/annotate.c" when running perf record.

The warning stems from the call to readlink, in which "build_id_path"
was being read into "linkname". Since readlink does not null terminate,
an uninitialized memory access would later occur when "linkname" is
passed into the strstr function. This is simply fixed by
null-terminating "linkname" after the call to readlink.

To reproduce this warning, build perf by running:

  $ make -C tools/perf CLANG=1 CC=clang EXTRA_CFLAGS="-fsanitize=memory -fsanitize-memory-track-origins"

(Additionally, llvm might have to be installed and clang might have to
be specified as the compiler - export CC=/usr/bin/clang)

Then running:

  tools/perf/perf record -o - ls / | tools/perf/perf --no-pager annotate -i - --stdio

Please see the cover letter for why false positive warnings may be
generated.

Signed-off-by: Numfor Mbiziwo-Tiapo <nums@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Drayton <mbd@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20190729205750.193289-1-nums@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-07-09 12:36:50 -03:00
Tiezhu Yang
3e9b26dc22 perf tools: Remove some duplicated includes
There exists some duplicated includes in tools/perf, remove them.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xuefeng li <lixuefeng@loongson.cn>
Link: http://lore.kernel.org/lkml/1591071304-19338-2-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-06-02 11:09:41 -03:00
Arnaldo Carvalho de Melo
6e6d1d654e perf evsel: Rename perf_evsel__env() to evsel__env()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
c754c382c9 perf evsel: Rename perf_evsel__is_*() to evsel__is*()
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:31 -03:00
Arnaldo Carvalho de Melo
347c751a64 perf evsel: Rename perf_evsel__group_desc() to evsel__group_desc()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka
libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Arnaldo Carvalho de Melo
8ab2e96d8f perf evsel: Rename *perf_evsel__*name() to *evsel__*name()
As they are 'struct evsel' methods or related routines, not part of
tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-05-05 16:35:30 -03:00
Jiri Olsa
3c29d4483e perf annotate: Add basic support for bpf_image
Add the DSO_BINARY_TYPE__BPF_IMAGE dso binary type to recognize BPF
images that carry trampoline or dispatcher.

Upcoming patches will add support to read the image data, store it
within the BPF feature in perf.data and display it for annotation
purposes.

Currently we only display following message:

  # ./perf annotate bpf_trampoline_24456 --stdio
   Percent |      Source code & Disassembly of . for cycles (504  ...
  --------------------------------------------------------------- ...
           :       to be implemented

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200312195610.346362-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-16 12:19:06 -03:00
Ravi Bangoria
dabce16bd2 perf annotate: Get rid of annotation->nr_jumps
The 'nr_jumps' field in 'struct annotation' is not used since it's
inception in commit 2402e4a936 ("perf annotate browser: Show 'jumpy'
functions").  Get rid of it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-7-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-04 10:34:10 -03:00
Ravi Bangoria
e0560ba6d9 perf annotate: Fix segfault with source toggle
While rendering annotate browser from perf report tui, we keep track
of total number of lines(asm + source) in annotation->nr_entries and
total number of asm lines in annotation->nr_asm_entries. But we don't
reset them before starting. Thus if user annotates same function
multiple times, we restart incrementing these fields with old values.

This causes a segfault when user tries to toggle source code after
annotating same function multiple times. Fix it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-5-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 11:47:23 -03:00
Ravi Bangoria
d3c03147bf perf annotate: Align struct annotate_args
Align fields of struct annotate_args.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-4-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 11:47:23 -03:00
Ravi Bangoria
2316f861ae perf annotate: Simplify disasm_line allocation and freeing code
We are allocating disasm_line object in annotation_line__new() instead
of disasm_line__new(). Similarly annotation_line__delete() is actually
freeing disasm_line object as well. This complexity is because of
privsize.  But we don't need privsize anymore so get rid of privsize and
simplify disasm_line allocation and freeing code.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-3-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 11:07:13 -03:00
Ravi Bangoria
e0ad4d6854 perf annotate: Remove privsize from symbol__annotate() args
privsize is passed as 0 from all the symbol__annotate() callers.
Remove it from argument list.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20200204045233.474937-2-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 11:06:14 -03:00
Ravi Bangoria
7384083ba6 perf annotate: Make perf config effective
perf default config set by user in [annotate] section is totally ignored
by annotate code. Fix it.

Before:

  $ ./perf config
  annotate.hide_src_code=true
  annotate.show_nr_jumps=true
  annotate.show_nr_samples=true

  $ ./perf annotate shash
         │    unsigned h = 0;
         │      movl   $0x0,-0xc(%rbp)
         │    while (*s)
         │    ↓ jmp    44
         │    h = 65599 * h + *s++;
   11.33 │24:   mov    -0xc(%rbp),%eax
   43.50 │      imul   $0x1003f,%eax,%ecx
         │      mov    -0x18(%rbp),%rax

After:

         │        movl   $0x0,-0xc(%rbp)
         │      ↓ jmp    44
       1 │1 24:   mov    -0xc(%rbp),%eax
       4 │        imul   $0x1003f,%eax,%ecx
         │        mov    -0x18(%rbp),%rax

Note that we have removed show_nr_samples and show_total_period from
annotation_options because they are not used. Instead of them we use
symbol_conf.show_nr_samples and symbol_conf.show_total_period.

Committer testing:

Using 'perf annotate --stdio2' to use the TUI rendering but emitting the output to stdio:

  # perf config
  #
  # perf config annotate.hide_src_code=true
  # perf config
  annotate.hide_src_code=true
  #
  # perf config annotate.show_nr_jumps=true
  # perf config annotate.show_nr_samples=true
  # perf config
  annotate.hide_src_code=true
  annotate.show_nr_jumps=true
  annotate.show_nr_samples=true
  #
  #

Before:

  # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized
  Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
  ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
  Percent
              00000000000609f0 <ObjectInstance::weak_pointer_was_finalized()@@Base>:
                endbr64
                cmpq    $0x0,0x20(%rdi)
              ↓ je      10
                xor     %eax,%eax
              ← retq
                xchg    %ax,%ax
  100.00  10:   push    %rbp
                cmpq    $0x0,0x18(%rdi)
                mov     %rdi,%rbp
              ↓ jne     20
          1b:   xor     %eax,%eax
                pop     %rbp
              ← retq
                nop
          20:   lea     0x18(%rdi),%rdi
              → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                cmpq    $0x0,0x18(%rbp)
              ↑ jne     1b
                mov     %rbp,%rdi
              → callq   ObjectBase::jsobj_addr() const@plt
                mov     $0x1,%eax
                pop     %rbp
              ← retq
  #

After:

  # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
  Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
  ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
  Samples       endbr64
                cmpq    $0x0,0x20(%rdi)
              ↓ je      10
                xor     %eax,%eax
              ← retq
                xchg    %ax,%ax
     1  1 10:   push    %rbp
                cmpq    $0x0,0x18(%rdi)
                mov     %rdi,%rbp
              ↓ jne     20
        1 1b:   xor     %eax,%eax
                pop     %rbp
              ← retq
                nop
        1 20:   lea     0x18(%rdi),%rdi
              → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                cmpq    $0x0,0x18(%rbp)
              ↑ jne     1b
                mov     %rbp,%rdi
              → callq   ObjectBase::jsobj_addr() const@plt
                mov     $0x1,%eax
                pop     %rbp
              ← retq
  #
  # perf config annotate.show_nr_jumps
  annotate.show_nr_jumps=true
  # perf config annotate.show_nr_jumps=false
  # perf config annotate.show_nr_jumps
  annotate.show_nr_jumps=false
  #
  # perf annotate --stdio2 ObjectInstance::weak_pointer_was_finalized 2> /dev/null
  Samples: 1  of event 'cycles', 4000 Hz, Event count (approx.): 830873, [percent: local period]
  ObjectInstance::weak_pointer_was_finalized() /usr/lib64/libgjs.so.0.0.0
  Samples       endbr64
                cmpq    $0x0,0x20(%rdi)
              ↓ je      10
                xor     %eax,%eax
              ← retq
                xchg    %ax,%ax
       1  10:   push    %rbp
                cmpq    $0x0,0x18(%rdi)
                mov     %rdi,%rbp
              ↓ jne     20
          1b:   xor     %eax,%eax
                pop     %rbp
              ← retq
                nop
          20:   lea     0x18(%rdi),%rdi
              → callq   JS_UpdateWeakPointerAfterGC(JS::Heap<JSObject*
                cmpq    $0x0,0x18(%rbp)
              ↑ jne     1b
                mov     %rbp,%rdi
              → callq   ObjectBase::jsobj_addr() const@plt
                mov     $0x1,%eax
                pop     %rbp
              ← retq
  #

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Link: http://lore.kernel.org/lkml/20200213064306.160480-6-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 10:44:59 -03:00
Ravi Bangoria
46ccb44269 perf annotate: Fix --show-nr-samples for tui/stdio2
perf annotate --show-nr-samples does not really show number of samples.

The reason is we have two separate variables for the same purpose.

One is in symbol_conf.show_nr_samples and another is
annotation_options.show_nr_samples.

We save command line option in symbol_conf.show_nr_samples but uses
annotation_option.show_nr_samples while rendering tui/stdio2 browser.

Though, we copy symbol_conf.show_nr_samples to
annotation__default_options.show_nr_samples but that is not really
effective as we don't use annotation__default_options once we copy
default options to dynamic variable annotate.opts in cmd_annotate().

Instead of all these complication, keep only one variable and use it all
over. symbol_conf.show_nr_samples is used by perf report/top as well. So
let's kill annotation_options.show_nr_samples.

On a side note, I've kept annotation_options.show_nr_samples definition
because it's still used by perf-config code. Follow up patch to fix
perf-config for annotate will remove annotation_options.show_nr_samples.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Link: http://lore.kernel.org/lkml/20200213064306.160480-4-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 10:44:48 -03:00
Ravi Bangoria
68aac855b6 perf annotate: Fix --show-total-period for tui/stdio2
perf annotate --show-total-period does not really show total period.

The reason is we have two separate variables for the same purpose.

One is in symbol_conf.show_total_period and another is
annotation_options.show_total_period.

We save command line option in symbol_conf.show_total_period but uses
annotation_option.show_total_period while rendering tui/stdio2 browser.

Though, we copy symbol_conf.show_total_period to
annotation__default_options.show_total_period but that is not really
effective as we don't use annotation__default_options once we copy
default options to dynamic variable annotate.opts in cmd_annotate().

Instead of all these complication, keep only one variable and use it all
over. symbol_conf.show_total_period is used by perf report/top as well.
So let's kill annotation_options.show_total_period.

On a side note, I've kept annotation_options.show_total_period
definition because it's still used by perf-config code. Follow up patch
to fix perf-config for annotate will remove
annotation_options.show_total_period.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Link: http://lore.kernel.org/lkml/20200213064306.160480-3-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-27 10:44:40 -03:00
Andi Kleen
3b0b16bf8c perf tools: Support --prefix/--prefix-strip
The objdump utility has useful --prefix / --prefix-strip options to
allow changing source code file names hardcoded into executables' debug
info. Add options to 'perf report', 'perf top' and 'perf annotate',
which are then passed to objdump.

  $ mkdir foo
  $ echo 'main() { for (;;); }' > foo/foo.c
  $ gcc -g foo/foo.c
  foo/foo.c:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
      1 | main() { for (;;); }
        | ^~~~
  $ perf record ./a.out
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.230 MB perf.data (5721 samples) ]
  $ mv foo bar
  $ perf annotate
  <does not show source code>
  $ perf annotate --prefix=/home/ak/lsrc/git/bar --prefix-strip=5
  <does show source code>

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
LPU-Reference: 20200107210444.214071-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-01-14 12:02:19 -03:00
Arnaldo Carvalho de Melo
c54d241b35 perf maps: Rename map_groups.h to maps.h
One more step in the merge of 'struct maps' with 'struct map_groups'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-9ibtn3vua76f934t7woyf26w@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo
f2eaea09d6 perf map_symbol: Rename ms->mg to ms->maps
One more step on the merge of 'struct maps' with 'struct map_groups'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-61rra2wg392rhvdgw421wzpt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo
79b6bb73f8 perf maps: Merge 'struct maps' with 'struct map_groups'
And pick the shortest name: 'struct maps'.

The split existed because we used to have two groups of maps, one for
functions and one for variables, but that only complicated things,
sometimes we needed to figure out what was at some address and then had
to first try it on the functions group and if that failed, fall back to
the variables one.

That split is long gone, so for quite a while we had only one struct
maps per struct map_groups, simplify things by combining those structs.

First patch is the minimum needed to merge both, follow up patches will
rename 'thread->mg' to 'thread->maps', etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-hom6639ro7020o708trhxh59@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26 11:07:46 -03:00
Arnaldo Carvalho de Melo
94e44b9ca5 perf annotate: Stop using map->groups, use map_symbol->mg instead
These were the last uses of map->groups, next cset will nuke it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-n3g0foos7l7uxq9nar0zo0vj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-12 08:20:53 -03:00
Arnaldo Carvalho de Melo
d46a4cdf49 pref tools: Make 'struct addr_map_symbol' contain 'struct map_symbol'
So that we pass that substructure around and with it consolidate lots of
functions that receive a (map, symbol) pair and now can receive just a
'struct map_symbol' pointer.

This further paves the way to add 'struct map_groups' to 'struct
map_symbol' so that we can have all we need for annotation so that we
can ditch 'struct map'->groups, i.e. have the map_groups pointer in a
more central place, avoiding the pointer in the 'struct map' that have
tons of instances.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-fs90ttd9q12l7989fo7pw81q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-12 08:20:53 -03:00
Arnaldo Carvalho de Melo
2975489458 perf annotate: Pass a 'map_symbol' in places receiving a pair of 'map' and 'symbol' pointers
We are already passing things like:

  symbol__annotate(ms->sym, ms->map, ...)

So shorten the signature of such functions to receive the 'map_symbol'
pointer.

This also paves the way to having the 'struct map_groups' pointer in the
'struct map_symbol' so that we can get rid of 'struct map'->groups.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-23yx8v1t41nzpkpi7rdrozww@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-12 08:20:53 -03:00
Arnaldo Carvalho de Melo
9d355b381b perf map_groups: Pass the object to map_groups__find_ams()
We were just passing a map to look for and reuse its map->groups member,
but the idea is that this is going away, as a map can be in multiple
rb_trees when being reused via a map_node, so do as all the other
map_groups methods and pass as its first arg the object being operated
on.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-nmi2pbggqloogwl6vxrvex5a@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-12 08:20:53 -03:00
Ian Rogers
5c65b1c084 perf annotate: Fix heap overflow
Fix expand_tabs that copies the source lines '\0' and then appends
another '\0' at a potentially out of bounds address.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20191026035644.217548-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07 08:30:18 -03:00
Ingo Molnar
27a0a90d63 perf/core improvements and fixes:
perf trace:
 
 - Add syscall failure stats to -s/--summary and -S/--with-summary, works in
   combination with specifying just a set of syscalls, see below first with
   -s/--summary, then with -S/--with-summary just for the syscalls we saw failing
   with -s:
 
     # perf trace -s sleep 1
 
      Summary of events:
 
      sleep (16218), 80 events, 93.0%
 
        syscall     calls  errors  total      min      avg      max   stddev
                                   (msec)   (msec)   (msec)   (msec)    (%)
        ----------- -----  ------ -------- -------- -------- -------- ------
        nanosleep       1      0  1000.091 1000.091 1000.091 1000.091  0.00%
        mmap            8      0     0.045    0.005    0.006    0.008  7.09%
        mprotect        4      0     0.028    0.005    0.007    0.009 11.38%
        openat          3      0     0.021    0.005    0.007    0.009 14.07%
        munmap          1      0     0.017    0.017    0.017    0.017  0.00%
        brk             4      0     0.010    0.001    0.002    0.004 23.15%
        read            4      0     0.009    0.002    0.002    0.003  8.13%
        close           5      0     0.008    0.001    0.002    0.002 10.83%
        fstat           3      0     0.006    0.002    0.002    0.002  6.97%
        access          1      1     0.006    0.006    0.006    0.006  0.00%
        lseek           3      0     0.005    0.001    0.002    0.002  7.37%
        arch_prctl      2      1     0.004    0.001    0.002    0.002 17.64%
        execve          1      0     0.000    0.000    0.000    0.000  0.00%
 
     # perf trace -e access,arch_prctl -S sleep 1
          0.000 ( 0.006 ms): sleep/19503 arch_prctl(option: 0x3001, arg2: 0x7fff165996b0) = -1 EINVAL (Invalid argument)
          0.024 ( 0.006 ms): sleep/19503 access(filename: 0x2177e510, mode: R)            = -1 ENOENT (No such file or directory)
          0.136 ( 0.002 ms): sleep/19503 arch_prctl(option: SET_FS, arg2: 0x7f9421737580) = 0
 
      Summary of events:
 
      sleep (19503), 6 events, 50.0%
 
        syscall    calls  errors total    min    avg    max  stddev
                                 (msec) (msec) (msec) (msec)    (%)
        ---------- -----  ------ ------ ------ ------ ------ ------
        arch_prctl   2       1    0.008  0.002  0.004  0.006 57.22%
        access       1       1    0.006  0.006  0.006  0.006  0.00%
 
     #
 
   - Introduce --errno-summary, to drill down a bit more in the errno stats:
 
     # perf trace --errno-summary -e access,arch_prctl -S sleep 1
          0.000 ( 0.006 ms): sleep/5587 arch_prctl(option: 0x3001, arg2: 0x7ffd6ba6aa00) = -1 EINVAL (Invalid argument)
          0.028 ( 0.007 ms): sleep/5587 access(filename: 0xb83d9510, mode: R)            = -1 ENOENT (No such file or directory)
          0.172 ( 0.003 ms): sleep/5587 arch_prctl(option: SET_FS, arg2: 0x7f45b8392580) = 0
 
      Summary of events:
 
      sleep (5587), 6 events, 50.0%
 
        syscall    calls  errors total    min    avg    max  stddev
                                 (msec) (msec) (msec) (msec)   (%)
        ---------- -----  ------ ------ ------ ------ ------ ------
        arch_prctl     2     1    0.009  0.003  0.005  0.006 38.90%
 			   EINVAL: 1
        access         1     1    0.007  0.007  0.007  0.007  0.00%
                            ENOENT: 1
     #
 
   - Filter own pid to avoid a feedback look in 'perf trace record -a'
 
   - Add the glue for the auto generated x86 IRQ vector array.
 
   - Show error message when not finding a field used in a filter expression
 
     # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="cnt>32767"
     Failed to set filter "(cnt>32767) && (common_pid != 19938 && common_pid != 8922)" on event syscalls:sys_enter_write with 22 (Invalid argument)
     #
     # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="count>32767"
          0.000 python3.5/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dc53600, count: 172086)
         12.641 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db63660, count: 75994)
         27.738 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db4b1e0, count: 41635)
        136.070 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dbab510, count: 62232)
     #
 
   - Add a generator for x86's IRQ vectors -> strings
 
   - Introduce stroul() (string -> number) methods for the strarray and
     strarrays classes, also strtoul_flags, allowing to go from both strings
     and or-ed strings to numbers, allowing things like:
 
     # perf trace -e syscalls:sys_enter_mmap --filter="flags==DENYWRITE|PRIVATE|FIXED" sleep 1
          0.000 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2aa5000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000)
          0.011 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2bf2000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000)
          0.015 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2c3f000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000)
     #
 
   Allowing to narrow down from the complete set of mmap calls for that workload:
 
     # perf trace -e syscalls:sys_enter_mmap sleep 1
          0.000 sleep/22695 syscalls:sys_enter_mmap(len: 134773, prot: READ, flags: PRIVATE, fd: 3)
          0.041 sleep/22695 syscalls:sys_enter_mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS)
          0.053 sleep/22695 syscalls:sys_enter_mmap(len: 1857472, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3)
          0.069 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd23ffb6000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000)
          0.077 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240103000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000)
          0.083 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240150000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000)
          0.095 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240156000, len: 14272, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS)
          0.339 sleep/22695 syscalls:sys_enter_mmap(len: 217750512, prot: READ, flags: PRIVATE, fd: 3)
     #
 
   Works with all targets, so, for system wide, looking at who calls mmap with flags set to just "PRIVATE":
 
     # perf trace --max-events=5 -e syscalls:sys_enter_mmap --filter="flags==PRIVATE"
          0.000 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14)
          0.050 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14)
          0.062 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14)
          0.145 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18)
          0.183 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18)
     #
 
   # perf trace --max-events=2 -e syscalls:sys_enter_lseek --filter="whence==SET && offset != 0"
          0.000 Cache2 I/O/12047 syscalls:sys_enter_lseek(fd: 277, offset: 43, whence: SET)
       1142.070 mozStorage #5/12302 syscalls:sys_enter_lseek(fd: 44</home/acme/.mozilla/firefox/ina67tev.default/cookies.sqlite-wal>, offset: 393536, whence: SET)
   #
 
 perf annotate:
 
   - Fix objdump --no-show-raw-insn flag to work with goth gcc and clang.
 
   - Streamline objdump execution, preserving the right error codes for better
     reporting to user.
 
 perf report:
 
   - Add warning when libunwind not compiled in.
 
 perf stat:
 
   Jin Yao:
 
   - Support --all-kernel/--all-user, to match options available in 'perf record',
     asking that all the events specified work just with kernel or user events.
 
 perf list:
 
   Jin Yao:
 
   - Hide deprecated events by default, allow showing them with --deprecated.
 
 libbperf:
 
   Jiri Olsa:
 
   - Allow to build with -ltcmalloc.
 
   - Finish mmap interface, getting more stuff from tools/perf while adding
     abstractions to avoid pulling too much stuff, to get libperf to grow as
     tools needs things like auxtrace, etc.
 
 perf scripting engines:
 
   Steven Rostedt (VMware):
 
   - Iterate on tep event arrays directly, fixing script generation with
     '-g python' when having multiple tracepoints in a perf.data file.
 
 core:
 
   - Allow to build with -ltcmalloc.
 
 perf test:
 
   Leo Yan:
 
   - Report failure for mmap events.
 
   - Avoid infinite loop for task exit case.
 
   - Remove needless headers for bp_account test.
 
   - Add dedicated checking helper is_supported().
 
   - Disable bp_signal testing for arm64.
 
 Vendor events:
 
 arm64:
 
   John Garry:
 
   - Fix Hisi hip08 DDRC PMU eventname.
 
   - Add some missing events for Hisi hip08 DDRC, L3C and HHA PMUs.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXa2zPgAKCRCyPKLppCJ+
 J8qoAP9jm84Aoq87j/xh9wl3JeU3aeRXq4V6zpGbtt9u41OmRwD9E8CQIcLDAuNp
 IQaFYgHydH4OfZw3+rTJJjmJ/eb0IQg=
 =TDUz
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-5.5-20191021' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf trace:

- Add syscall failure stats to -s/--summary and -S/--with-summary, works in
  combination with specifying just a set of syscalls, see below first with
  -s/--summary, then with -S/--with-summary just for the syscalls we saw failing
  with -s:

    # perf trace -s sleep 1

     Summary of events:

     sleep (16218), 80 events, 93.0%

       syscall     calls  errors  total      min      avg      max   stddev
                                  (msec)   (msec)   (msec)   (msec)    (%)
       ----------- -----  ------ -------- -------- -------- -------- ------
       nanosleep       1      0  1000.091 1000.091 1000.091 1000.091  0.00%
       mmap            8      0     0.045    0.005    0.006    0.008  7.09%
       mprotect        4      0     0.028    0.005    0.007    0.009 11.38%
       openat          3      0     0.021    0.005    0.007    0.009 14.07%
       munmap          1      0     0.017    0.017    0.017    0.017  0.00%
       brk             4      0     0.010    0.001    0.002    0.004 23.15%
       read            4      0     0.009    0.002    0.002    0.003  8.13%
       close           5      0     0.008    0.001    0.002    0.002 10.83%
       fstat           3      0     0.006    0.002    0.002    0.002  6.97%
       access          1      1     0.006    0.006    0.006    0.006  0.00%
       lseek           3      0     0.005    0.001    0.002    0.002  7.37%
       arch_prctl      2      1     0.004    0.001    0.002    0.002 17.64%
       execve          1      0     0.000    0.000    0.000    0.000  0.00%

    # perf trace -e access,arch_prctl -S sleep 1
         0.000 ( 0.006 ms): sleep/19503 arch_prctl(option: 0x3001, arg2: 0x7fff165996b0) = -1 EINVAL (Invalid argument)
         0.024 ( 0.006 ms): sleep/19503 access(filename: 0x2177e510, mode: R)            = -1 ENOENT (No such file or directory)
         0.136 ( 0.002 ms): sleep/19503 arch_prctl(option: SET_FS, arg2: 0x7f9421737580) = 0

     Summary of events:

     sleep (19503), 6 events, 50.0%

       syscall    calls  errors total    min    avg    max  stddev
                                (msec) (msec) (msec) (msec)    (%)
       ---------- -----  ------ ------ ------ ------ ------ ------
       arch_prctl   2       1    0.008  0.002  0.004  0.006 57.22%
       access       1       1    0.006  0.006  0.006  0.006  0.00%

    #

  - Introduce --errno-summary, to drill down a bit more in the errno stats:

    # perf trace --errno-summary -e access,arch_prctl -S sleep 1
         0.000 ( 0.006 ms): sleep/5587 arch_prctl(option: 0x3001, arg2: 0x7ffd6ba6aa00) = -1 EINVAL (Invalid argument)
         0.028 ( 0.007 ms): sleep/5587 access(filename: 0xb83d9510, mode: R)            = -1 ENOENT (No such file or directory)
         0.172 ( 0.003 ms): sleep/5587 arch_prctl(option: SET_FS, arg2: 0x7f45b8392580) = 0

     Summary of events:

     sleep (5587), 6 events, 50.0%

       syscall    calls  errors total    min    avg    max  stddev
                                (msec) (msec) (msec) (msec)   (%)
       ---------- -----  ------ ------ ------ ------ ------ ------
       arch_prctl     2     1    0.009  0.003  0.005  0.006 38.90%
			   EINVAL: 1
       access         1     1    0.007  0.007  0.007  0.007  0.00%
                           ENOENT: 1
    #

  - Filter own pid to avoid a feedback look in 'perf trace record -a'

  - Add the glue for the auto generated x86 IRQ vector array.

  - Show error message when not finding a field used in a filter expression

    # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="cnt>32767"
    Failed to set filter "(cnt>32767) && (common_pid != 19938 && common_pid != 8922)" on event syscalls:sys_enter_write with 22 (Invalid argument)
    #
    # perf trace --max-events=4 -e syscalls:sys_enter_write --filter="count>32767"
         0.000 python3.5/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dc53600, count: 172086)
        12.641 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db63660, count: 75994)
        27.738 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0db4b1e0, count: 41635)
       136.070 python3.5.post/17535 syscalls:sys_enter_write(fd: 3, buf: 0x564b0dbab510, count: 62232)
    #

  - Add a generator for x86's IRQ vectors -> strings

  - Introduce stroul() (string -> number) methods for the strarray and
    strarrays classes, also strtoul_flags, allowing to go from both strings
    and or-ed strings to numbers, allowing things like:

    # perf trace -e syscalls:sys_enter_mmap --filter="flags==DENYWRITE|PRIVATE|FIXED" sleep 1
         0.000 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2aa5000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000)
         0.011 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2bf2000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000)
         0.015 sleep/22588 syscalls:sys_enter_mmap(addr: 0x7f42d2c3f000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000)
    #

  Allowing to narrow down from the complete set of mmap calls for that workload:

    # perf trace -e syscalls:sys_enter_mmap sleep 1
         0.000 sleep/22695 syscalls:sys_enter_mmap(len: 134773, prot: READ, flags: PRIVATE, fd: 3)
         0.041 sleep/22695 syscalls:sys_enter_mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS)
         0.053 sleep/22695 syscalls:sys_enter_mmap(len: 1857472, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3)
         0.069 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd23ffb6000, len: 1363968, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x22000)
         0.077 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240103000, len: 311296, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x16f000)
         0.083 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240150000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bb000)
         0.095 sleep/22695 syscalls:sys_enter_mmap(addr: 0x7fd240156000, len: 14272, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS)
         0.339 sleep/22695 syscalls:sys_enter_mmap(len: 217750512, prot: READ, flags: PRIVATE, fd: 3)
    #

  Works with all targets, so, for system wide, looking at who calls mmap with flags set to just "PRIVATE":

    # perf trace --max-events=5 -e syscalls:sys_enter_mmap --filter="flags==PRIVATE"
         0.000 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14)
         0.050 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14)
         0.062 pool/2242 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 14)
         0.145 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18)
         0.183 goa-identity-s/2240 syscalls:sys_enter_mmap(len: 756, prot: READ, flags: PRIVATE, fd: 18)
    #

  # perf trace --max-events=2 -e syscalls:sys_enter_lseek --filter="whence==SET && offset != 0"
         0.000 Cache2 I/O/12047 syscalls:sys_enter_lseek(fd: 277, offset: 43, whence: SET)
      1142.070 mozStorage #5/12302 syscalls:sys_enter_lseek(fd: 44</home/acme/.mozilla/firefox/ina67tev.default/cookies.sqlite-wal>, offset: 393536, whence: SET)
  #

perf annotate:

  - Fix objdump --no-show-raw-insn flag to work with goth gcc and clang.

  - Streamline objdump execution, preserving the right error codes for better
    reporting to user.

perf report:

  - Add warning when libunwind not compiled in.

perf stat:

  Jin Yao:

  - Support --all-kernel/--all-user, to match options available in 'perf record',
    asking that all the events specified work just with kernel or user events.

perf list:

  Jin Yao:

  - Hide deprecated events by default, allow showing them with --deprecated.

libbperf:

  Jiri Olsa:

  - Allow to build with -ltcmalloc.

  - Finish mmap interface, getting more stuff from tools/perf while adding
    abstractions to avoid pulling too much stuff, to get libperf to grow as
    tools needs things like auxtrace, etc.

perf scripting engines:

  Steven Rostedt (VMware):

  - Iterate on tep event arrays directly, fixing script generation with
    '-g python' when having multiple tracepoints in a perf.data file.

core:

  - Allow to build with -ltcmalloc.

perf test:

  Leo Yan:

  - Report failure for mmap events.

  - Avoid infinite loop for task exit case.

  - Remove needless headers for bp_account test.

  - Add dedicated checking helper is_supported().

  - Disable bp_signal testing for arm64.

Vendor events:

arm64:

  John Garry:

  - Fix Hisi hip08 DDRC PMU eventname.

  - Add some missing events for Hisi hip08 DDRC, L3C and HHA PMUs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-22 01:15:45 +02:00
Ingo Molnar
aa7a7b7297 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-22 01:15:32 +02:00
Gustavo A. R. Silva
f948eb45e3 perf annotate: Fix multiple memory and file descriptor leaks
Store SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF in variable *ret*, instead
of returning in the middle of the function and leaking multiple
resources: prog_linfo, btf, s and bfdf.

Addresses-Coverity-ID: 1454832 ("Structurally dead code")
Fixes: 11aad897f6 ("perf annotate: Don't return -1 for error when doing BPF disassembly")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191014171047.GA30850@embeddedor
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-15 12:00:01 -03:00
Ian Rogers
c5baf90892 perf annotate: Fix objdump --no-show-raw-insn flag
Remove redirection of objdump's stderr to /dev/null to help diagnose
failures.

Fix the '--no-show-raw' flag to be '--no-show-raw-insn' which binutils
is permissive and allows, but fails with LLVM objdump.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20191010183649.23768-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-15 08:39:42 -03:00
Ian Rogers
b34b45eef1 perf annotate: Don't pipe objdump output through 'expand' command
Avoiding a pipe allows objdump command failures to surface.  Move to the
caller of symbol__parse_objdump_line the call to strim that removes
leading and trailing tabs.  Add a new expand_tabs function that if a tab
is present allocate a new line in which tabs are expanded.  In
symbol__parse_objdump_line the line had no leading spaces, so simplify
the line_ip processing.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20191010183649.23768-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-15 08:39:42 -03:00
Ian Rogers
7a675de428 perf annotate: Don't pipe objdump output through 'grep' command
Simplify the objdump command by not piping the output of objdump through
grep. Instead, drop lines that match the grep pattern during the reading
loop.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20191010183649.23768-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-15 08:39:42 -03:00
Ian Rogers
4235949944 perf annotate: Use libsubcmd's run-command.h to fork objdump
Reduce duplicated logic by using the subcmd library. Ensure when errors
occur they are reported to the caller. Before this patch, if no lines
are read the error status is 0.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20191010183649.23768-3-irogers@google.com
Link: http://lore.kernel.org/lkml/20191015003418.62563-1-irogers@google.com
[ merged follow up fix for NULL termination as in the 2nd link above ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-15 08:39:01 -03:00
Ian Rogers
353dcaa2f9 perf annotate: Avoid reallocation in objdump parsing
Objdump output is parsed using getline which allocates memory for the
read. Getline will realloc if the memory is too small, but currently the
line is always freed after the call.

Simplify parse_objdump_line by performing the reading in symbol__disassemble.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: clang-built-linux@googlegroups.com
Link: http://lore.kernel.org/lkml/20191010183649.23768-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-15 08:36:22 -03:00
Jin Yao
cebf7d51a6 perf diff: Report noisy for cycles diff
This patch prints the stddev and hist for the cycles diff of program
block. It can help us to understand if the cycles is noisy or not.

This patch is inspired by Andi Kleen's patch:

  https://lwn.net/Articles/600471/

We create new option '--cycles-hist'.

Example:

  perf record -b ./div
  perf record -b ./div
  perf diff -c cycles

  # Baseline                                [Program Block Range] Cycles Diff  Shared Object      Symbol
  # ........  .......................................................... ....  .................  ............................
  #
      46.72%                                      [div.c:40 -> div.c:40]    0  div                [.] main
      46.72%                                      [div.c:42 -> div.c:44]    0  div                [.] main
      46.72%                                      [div.c:42 -> div.c:39]    0  div                [.] main
      20.54%                          [random_r.c:357 -> random_r.c:394]    1  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -> random_r.c:380]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -> random_r.c:388]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -> random_r.c:391]    0  libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -> random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -> random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -> random.c:293]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -> random.c:298]    0  libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -> div.c:25]    0  div                [.] compute_flag
       8.40%                                      [div.c:27 -> div.c:28]    0  div                [.] compute_flag
       5.14%                                    [rand.c:26 -> rand.c:27]    0  libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -> rand.c:28]    0  libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -> rand@plt+0]    0  div                [.] rand@plt
       0.00%                                                                   [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -> do_mmap+732]  -10  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -> do_mmap+765]    1  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -> do_mmap+299]    0  [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  [kernel.kallsyms]  [k] native_write_msr

When we enable the option '--cycles-hist', the output is

  perf diff -c cycles --cycles-hist

  # Baseline                                [Program Block Range] Cycles Diff        stddev/Hist  Shared Object      Symbol
  # ........  .......................................................... ....  .................  .................  ............................
  #
      46.72%                                      [div.c:40 -> div.c:40]    0  ± 37.8% ▁█▁▁██▁█   div                [.] main
      46.72%                                      [div.c:42 -> div.c:44]    0  ± 49.4% ▁▁▂█▂▂▂▂   div                [.] main
      46.72%                                      [div.c:42 -> div.c:39]    0  ± 24.1% ▃█▂▄▁▃▂▁   div                [.] main
      20.54%                          [random_r.c:357 -> random_r.c:394]    1  ± 33.5% ▅▂▁█▃▁▂▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -> random_r.c:380]    0  ± 39.4% ▁▁█▁██▅▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -> random_r.c:388]    0                     libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -> random_r.c:391]    0  ± 41.2% ▁▃▁▂█▄▃▁   libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -> random.c:291]    0  ± 48.8% ▁▁▁▁███▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -> random.c:291]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -> random.c:293]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -> random.c:295]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -> random.c:295]    0                     libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -> random.c:298]    0  ± 75.6% ▃█▁▁▁▁▁▁   libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -> div.c:25]    0  ± 42.1% ▁▃▁▁███▁   div                [.] compute_flag
       8.40%                                      [div.c:27 -> div.c:28]    0  ± 41.8% ██▁▁▄▁▁▄   div                [.] compute_flag
       5.14%                                    [rand.c:26 -> rand.c:27]    0  ± 37.8% ▁▁▁████▁   libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -> rand.c:28]    0                     libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -> rand@plt+0]    0                     div                [.] rand@plt
       0.00%                                                                                      [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -> do_mmap+732]  -10                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -> do_mmap+765]    1                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -> do_mmap+299]    0                     [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7                     [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  ± 38.5% ▄█▁        [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  ± 47.1% ▁█▇▃▁▁     [kernel.kallsyms]  [k] native_write_msr

 v8:
 ---
 Rebase to perf/core branch

 v7:
 ---
 1. v6 got Jiri's ACK.
 2. Rebase to latest perf/core branch.

 v6:
 ---
 1. Jiri provides better code for using data__hpp_register() in ui_init().
    Use this code in v6.

 v5:
 ---
 1. Refine the use of data__hpp_register() in ui_init() according to
    Jiri's suggestion.

 v4:
 ---
 1. Rename the new option from '--noisy' to '--cycles-hist'
 2. Remove the option '-n'.
 3. Only update the spark value and stats when '--cycles-hist' is enabled.
 4. Remove the code of printing '..'.

 v3:
 ---
 1. Move the histogram to a separate column
 2. Move the svals[] out of struct stats

 v2:
 ---
 Jiri got a compile error,

  CC       builtin-diff.o
  builtin-diff.c: In function ‘compute_cycles_diff’:
  builtin-diff.c:712:10: error: taking the absolute value of unsigned type ‘u64’ {aka ‘long unsigned int’} has no effect [-Werror=absolute-value]
  712 |          labs(pair->block_info->cycles_spark[i] -
      |          ^~~~

 Because the result of u64 - u64 is still u64. Now we change the type of
 cycles_spark[] to s64.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190925011446.30678-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-11 10:57:00 -03:00
Arnaldo Carvalho de Melo
11aad897f6 perf annotate: Don't return -1 for error when doing BPF disassembly
Return errno when open_memstream() fails and add two new speciall error
codes for when an invalid, non BPF file or one without BTF is passed to
symbol__disassemble_bpf(), so that its callers can rely on
symbol__strerror_disassemble() to convert that to a human readable error
message that can help figure out what is wrong, with hints even.

Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-usevw9r2gcipfcrbpaueurw0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-30 17:30:06 -03:00
Arnaldo Carvalho de Melo
16ed3c1e91 perf annotate: Return appropriate error code for allocation failures
We should return errno or the annotation extra range understood by
symbol__strerror_disassemble() instead of -1, fix it, returning ENOMEM
instead.

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-8of1cmj3rz0mppfcshc9bbqq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-30 17:30:04 -03:00
Arnaldo Carvalho de Melo
42d7a9107d perf annotate: Fix arch specific ->init() failure errors
They are called from symbol__annotate() and to propagate errors that can
help understand the problem make them return what
symbol__strerror_disassemble() known, i.e. errno codes and other
annotation specific errors in a special, out of errnos, range.

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-pqx7srcv7tixgid251aeboj6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-30 17:30:03 -03:00
Arnaldo Carvalho de Melo
211f493b61 perf annotate: Propagate the symbol__annotate() error return
We were just returning -1 in symbol__annotate() when symbol__annotate()
failed, propagate its error as it is used later to pass to
symbol__strerror_disassemble() to present a error message to the user,
that in some cases were getting:

  "Invalid -1 error code"

Fix it to propagate the error.

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-0tj89rs9g7nbcyd5skadlvuu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-30 17:30:01 -03:00
Arnaldo Carvalho de Melo
28f4417c33 perf annotate: Fix the signedness of failure returns
Callers of symbol__annotate() expect a errno value or some other
extended error value range in symbol__strerror_disassemble() to
convert to a proper error string, fix it when propagating a failure to
find the arch specific annotation routines via arch__find(arch_name).

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-o0k6dw7cas0vvmjjvgsyvu1i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-30 17:30:00 -03:00
Arnaldo Carvalho de Melo
a66fa0619a perf annotate: Propagate perf_env__arch() error
The callers of symbol__annotate2() use symbol__strerror_disassemble() to
convert its failure returns into a human readable string, so
propagate error values from functions it calls, starting with
perf_env__arch() that when fails the right thing to do is to look at
'errno' to see why its possible call to uname() failed.

Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/n/tip-it5d83kyusfhb1q1b0l4pxzs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-30 17:29:58 -03:00
Arnaldo Carvalho de Melo
252a2fdc74 perf tools: Replace needless mmap.h with what is needed, event.h
The perf_sample struct definition and the event_attr_init() are in
util/event.h, but some places were getting it thru an otherwise needless
util/mmap.h header, fix it by including util/event.h directly.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-p1anwyjdbbvghrkl9dlxv7h5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25 16:26:40 -03:00
Arnaldo Carvalho de Melo
e0fcfb086f perf evlist: Adopt backwards ring buffer state enum
As this isn't used at all in mmap.h but in evlist.h, so to cut down the
header dependency tree, move it to where it is used.

Also add mmap.h to the places using it but previously getting it
indirectly via evlist.h.

Add missing pthread.h to evlist.h, as it has a pthread_t struct member
and was getting the header via mmap.h.

Noticed while processing a Jiri's libperf batch touching mmap.h, where
almost everything gets rebuilt because evlist.h is so popular, so cut
down't this rebuild the world party.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lkml.kernel.org/n/tip-he0uljeftl0xfveh3d6vtode@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25 09:51:45 -03:00
Arnaldo Carvalho de Melo
fb71c86cc8 perf tools: Remove util.h from where it is not needed
Check that it is not needed and remove, fixing up some fallout for
places where it was only serving to get something else.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-9h6dg6lsqe2usyqjh5rrues4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-20 09:19:20 -03:00
Arnaldo Carvalho de Melo
f2a39fe849 perf auxtrace: Uninline functions that touch perf_session
So that we don't carry the session.h include directive in auxtrace.h,
which in turn opens a can of worms of files that were getting all sorts
of things via that include, fix them all.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-d2d83aovpgri2z75wlitquni@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31 22:24:10 -03:00
Arnaldo Carvalho de Melo
fa0d98462f perf tools: Remove needless evlist.h include directives
Remove the last unneeded use of cache.h in a header, we can check where
it is really needed, i.e. we can remove it and be sure that it isn't
being obtained indirectly.

This is an old file, used by now incorrectly in many places, so it was
providing includes needed indirectly, fixup this fallout.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-3x3l8gihoaeh7714os861ia7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31 22:24:10 -03:00
Arnaldo Carvalho de Melo
fac583fdb6 perf dso: Adopt DSO related macros from symbol.h
Reducing the size of symbol.h by removing things that are better placed
somewhere else.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-edenkmjt1oe5fks2s6umd30b@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31 22:19:28 -03:00
Arnaldo Carvalho de Melo
97b9d866a6 perf srcline: Add missing srcline.h header to files needing its defs
When srcline was introduced it wrongly added the include to util/sort.h,
even with that header not needing the definitions it provides, fix it by
adding it to the places that need it as a pre patch to remove srcline.h
from sort.h.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-shuebppedtye8hrgxk15qe3x@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26 11:58:29 -03:00
Arnaldo Carvalho de Melo
272172bd41 Merge remote-tracking branch 'torvalds/master' into perf/core
To get closer to upstream and check if we need to sync more UAPI
headers, pick up fixes for libbpf that prevent perf's container tests
from completing successfuly, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-12 16:25:00 -03:00
Arnaldo Carvalho de Melo
85127775a6 perf annotate: Fix printing of unaugmented disassembled instructions from BPF
The code to disassemble BPF programs uses binutil's disassembling
routines, and those use in turn fprintf to print to a memstream FILE,
adding a newline at the end of each line, which ends up confusing the
TUI routines called from:

  annotate_browser__write()
    annotate_line__write()
      annotate_browser__printf()
        ui_browser__vprintf()
          SLsmg_vprintf()

The SLsmg_vprintf() function in the slang library gets confused with the
terminating newline, so make the disasm_line__parse() function that
parses the lines produced by the BPF specific disassembler (that uses
binutil's libopcodes) and the lines produced by the objdump based
disassembler used for everything else (and that doesn't adds this
terminating newline) trim the end of the line in addition of the
beginning.

This way when disasm_line->ops.raw, i.e. for instructions without a
special scnprintf() method, we'll not have that \n getting in the way of
filling the screen right after the instruction with spaces to avoid
leaving what was on the screen before and thus garbling the annotation
screen, breaking scrolling, etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 6987561c9e ("perf annotate: Enable annotation of BPF programs")
Link: https://lkml.kernel.org/n/tip-unbr5a5efakobfr6rhxq99ta@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-08 15:40:56 -03:00
Jiri Olsa
5643b1a59e libperf: Move nr_members from perf's evsel to libperf's perf_evsel
Move the nr_members member from perf's evsel to libperf's perf_evsel.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-60-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29 18:34:46 -03:00
Jiri Olsa
6484d2f9dc libperf: Add nr_entries to struct perf_evlist
Move nr_entries count from 'struct perf' to into perf_evlist struct.

Committer notes:

Fix tools/perf/arch/s390/util/auxtrace.c case. And also the comment in
tools/perf/util/annotate.h.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-42-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29 18:34:45 -03:00
Jiri Olsa
32dcd021d0 perf evsel: Rename struct perf_evsel to struct evsel
Rename struct perf_evsel to struct evsel, so we don't have a name clash
when we add struct perf_evsel in libperf.

Committer notes:

Added fixes for arm64, provided by Jiri.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29 18:34:42 -03:00
Arnaldo Carvalho de Melo
e56fbc9dc7 perf tools: Use list_del_init() more thorougly
To allow for destructors to check if they're operating on a object still
in a list, and to avoid going from use after free list entries into
still valid, or even also other already removed from list entries.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-deh17ub44atyox3j90e6rksu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09 10:13:27 -03:00
Arnaldo Carvalho de Melo
d8f9da2404 perf tools: Use zfree() where applicable
In places where the equivalent was already being done, i.e.:

   free(a);
   a = NULL;

And in placs where struct members are being freed so that if we have
some erroneous reference to its struct, then accesses to freed members
will result in segfaults, which we can detect faster than use after free
to areas that may still have something seemingly valid.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-jatyoofo5boc1bsvoig6bb6i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09 10:13:27 -03:00
Leo Yan
600c787dbf perf annotate: Fix dereferencing freed memory found by the smatch tool
Based on the following report from Smatch, fix the potential
dereferencing freed memory check.

  tools/perf/util/annotate.c:1125
  disasm_line__parse() error: dereferencing freed memory 'namep'

  tools/perf/util/annotate.c
  1100 static int disasm_line__parse(char *line, const char **namep, char **rawp)
  1101 {
  1102         char tmp, *name = ltrim(line);

  [...]

  1114         *namep = strdup(name);
  1115
  1116         if (*namep == NULL)
  1117                 goto out_free_name;

  [...]

  1124 out_free_name:
  1125         free((void *)namep);
                            ^^^^^
  1126         *namep = NULL;
               ^^^^^^
  1127         return -1;
  1128 }

If strdup() fails to allocate memory space for *namep, we don't need to
free memory with pointer 'namep', which is resident in data structure
disasm_line::ins::name; and *namep is NULL pointer for this failure, so
it's pointless to assign NULL to *namep again.

Committer note:

Freeing namep, which is the address of the first entry of the 'struct
ins' that is the first member of struct disasm_line would in fact free
that disasm_line instance, if it was allocated via malloc/calloc, which,
later, would a dereference of freed memory.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Alexios Zavras <alexios.zavras@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Changbin Du <changbin.du@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Song Liu <songliubraving@fb.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190702103420.27540-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09 09:33:55 -03:00
Mao Han
aa23aa5516 perf annotate: Add csky support
This patch add basic arch initialization and instruction associate
support for the csky CPU architecture.

E.g.:

  $ perf annotate --stdio2
  Samples: 161  of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.):
  40250000, [percent: local period]
  test_4() /usr/lib/perf-test/callchain_test
  Percent

              Disassembly of section .text:

              00008420 <test_4>:
            test_4():
                subi  sp, sp, 4
                st.w  r8, (sp, 0x0)
                mov   r8, sp
                subi  sp, sp, 8
                subi  r3, r8, 4
                movi  r2, 0
                st.w  r2, (r3, 0x0)
              ↓ br    2e
  100.00  14:   subi  r3, r8, 4
                ld.w  r2, (r3, 0x0)
                subi  r3, r8, 8
                st.w  r2, (r3, 0x0)
                subi  r3, r8, 4
                ld.w  r3, (r3, 0x0)
                addi  r2, r3, 1
                subi  r3, r8, 4
                st.w  r2, (r3, 0x0)
          2e:   subi  r3, r8, 4
                ld.w  r2, (r3, 0x0)
                lrw   r3, 0x98967f    // 8598 <main+0x28>
                cmplt r3, r2
              ↑ bf    14
                mov   r0, r0
                mov   r0, r0
                mov   sp, r8
                ld.w  r8, (sp, 0x0)
                addi  sp, sp, 4
              ← rts

Signed-off-by: Mao Han <han_mao@c-sky.com>
Acked-by: Guo Ren <guoren@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-csky@vger.kernel.org
Link: http://lkml.kernel.org/r/d874d7782d9acdad5d98f2f5c4a6fb26fbe41c5d.1561531557.git.han_mao@c-sky.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-01 22:50:41 -03:00
Arnaldo Carvalho de Melo
13c230ab6e perf tools: Ditch rtrim(), use strim() from tools/lib
Cleaning up a bit more tools/perf/util/ by using things we got from the
kernel and have in tools/lib/

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-7hluuoveryoicvkclshzjf1k@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-01 22:50:33 -03:00
Arnaldo Carvalho de Melo
328584804e perf tools: Ditch rtrim(), use skip_spaces() to get closer to the kernel
No change in behaviour, just using the same kernel idiom for such
operation.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-a85lkptkt0ru40irpga8yf54@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-26 11:42:03 -03:00
Arnaldo Carvalho de Melo
3052ba56bc tools perf: Move from sane_ctype.h obtained from git to the Linux's original
We got the sane_ctype.h headers from git and kept using it so far, but
since that code originally came from the kernel sources to the git
sources, perhaps its better to just use the one in the kernel, so that
we can leverage tools/perf/check_headers.sh to be notified when our copy
gets out of sync, i.e. when fixes or goodies are added to the code we've
copied.

This will help with things like tools/lib/string.c where we want to have
more things in common with the kernel, such as strim(), skip_spaces(),
etc so as to go on removing the things that we have in tools/perf/util/
and instead using the code in the kernel, indirectly and removing things
like EXPORT_SYMBOL(), etc, getting notified when fixes and improvements
are made to the original code.

Hopefully this also should help with reducing the difference of code
hosted in tools/ to the one in the kernel proper.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-7k9868l713wqtgo01xxygn12@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-25 21:02:47 -03:00
Ingo Molnar
3ce5aceb5d perf/core improvements and fixes:
perf record:
 
   Alexey Budankov:
 
   - Allow mixing --user-regs with --call-graph=dwarf, making sure that
     the minimal set of registers for DWARF unwinding is present in the
     set of user registers requested to be present in each sample, while
     warning the user that this may make callchains unreliable if more
     that the minimal set of registers is needed to unwind.
 
   yuzhoujian:
 
   - Add support to collect callchains from kernel or user space only,
     IOW allow setting the perf_event_attr.exclude_callchain_{kernel,user}
     bits from the command line.
 
 perf trace:
 
   Arnaldo Carvalho de Melo:
 
   - Remove x86_64 specific syscall numbers from the augmented_raw_syscalls
     BPF in-kernel collector of augmented raw_syscalls:sys_{enter,exit}
     payloads, use instead the syscall numbers obtainer either by the
     arch specific syscalltbl generators or from audit-libs.
 
   - Allow 'perf trace' to ask for the number of bytes to collect for
     string arguments, for now ask for PATH_MAX, i.e. the whole
     pathnames, which ends up being just a way to speficy which syscall
     args are pathnames and thus should be read using bpf_probe_read_str().
 
   - Skip unknown syscalls when expanding strace like syscall groups.
     This helps using the 'string' group of syscalls to work in arm64,
     where some of the syscalls present in x86_64 that deal with
     strings, for instance 'access', are deprecated and this should not
     be asked for tracing.
 
   Leo Yan:
 
   - Exit when failing to build eBPF program.
 
 perf config:
 
   Arnaldo Carvalho de Melo:
 
   - Bail out when a handler returns failure for a key-value pair. This
     helps with cases where processing a key-value pair is not just a
     matter of setting some tool specific knob, involving, for instance
     building a BPF program to then attach to the list of events 'perf
     trace' will use, e.g. augmented_raw_syscalls.c.
 
 perf.data:
 
   Kan Liang:
 
   - Read and store die ID information available in new Intel processors
     in CPUID.1F in the CPU topology written in the perf.data header.
 
 perf stat:
 
   Kan Liang:
 
   - Support per-die aggregation.
 
 Documentation:
 
   Arnaldo Carvalho de Melo:
 
   - Update perf.data documentation about the CPU_TOPOLOGY, MEM_TOPOLOGY,
     CLOCKID and DIR_FORMAT headers.
 
   Song Liu:
 
   - Add description of headers HEADER_BPF_PROG_INFO and HEADER_BPF_BTF.
 
   Leo Yan:
 
   - Update default value for llvm.clang-bpf-cmd-template in 'man perf-config'.
 
 JVMTI:
 
   Jiri Olsa:
 
   - Address gcc string overflow warning for strncpy()
 
 core:
 
   - Remove superfluous nthreads system_wide setup in perf_evsel__alloc_fd().
 
 Intel PT:
 
   Adrian Hunter:
 
   - Add support for samples to contain IPC ratio, collecting cycles
     information from CYC packets, showing the IPC info periodically, because
     Intel PT does not update the cycle count on every branch or instruction,
     the incremental values will often be zero.  When there are values, they
     will be the number of instructions and number of cycles since the last
     update, and thus represent the average IPC since the last IPC value.
 
     E.g.:
 
     # perf record --cpu 1 -m200000 -a -e intel_pt/cyc/u sleep 0.0001
     rounding mmap pages size to 1024M (262144 pages)
     [ perf record: Woken up 0 times to write data ]
     [ perf record: Captured and wrote 2.208 MB perf.data ]
     # perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid
     #
     <SNIP + add line numbering to make sense of IPC counts e.g.: (18/3)>
     1   cc1 63501.650479626: 7f5219ac27bf _int_free+0x3f   jnz 0x7f5219ac2af0       IPC: 0.81 (36/44)
     2   cc1 63501.650479626: 7f5219ac27c5 _int_free+0x45   cmp $0x1f, %rbp
     3   cc1 63501.650479626: 7f5219ac27c9 _int_free+0x49   jbe 0x7f5219ac2b00
     4   cc1 63501.650479626: 7f5219ac27cf _int_free+0x4f   test $0x8, %al
     5   cc1 63501.650479626: 7f5219ac27d1 _int_free+0x51   jnz 0x7f5219ac2b00
     6   cc1 63501.650479626: 7f5219ac27d7 _int_free+0x57   movq  0x13c58a(%rip), %rcx
     7   cc1 63501.650479626: 7f5219ac27de _int_free+0x5e   mov %rdi, %r12
     8   cc1 63501.650479626: 7f5219ac27e1 _int_free+0x61   movq  %fs:(%rcx), %rax
     9   cc1 63501.650479626: 7f5219ac27e5 _int_free+0x65   test %rax, %rax
    10   cc1 63501.650479626: 7f5219ac27e8 _int_free+0x68   jz 0x7f5219ac2821
    11   cc1 63501.650479626: 7f5219ac27ea _int_free+0x6a   leaq  -0x11(%rbp), %rdi
    12   cc1 63501.650479626: 7f5219ac27ee _int_free+0x6e   mov %rdi, %rsi
    13   cc1 63501.650479626: 7f5219ac27f1 _int_free+0x71   shr $0x4, %rsi
    14   cc1 63501.650479626: 7f5219ac27f5 _int_free+0x75   cmpq  %rsi, 0x13caf4(%rip)
    15   cc1 63501.650479626: 7f5219ac27fc _int_free+0x7c   jbe 0x7f5219ac2821
    16   cc1 63501.650479626: 7f5219ac2821 _int_free+0xa1   cmpq  0x13f138(%rip), %rbp
    17   cc1 63501.650479626: 7f5219ac2828 _int_free+0xa8   jnbe 0x7f5219ac28d8
    18   cc1 63501.650479626: 7f5219ac28d8 _int_free+0x158  testb  $0x2, 0x8(%rbx)
    19   cc1 63501.650479628: 7f5219ac28dc _int_free+0x15c  jnz 0x7f5219ac2ab0       IPC: 6.00 (18/3)
     <SNIP>
 
   - Allow using time ranges with Intel PT, i.e. these features, already
     present but not optimially usable with Intel PT, should be now:
 
         Select the second 10% time slice:
 
         $ perf script --time 10%/2
 
         Select from 0% to 10% time slice:
 
         $ perf script --time 0%-10%
 
         Select the first and second 10% time slices:
 
         $ perf script --time 10%/1,10%/2
 
         Select from 0% to 10% and 30% to 40% slices:
 
         $ perf script --time 0%-10%,30%-40%
 
 cs-etm (ARM):
 
   Mathieu Poirier:
 
   - Add support for CPU-wide trace scenarios.
 
 s390:
 
   Thomas Richter:
 
   - Fix missing kvm module load for s390.
 
   - Fix OOM error in TUI mode on s390
 
   - Support s390 diag event display when doing analysis on !s390
     architectures.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXP/1xQAKCRCyPKLppCJ+
 J9xcAQCwOITAshE7op7HbKUPtkqiMNu+hpNa3skhxEpGHvKO0AEArpBXtuvEP8EU
 PZsp+8vcVrlZ+dZutttgvkRz25mScg8=
 =kfFb
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-5.3-20190611' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf record:

  Alexey Budankov:

  - Allow mixing --user-regs with --call-graph=dwarf, making sure that
    the minimal set of registers for DWARF unwinding is present in the
    set of user registers requested to be present in each sample, while
    warning the user that this may make callchains unreliable if more
    that the minimal set of registers is needed to unwind.

  yuzhoujian:

  - Add support to collect callchains from kernel or user space only,
    IOW allow setting the perf_event_attr.exclude_callchain_{kernel,user}
    bits from the command line.

perf trace:

  Arnaldo Carvalho de Melo:

  - Remove x86_64 specific syscall numbers from the augmented_raw_syscalls
    BPF in-kernel collector of augmented raw_syscalls:sys_{enter,exit}
    payloads, use instead the syscall numbers obtainer either by the
    arch specific syscalltbl generators or from audit-libs.

  - Allow 'perf trace' to ask for the number of bytes to collect for
    string arguments, for now ask for PATH_MAX, i.e. the whole
    pathnames, which ends up being just a way to speficy which syscall
    args are pathnames and thus should be read using bpf_probe_read_str().

  - Skip unknown syscalls when expanding strace like syscall groups.
    This helps using the 'string' group of syscalls to work in arm64,
    where some of the syscalls present in x86_64 that deal with
    strings, for instance 'access', are deprecated and this should not
    be asked for tracing.

  Leo Yan:

  - Exit when failing to build eBPF program.

perf config:

  Arnaldo Carvalho de Melo:

  - Bail out when a handler returns failure for a key-value pair. This
    helps with cases where processing a key-value pair is not just a
    matter of setting some tool specific knob, involving, for instance
    building a BPF program to then attach to the list of events 'perf
    trace' will use, e.g. augmented_raw_syscalls.c.

perf.data:

  Kan Liang:

  - Read and store die ID information available in new Intel processors
    in CPUID.1F in the CPU topology written in the perf.data header.

perf stat:

  Kan Liang:

  - Support per-die aggregation.

Documentation:

  Arnaldo Carvalho de Melo:

  - Update perf.data documentation about the CPU_TOPOLOGY, MEM_TOPOLOGY,
    CLOCKID and DIR_FORMAT headers.

  Song Liu:

  - Add description of headers HEADER_BPF_PROG_INFO and HEADER_BPF_BTF.

  Leo Yan:

  - Update default value for llvm.clang-bpf-cmd-template in 'man perf-config'.

JVMTI:

  Jiri Olsa:

  - Address gcc string overflow warning for strncpy()

core:

  - Remove superfluous nthreads system_wide setup in perf_evsel__alloc_fd().

Intel PT:

  Adrian Hunter:

  - Add support for samples to contain IPC ratio, collecting cycles
    information from CYC packets, showing the IPC info periodically, because
    Intel PT does not update the cycle count on every branch or instruction,
    the incremental values will often be zero.  When there are values, they
    will be the number of instructions and number of cycles since the last
    update, and thus represent the average IPC since the last IPC value.

    E.g.:

    # perf record --cpu 1 -m200000 -a -e intel_pt/cyc/u sleep 0.0001
    rounding mmap pages size to 1024M (262144 pages)
    [ perf record: Woken up 0 times to write data ]
    [ perf record: Captured and wrote 2.208 MB perf.data ]
    # perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid
    #
    <SNIP + add line numbering to make sense of IPC counts e.g.: (18/3)>
    1   cc1 63501.650479626: 7f5219ac27bf _int_free+0x3f   jnz 0x7f5219ac2af0       IPC: 0.81 (36/44)
    2   cc1 63501.650479626: 7f5219ac27c5 _int_free+0x45   cmp $0x1f, %rbp
    3   cc1 63501.650479626: 7f5219ac27c9 _int_free+0x49   jbe 0x7f5219ac2b00
    4   cc1 63501.650479626: 7f5219ac27cf _int_free+0x4f   test $0x8, %al
    5   cc1 63501.650479626: 7f5219ac27d1 _int_free+0x51   jnz 0x7f5219ac2b00
    6   cc1 63501.650479626: 7f5219ac27d7 _int_free+0x57   movq  0x13c58a(%rip), %rcx
    7   cc1 63501.650479626: 7f5219ac27de _int_free+0x5e   mov %rdi, %r12
    8   cc1 63501.650479626: 7f5219ac27e1 _int_free+0x61   movq  %fs:(%rcx), %rax
    9   cc1 63501.650479626: 7f5219ac27e5 _int_free+0x65   test %rax, %rax
   10   cc1 63501.650479626: 7f5219ac27e8 _int_free+0x68   jz 0x7f5219ac2821
   11   cc1 63501.650479626: 7f5219ac27ea _int_free+0x6a   leaq  -0x11(%rbp), %rdi
   12   cc1 63501.650479626: 7f5219ac27ee _int_free+0x6e   mov %rdi, %rsi
   13   cc1 63501.650479626: 7f5219ac27f1 _int_free+0x71   shr $0x4, %rsi
   14   cc1 63501.650479626: 7f5219ac27f5 _int_free+0x75   cmpq  %rsi, 0x13caf4(%rip)
   15   cc1 63501.650479626: 7f5219ac27fc _int_free+0x7c   jbe 0x7f5219ac2821
   16   cc1 63501.650479626: 7f5219ac2821 _int_free+0xa1   cmpq  0x13f138(%rip), %rbp
   17   cc1 63501.650479626: 7f5219ac2828 _int_free+0xa8   jnbe 0x7f5219ac28d8
   18   cc1 63501.650479626: 7f5219ac28d8 _int_free+0x158  testb  $0x2, 0x8(%rbx)
   19   cc1 63501.650479628: 7f5219ac28dc _int_free+0x15c  jnz 0x7f5219ac2ab0       IPC: 6.00 (18/3)
    <SNIP>

  - Allow using time ranges with Intel PT, i.e. these features, already
    present but not optimially usable with Intel PT, should be now:

        Select the second 10% time slice:

        $ perf script --time 10%/2

        Select from 0% to 10% time slice:

        $ perf script --time 0%-10%

        Select the first and second 10% time slices:

        $ perf script --time 10%/1,10%/2

        Select from 0% to 10% and 30% to 40% slices:

        $ perf script --time 0%-10%,30%-40%

cs-etm (ARM):

  Mathieu Poirier:

  - Add support for CPU-wide trace scenarios.

s390:

  Thomas Richter:

  - Fix missing kvm module load for s390.

  - Fix OOM error in TUI mode on s390

  - Support s390 diag event display when doing analysis on !s390
    architectures.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17 20:48:14 +02:00
Thomas Richter
8a07aa4e9b perf report: Fix OOM error in TUI mode on s390
Debugging a OOM error using the TUI interface revealed this issue
on s390:

[tmricht@m83lp54 perf]$ cat /proc/kallsyms |sort
....
00000001119b7158 B radix_tree_node_cachep
00000001119b8000 B __bss_stop
00000001119b8000 B _end
000003ff80002850 t autofs_mount	[autofs4]
000003ff80002868 t autofs_show_options	[autofs4]
000003ff80002a98 t autofs_evict_inode	[autofs4]
....

There is a huge gap between the last kernel symbol
__bss_stop/_end and the first kernel module symbol
autofs_mount (from autofs4 module).

After reading the kernel symbol table via functions:

 dso__load()
 +--> dso__load_kernel_sym()
      +--> dso__load_kallsyms()
	   +--> __dso_load_kallsyms()
	        +--> symbols__fixup_end()

the symbol __bss_stop has a start address of 1119b8000 and
an end address of 3ff80002850, as can be seen by this debug statement:

  symbols__fixup_end __bss_stop start:0x1119b8000 end:0x3ff80002850

The size of symbol __bss_stop is 0x3fe6e64a850 bytes!
It is the last kernel symbol and fills up the space until
the first kernel module symbol.

This size kills the TUI interface when executing the following
code:

  process_sample_event()
    hist_entry_iter__add()
      hist_iter__report_callback()
        hist_entry__inc_addr_samples()
          symbol__inc_addr_samples(symbol = __bss_stop)
            symbol__cycles_hist()
               annotated_source__alloc_histograms(...,
				                symbol__size(sym),
		                                ...)

This function allocates memory to save sample histograms.
The symbol_size() marco is defined as sym->end - sym->start, which
results in above value of 0x3fe6e64a850 bytes and
the call to calloc() in annotated_source__alloc_histograms() fails.

The histgram memory allocation might fail, make this failure
no-fatal and continue processing.

Output before:
[tmricht@m83lp54 perf]$ ./perf --debug stderr=1 report -vvvvv \
					      -i ~/slow.data 2>/tmp/2
[tmricht@m83lp54 perf]$ tail -5 /tmp/2
  __symbol__inc_addr_samples(875): ENOMEM! sym->name=__bss_stop,
		start=0x1119b8000, addr=0x2aa0005eb08, end=0x3ff80002850,
		func: 0
problem adding hist entry, skipping event
0x938b8 [0x8]: failed to process type: 68 [Cannot allocate memory]
[tmricht@m83lp54 perf]$

Output after:
[tmricht@m83lp54 perf]$ ./perf --debug stderr=1 report -vvvvv \
					      -i ~/slow.data 2>/tmp/2
[tmricht@m83lp54 perf]$ tail -5 /tmp/2
   symbol__inc_addr_samples map:0x1597830 start:0x110730000 end:0x3ff80002850
   symbol__hists notes->src:0x2aa2a70 nr_hists:1
   symbol__inc_addr_samples sym:unlink_anon_vmas src:0x2aa2a70
   __symbol__inc_addr_samples: addr=0x11094c69e
   0x11094c670 unlink_anon_vmas: period++ [addr: 0x11094c69e, 0x2e, evidx=0]
   	=> nr_samples: 1, period: 526008
[tmricht@m83lp54 perf]$

There is no error about failed memory allocation and the TUI interface
shows all entries.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/90cb5607-3e12-5167-682d-978eba7dafa8@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-10 16:20:13 -03:00
Thomas Gleixner
910070454e treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 251
Based on 1 normalized pattern(s):

  released under the gpl v2 and only v2 not any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 12 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141332.526460839@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:30:26 +02:00
Jin Yao
bdd1666b3d perf annotate: Remove hist__account_cycles() from callback
The hist__account_cycles() function is executed when the
hist_iter__branch_callback() is called.

But it looks it's not necessary.  In hist__account_cycles, it already
walks on all branch entries.

This patch moves the hist__account_cycles out of callback, now the data
processing is much faster than before.

Previous code has an issue that the ch[offset].num++ (in
__symbol__account_cycles) is executed repeatedly since
hist__account_cycles is called in each hist_iter__branch_callback, so
the counting of ch[offset].num is not correct (too big).

With this patch, the issue is fixed. And we don't need the code of
"ch->reset >= ch->num / 2" to check if there are too many overlaps (in
annotation__count_and_fill), otherwise some data would be hidden.

Now, we can try, for example:

  perf record -b ...
  perf annotate or perf report -s symbol

The before/after output should be no change.

 v3:
 ---
 Fix the crash in stdio mode.
 Like previous code, it needs the checking of ui__has_annotation()
 before hist__account_cycles()

 v2:
 ---
 1. Cover the similar perf report
 2. Remove the checking code "ch->reset >= ch->num / 2"

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1552684577-29041-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-15 16:36:46 -03:00
Thadeu Lima de Souza Cascardo
01e985e900 perf annotate: Fix build on 32 bit for BPF annotation
Commit 6987561c9e ("perf annotate: Enable annotation of BPF programs") adds
support for BPF programs annotations but the new code does not build on 32-bit.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Song Liu <songliubraving@fb.com>
Fixes: 6987561c9e ("perf annotate: Enable annotation of BPF programs")
Link: http://lkml.kernel.org/r/20190403194452.10845-1-cascardo@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-02 16:00:19 -04:00
Song Liu
6987561c9e perf annotate: Enable annotation of BPF programs
In symbol__disassemble(), DSO_BINARY_TYPE__BPF_PROG_INFO dso calls into
a new function symbol__disassemble_bpf(), where annotation line
information is filled based on the bpf_prog_info and btf data saved in
given perf_env.

symbol__disassemble_bpf() uses binutils's libopcodes to disassemble bpf
programs.

Committer testing:

After fixing this:

  -               u64 *addrs = (u64 *)(info_linear->info.jited_ksyms);
  +               u64 *addrs = (u64 *)(uintptr_t)(info_linear->info.jited_ksyms);

Detected when crossbuilding to a 32-bit arch.

And making all this dependent on HAVE_LIBBFD_SUPPORT and
HAVE_LIBBPF_SUPPORT:

1) Have a BPF program running, one that has BTF info, etc, I used
   the tools/perf/examples/bpf/augmented_raw_syscalls.c put in place
   by 'perf trace'.

  # grep -B1 augmented_raw ~/.perfconfig
  [trace]
	add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
  #
  # perf trace -e *mmsg
  dnf/6245 sendmmsg(20, 0x7f5485a88030, 2, MSG_NOSIGNAL) = 2
  NetworkManager/10055 sendmmsg(22<socket:[1056822]>, 0x7f8126ad1bb0, 2, MSG_NOSIGNAL) = 2

2) Then do a 'perf record' system wide for a while:

  # perf record -a
  ^C[ perf record: Woken up 68 times to write data ]
  [ perf record: Captured and wrote 19.427 MB perf.data (366891 samples) ]
  #

3) Check that we captured BPF and BTF info in the perf.data file:

  # perf report --header-only | grep 'b[pt]f'
  # event : name = cycles:ppp, , id = { 294789, 294790, 294791, 294792, 294793, 294794, 294795, 294796 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
  # bpf_prog_info of id 13
  # bpf_prog_info of id 14
  # bpf_prog_info of id 15
  # bpf_prog_info of id 16
  # bpf_prog_info of id 17
  # bpf_prog_info of id 18
  # bpf_prog_info of id 21
  # bpf_prog_info of id 22
  # bpf_prog_info of id 41
  # bpf_prog_info of id 42
  # btf info of id 2
  #

4) Check which programs got recorded:

   # perf report | grep bpf_prog | head
     0.16%  exe              bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.14%  exe              bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.08%  fuse-overlayfs   bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.07%  fuse-overlayfs   bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.01%  clang-4.0        bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.01%  clang-4.0        bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.00%  clang            bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.00%  runc             bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.00%  clang            bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.00%  sh               bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
  #

  This was with the default --sort order for 'perf report', which is:

    --sort comm,dso,symbol

  If we just look for the symbol, for instance:

   # perf report --sort symbol | grep bpf_prog | head
     0.26%  [k] bpf_prog_819967866022f1e1_sys_enter                -      -
     0.24%  [k] bpf_prog_c1bd85c092d6e4aa_sys_exit                 -      -
   #

  or the DSO:

   # perf report --sort dso | grep bpf_prog | head
     0.26%  bpf_prog_819967866022f1e1_sys_enter
     0.24%  bpf_prog_c1bd85c092d6e4aa_sys_exit
  #

We'll see the two BPF programs that augmented_raw_syscalls.o puts in
place,  one attached to the raw_syscalls:sys_enter and another to the
raw_syscalls:sys_exit tracepoints, as expected.

Now we can finally do, from the command line, annotation for one of
those two symbols, with the original BPF program source coude intermixed
with the disassembled JITed code:

  # perf annotate --stdio2 bpf_prog_819967866022f1e1_sys_enter

  Samples: 950  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 553756947, [percent: local period]
  bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
  Percent      int sys_enter(struct syscall_enter_args *args)
   53.41         push   %rbp

    0.63         mov    %rsp,%rbp
    0.31         sub    $0x170,%rsp
    1.93         sub    $0x28,%rbp
    7.02         mov    %rbx,0x0(%rbp)
    3.20         mov    %r13,0x8(%rbp)
    1.07         mov    %r14,0x10(%rbp)
    0.61         mov    %r15,0x18(%rbp)
    0.11         xor    %eax,%eax
    1.29         mov    %rax,0x20(%rbp)
    0.11         mov    %rdi,%rbx
               	return bpf_get_current_pid_tgid();
    2.02       → callq  *ffffffffda6776d9
    2.76         mov    %eax,-0x148(%rbp)
                 mov    %rbp,%rsi
               int sys_enter(struct syscall_enter_args *args)
                 add    $0xfffffffffffffeb8,%rsi
               	return bpf_map_lookup_elem(pids, &pid) != NULL;
                 movabs $0xffff975ac2607800,%rdi

    1.26       → callq  *ffffffffda6789e9
                 cmp    $0x0,%rax
    2.43       → je     0
                 add    $0x38,%rax
    0.21         xor    %r13d,%r13d
               	if (pid_filter__has(&pids_filtered, getpid()))
    0.81         cmp    $0x0,%rax
               → jne    0
                 mov    %rbp,%rdi
               	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
    2.22         add    $0xfffffffffffffeb8,%rdi
    0.11         mov    $0x40,%esi
    0.32         mov    %rbx,%rdx
    2.74       → callq  *ffffffffda658409
               	syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
    0.22         mov    %rbp,%rsi
    1.69         add    $0xfffffffffffffec0,%rsi
               	syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
                 movabs $0xffff975bfcd36000,%rdi

                 add    $0xd0,%rdi
    0.21         mov    0x0(%rsi),%eax
    0.93         cmp    $0x200,%rax
               → jae    0
    0.10         shl    $0x3,%rax

    0.11         add    %rdi,%rax
    0.11       → jmp    0
                 xor    %eax,%eax
               	if (syscall == NULL || !syscall->enabled)
    1.07         cmp    $0x0,%rax
               → je     0
               	if (syscall == NULL || !syscall->enabled)
    6.57         movzbq 0x0(%rax),%rdi

               	if (syscall == NULL || !syscall->enabled)
                 cmp    $0x0,%rdi
    0.95       → je     0
                 mov    $0x40,%r8d
               	switch (augmented_args.args.syscall_nr) {
                 mov    -0x140(%rbp),%rdi
               	switch (augmented_args.args.syscall_nr) {
                 cmp    $0x2,%rdi
               → je     0
                 cmp    $0x101,%rdi
               → je     0
                 cmp    $0x15,%rdi
               → jne    0
               	case SYS_OPEN:	 filename_arg = (const void *)args->args[0];
                 mov    0x10(%rbx),%rdx
               → jmp    0
               	case SYS_OPENAT: filename_arg = (const void *)args->args[1];
                 mov    0x18(%rbx),%rdx
               	if (filename_arg != NULL) {
                 cmp    $0x0,%rdx
               → je     0
                 xor    %edi,%edi
               		augmented_args.filename.reserved = 0;
                 mov    %edi,-0x104(%rbp)
               		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                 mov    %rbp,%rdi
                 add    $0xffffffffffffff00,%rdi
               		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                 mov    $0x100,%esi
               → callq  *ffffffffda658499
                 mov    $0x148,%r8d
               		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                 mov    %eax,-0x108(%rbp)
               		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                 mov    %rax,%rdi
                 shl    $0x20,%rdi

                 shr    $0x20,%rdi

               		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
                 cmp    $0xff,%rdi
               → ja     0
               			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
                 add    $0x48,%rax
               			len &= sizeof(augmented_args.filename.value) - 1;
                 and    $0xff,%rax
                 mov    %rax,%r8
                 mov    %rbp,%rcx
               	return perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
                 add    $0xfffffffffffffeb8,%rcx
                 mov    %rbx,%rdi
                 movabs $0xffff975fbd72d800,%rsi

                 mov    $0xffffffff,%edx
               → callq  *ffffffffda658ad9
                 mov    %rax,%r13
               }
                 mov    %r13,%rax
    0.72         mov    0x0(%rbp),%rbx
                 mov    0x8(%rbp),%r13
    1.16         mov    0x10(%rbp),%r14
    0.10         mov    0x18(%rbp),%r15
    0.42         add    $0x28,%rbp
    0.54         leaveq
    0.54       ← retq
  #

Please see 'man perf-config' to see how to control what should be seen,
via ~/.perfconfig [annotate] section, for instance, one can suppress the
source code and see just the disassembly, etc.

Alternatively, use the TUI bu just using 'perf annotate', press
'/bpf_prog' to see the bpf symbols, press enter and do the interactive
annotation, which allows for dumping to a file after selecting the
the various output tunables, for instance, the above without source code
intermixed, plus showing all the instruction offsets:

  # perf annotate bpf_prog_819967866022f1e1_sys_enter

Then press: 's' to hide the source code + 'O' twice to show all
instruction offsets, then 'P' to print to the
bpf_prog_819967866022f1e1_sys_enter.annotation file, which will have:

  # cat bpf_prog_819967866022f1e1_sys_enter.annotation
  bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
  Event: cycles:ppp

   53.41    0:   push   %rbp

    0.63    1:   mov    %rsp,%rbp
    0.31    4:   sub    $0x170,%rsp
    1.93    b:   sub    $0x28,%rbp
    7.02    f:   mov    %rbx,0x0(%rbp)
    3.20   13:   mov    %r13,0x8(%rbp)
    1.07   17:   mov    %r14,0x10(%rbp)
    0.61   1b:   mov    %r15,0x18(%rbp)
    0.11   1f:   xor    %eax,%eax
    1.29   21:   mov    %rax,0x20(%rbp)
    0.11   25:   mov    %rdi,%rbx
    2.02   28: → callq  *ffffffffda6776d9
    2.76   2d:   mov    %eax,-0x148(%rbp)
           33:   mov    %rbp,%rsi
           36:   add    $0xfffffffffffffeb8,%rsi
           3d:   movabs $0xffff975ac2607800,%rdi

    1.26   47: → callq  *ffffffffda6789e9
           4c:   cmp    $0x0,%rax
    2.43   50: → je     0
           52:   add    $0x38,%rax
    0.21   56:   xor    %r13d,%r13d
    0.81   59:   cmp    $0x0,%rax
           5d: → jne    0
           63:   mov    %rbp,%rdi
    2.22   66:   add    $0xfffffffffffffeb8,%rdi
    0.11   6d:   mov    $0x40,%esi
    0.32   72:   mov    %rbx,%rdx
    2.74   75: → callq  *ffffffffda658409
    0.22   7a:   mov    %rbp,%rsi
    1.69   7d:   add    $0xfffffffffffffec0,%rsi
           84:   movabs $0xffff975bfcd36000,%rdi

           8e:   add    $0xd0,%rdi
    0.21   95:   mov    0x0(%rsi),%eax
    0.93   98:   cmp    $0x200,%rax
           9f: → jae    0
    0.10   a1:   shl    $0x3,%rax

    0.11   a5:   add    %rdi,%rax
    0.11   a8: → jmp    0
           aa:   xor    %eax,%eax
    1.07   ac:   cmp    $0x0,%rax
           b0: → je     0
    6.57   b6:   movzbq 0x0(%rax),%rdi

           bb:   cmp    $0x0,%rdi
    0.95   bf: → je     0
           c5:   mov    $0x40,%r8d
           cb:   mov    -0x140(%rbp),%rdi
           d2:   cmp    $0x2,%rdi
           d6: → je     0
           d8:   cmp    $0x101,%rdi
           df: → je     0
           e1:   cmp    $0x15,%rdi
           e5: → jne    0
           e7:   mov    0x10(%rbx),%rdx
           eb: → jmp    0
           ed:   mov    0x18(%rbx),%rdx
           f1:   cmp    $0x0,%rdx
           f5: → je     0
           f7:   xor    %edi,%edi
           f9:   mov    %edi,-0x104(%rbp)
           ff:   mov    %rbp,%rdi
          102:   add    $0xffffffffffffff00,%rdi
          109:   mov    $0x100,%esi
          10e: → callq  *ffffffffda658499
          113:   mov    $0x148,%r8d
          119:   mov    %eax,-0x108(%rbp)
          11f:   mov    %rax,%rdi
          122:   shl    $0x20,%rdi

          126:   shr    $0x20,%rdi

          12a:   cmp    $0xff,%rdi
          131: → ja     0
          133:   add    $0x48,%rax
          137:   and    $0xff,%rax
          13d:   mov    %rax,%r8
          140:   mov    %rbp,%rcx
          143:   add    $0xfffffffffffffeb8,%rcx
          14a:   mov    %rbx,%rdi
          14d:   movabs $0xffff975fbd72d800,%rsi

          157:   mov    $0xffffffff,%edx
          15c: → callq  *ffffffffda658ad9
          161:   mov    %rax,%r13
          164:   mov    %r13,%rax
    0.72  167:   mov    0x0(%rbp),%rbx
          16b:   mov    0x8(%rbp),%r13
    1.16  16f:   mov    0x10(%rbp),%r14
    0.10  173:   mov    0x18(%rbp),%r15
    0.42  177:   add    $0x28,%rbp
    0.54  17b:   leaveq
    0.54  17c: ← retq

Another cool way to test all this is to symple use 'perf top' look for
those symbols, go there and press enter, annotate it live :-)

Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-13-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-20 16:43:15 -03:00
Arnaldo Carvalho de Melo
bc3bb79534 perf annotate: Calculate the max instruction name, align column to that
We were hardcoding '6' as the max instruction name, and we have lots
that are longer than that, see the diff from two 'P' printed TUI
annotations for a libc function that uses instructions with long names,
such as 'vpmovmskb' with its 9 chars:

  --- __strcmp_avx2.annotation.before	2019-03-06 16:31:39.368020425 -0300
  +++ __strcmp_avx2.annotation	2019-03-06 16:32:12.079450508 -0300
  @@ -2,284 +2,284 @@
   Event: cycles:ppp

   Percent        endbr64
  -  0.10         mov    %edi,%eax
  +  0.10         mov        %edi,%eax
  -               xor    %edx,%edx
  +               xor        %edx,%edx
  -  3.54         vpxor  %ymm7,%ymm7,%ymm7
  +  3.54         vpxor      %ymm7,%ymm7,%ymm7
  -               or     %esi,%eax
  +               or         %esi,%eax
  -               and    $0xfff,%eax
  +               and        $0xfff,%eax
  -               cmp    $0xf80,%eax
  +               cmp        $0xf80,%eax
  -             ↓ jg     370
  +             ↓ jg         370
  - 27.07         vmovdqu (%rdi),%ymm1
  + 27.07         vmovdqu    (%rdi),%ymm1
  -  7.97         vpcmpeqb (%rsi),%ymm1,%ymm0
  +  7.97         vpcmpeqb   (%rsi),%ymm1,%ymm0
  -  2.15         vpminub %ymm1,%ymm0,%ymm0
  +  2.15         vpminub    %ymm1,%ymm0,%ymm0
  -  4.09         vpcmpeqb %ymm7,%ymm0,%ymm0
  +  4.09         vpcmpeqb   %ymm7,%ymm0,%ymm0
  -  0.43         vpmovmskb %ymm0,%ecx
  +  0.43         vpmovmskb  %ymm0,%ecx
  -  1.53         test   %ecx,%ecx
  +  1.53         test       %ecx,%ecx
  -             ↓ je     b0
  +             ↓ je         b0
  -  5.26         tzcnt  %ecx,%edx
  +  5.26         tzcnt      %ecx,%edx
  - 18.40         movzbl (%rdi,%rdx,1),%eax
  + 18.40         movzbl     (%rdi,%rdx,1),%eax
  -  7.09         movzbl (%rsi,%rdx,1),%edx
  +  7.09         movzbl     (%rsi,%rdx,1),%edx
  -  3.34         sub    %edx,%eax
  +  3.34         sub        %edx,%eax
     2.37         vzeroupper
                ← retq
                  nop
  -         50:   tzcnt  %ecx,%edx
  +         50:   tzcnt      %ecx,%edx
  -               movzbl 0x20(%rdi,%rdx,1),%eax
  +               movzbl     0x20(%rdi,%rdx,1),%eax
  -               movzbl 0x20(%rsi,%rdx,1),%edx
  +               movzbl     0x20(%rsi,%rdx,1),%edx
  -               sub    %edx,%eax
  +               sub        %edx,%eax
                  vzeroupper
                ← retq
  -               data16 nopw %cs:0x0(%rax,%rax,1)
  +               data16     nopw %cs:0x0(%rax,%rax,1)

Reported-by: Travis Downs <travis.downs@gmail.com>
LPU-Reference: CAOBGo4z1KfmWeOm6Et0cnX5Z6DWsG2PQbAvRn1MhVPJmXHrc5g@mail.gmail.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-89wsdd9h9g6bvq52sgp6d0u4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-06 16:40:15 -03:00
Wei Li
11db1ad451 perf annotate: Fix getting source line failure
The output of "perf annotate -l --stdio xxx" changed since commit 425859ff0d
("perf annotate: No need to calculate notes->start twice") removed notes->start
assignment in symbol__calc_lines(). It will get failed in
find_address_in_section() from symbol__tty_annotate() subroutine as the
a2l->addr is wrong. So the annotate summary doesn't report the line number of
source code correctly.

Before fix:

  liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ cat common_while_1.c
  void hotspot_1(void)
  {
	volatile int i;

	for (i = 0; i < 0x10000000; i++);
	for (i = 0; i < 0x10000000; i++);
	for (i = 0; i < 0x10000000; i++);
  }

  int main(void)
  {
	hotspot_1();

	return 0;
  }
  liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ gcc common_while_1.c -g -o common_while_1

  liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.488 MB perf.data (12498 samples) ]
  liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio

  Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
  ----------------------------------------------

   19.30 common_while_1[32]
   19.03 common_while_1[4e]
   19.01 common_while_1[16]
    5.04 common_while_1[13]
    4.99 common_while_1[4b]
    4.78 common_while_1[2c]
    4.77 common_while_1[10]
    4.66 common_while_1[2f]
    4.59 common_while_1[51]
    4.59 common_while_1[35]
    4.52 common_while_1[19]
    4.20 common_while_1[56]
    0.51 common_while_1[48]
   Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12480 samples, percent: local period)
  -----------------------------------------------------------------------------------------------------------------
         :
         :
         :
         :         Disassembly of section .text:
         :
         :         00000000000005fa <hotspot_1>:
         :         hotspot_1():
         :         void hotspot_1(void)
         :         {
    0.00 :   5fa:   push   %rbp
    0.00 :   5fb:   mov    %rsp,%rbp
         :                 volatile int i;
         :
         :                 for (i = 0; i < 0x10000000; i++);
    0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
    0.00 :   605:   jmp    610 <hotspot_1+0x16>
    0.00 :   607:   mov    -0x4(%rbp),%eax
   common_while_1[10]    4.77 :   60a:   add    $0x1,%eax
   common_while_1[13]    5.04 :   60d:   mov    %eax,-0x4(%rbp)
   common_while_1[16]   19.01 :   610:   mov    -0x4(%rbp),%eax
   common_while_1[19]    4.52 :   613:   cmp    $0xfffffff,%eax
      0.00 :   618:   jle    607 <hotspot_1+0xd>
           :                 for (i = 0; i < 0x10000000; i++);
  ...

After fix:

  liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf record ./common_while_1
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.488 MB perf.data (12500 samples) ]
  liwei@euler:~/main_code/hulk_work/hulk/tools/perf$ sudo ./perf annotate -l -s hotspot_1 --stdio

  Sorted summary for file /home/liwei/main_code/hulk_work/hulk/tools/perf/common_while_1
  ----------------------------------------------

   33.34 common_while_1.c:5
   33.34 common_while_1.c:6
   33.32 common_while_1.c:7
   Percent |      Source code & Disassembly of common_while_1 for cycles:ppp (12482 samples, percent: local period)
  -----------------------------------------------------------------------------------------------------------------
         :
         :
         :
         :         Disassembly of section .text:
         :
         :         00000000000005fa <hotspot_1>:
         :         hotspot_1():
         :         void hotspot_1(void)
         :         {
    0.00 :   5fa:   push   %rbp
    0.00 :   5fb:   mov    %rsp,%rbp
         :                 volatile int i;
         :
         :                 for (i = 0; i < 0x10000000; i++);
    0.00 :   5fe:   movl   $0x0,-0x4(%rbp)
    0.00 :   605:   jmp    610 <hotspot_1+0x16>
    0.00 :   607:   mov    -0x4(%rbp),%eax
   common_while_1.c:5    4.70 :   60a:   add    $0x1,%eax
    4.89 :   60d:   mov    %eax,-0x4(%rbp)
   common_while_1.c:5   19.03 :   610:   mov    -0x4(%rbp),%eax
   common_while_1.c:5    4.72 :   613:   cmp    $0xfffffff,%eax
    0.00 :   618:   jle    607 <hotspot_1+0xd>
         :                 for (i = 0; i < 0x10000000; i++);
    0.00 :   61a:   movl   $0x0,-0x4(%rbp)
    0.00 :   621:   jmp    62c <hotspot_1+0x32>
    0.00 :   623:   mov    -0x4(%rbp),%eax
   common_while_1.c:6    4.54 :   626:   add    $0x1,%eax
    4.73 :   629:   mov    %eax,-0x4(%rbp)
   common_while_1.c:6   19.54 :   62c:   mov    -0x4(%rbp),%eax
   common_while_1.c:6    4.54 :   62f:   cmp    $0xfffffff,%eax
  ...

Signed-off-by: Wei Li <liwei391@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 425859ff0d ("perf annotate: No need to calculate notes->start twice")
Link: http://lkml.kernel.org/r/20190221095716.39529-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-21 17:00:35 -03:00
Arnaldo Carvalho de Melo
1101f69af5 pref tools: Add missing map.h includes
Lots of places get the map.h file indirectly, and since we're going to
remove it from machine.h, then those need to include it directly, do it
now, before we remove that dep.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-ob8jehdjda8h5jsrv9dqj9tf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:38 -03:00
Arnaldo Carvalho de Melo
68c0188ea7 perf symbols: Remove some unnecessary includes from symbol.h
And fixup the fallout in places like annotation and jitdump that were
using things like dirname() but weren't including libgen.h, etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-wrii9hy1a1wathc0398f9fgt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-25 15:12:09 +01:00
Ivan Krylov
442b4eb3af perf annotate: Pass filename to objdump via execl
The symbol__disassemble() function uses shell to launch objdump and
filter its output via grep. Passing filenames by interpolating them into
the command line via "%s" may lead to problems if said filenames contain
special characters.

Instead, pass the filename as a command line argument where it is not
subject to any kind of interpretation, then use quoted shell
interpolation to build the strings we need safely.

Signed-off-by: Ivan Krylov <krylov.r00t@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181014111803.5d83b806@Tarkus
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-04 12:54:49 -03:00
Eugeniy Paltsev
6d99a79cb4 perf annotate: Introduce basic support for ARC
Introduce basic 'perf annotate' support for ARC to be able to use
anotation via stdio interface.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Brodkin <alexey.brodkin@synopsys.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vineet Gupta <vineet.gupta1@synopsys.com>
Link: http://lkml.kernel.org/r/20181204175118.25232-1-Eugeniy.Paltsev@synopsys.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:59:42 -03:00
Ingo Molnar
adba163441 perf tools: Fix diverse comment typos
Go over the tools/ files that are maintained in Arnaldo's tree and
fix common typos: half of them were in comments, the other half
in JSON files.

No change in functionality intended.

Committer notes:

This was split from a larger patch as there are code that is,
additionally, maintained outside the kernel tree, so to ease
cherry-picking and/or backporting, split this into multiple patches.

Just typos in comments, no need to backport, reducing the possibility of
possible backporting artifacts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181203102200.GA104797@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:56:47 -03:00
Jin Yao
246fda09c1 perf annotate: Create a annotate2 flag in struct symbol
We often use the symbol__annotate2() to annotate a specified symbol.
While annotating may take some time, so in order to avoid annotating the
same symbol repeatedly, the patch creates a new flag to indicate the
symbol has been annotated.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:55:40 -03:00
Jin Yao
ace4f8faea perf annotate: Compute average IPC and IPC coverage per symbol
Add support to 'perf report' annotate view or 'perf annotate --stdio2'
to aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.

For example:

  $ perf annotate --stdio2

  Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

                          Disassembly of section .text:

                          000000000003aac0 <random@@GLIBC_2.2.5>:
    8.32  3.28              sub    $0x18,%rsp
          3.28              mov    $0x1,%esi
          3.28              xor    %eax,%eax
          3.28              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
   11.57  3.28     1      ↓ je     20
                            lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
                          ↓ jne    29
                          ↓ jmp    43
   11.57  1.10        20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
    0.00  1.10     1      ↓ je     43
                      29:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                            sub    $0x80,%rsp
                          → callq  __lll_lock_wait_private
                            add    $0x80,%rsp
    0.00  3.00        43:   lea    __ctype_b@GLIBC_2.2.5+0x38,%rdi
          3.00              lea    0xc(%rsp),%rsi
    8.49  3.00     1      → callq  __random_r
    7.91  1.94              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
    0.00  1.94     1      ↓ je     68
                            lock   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
                          ↓ jne    70
                          ↓ jmp    8a
    0.00  2.00        68:   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
   21.56  2.00     1      ↓ je     8a
                      70:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                            sub    $0x80,%rsp
                          → callq  __lll_unlock_wake_private
                            add    $0x80,%rsp
   21.56  2.90        8a:   movslq 0xc(%rsp),%rax
          2.90              add    $0x18,%rsp
    9.03  2.90     1      ← retq

It shows for this symbol the average IPC is 2.30 and the IPC coverage is
54.8%.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1543586097-27632-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:55:32 -03:00
David Miller
0ab4188664 perf annotate: Add Sparc support
E.g.:

  $ perf annotate --stdio2
  Samples: 7K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 3086733887
  __gettimeofday  /lib32/libc-2.27.so [Percent: local period]
  Percent│
         │
         │
         │    Disassembly of section .text:
         │
         │    000a6fa0 <__gettimeofday@@GLIBC_2.0>:
    0.47 │      save   %sp, -96, %sp
    0.73 │      sethi  %hi(0xe9000), %l7
         │    → call   __frame_state_for@@GLIBC_2.0+0x480
    0.30 │      add    %l7, 0x58, %l7     ! e9058 <nftw64@@GLIBC_2.3.3+0x818>
    1.33 │      mov    %i0, %o0
         │      mov    %i1, %o1
    0.43 │      mov    0x74, %g1
         │      ta     0x10
   88.92 │    ↓ bcc    30
    2.95 │      clr    %g1
         │      neg    %o0
         │      mov    1, %g1
    0.31 │30:   cmp    %g1, 0
         │      bne,pn %icc, a6fe4 <__gettimeofday@@GLIBC_2.0+0x44>
         │      mov    %o0, %i0
    1.96 │    ← return %i7 + 8
    2.62 │      nop
         │      sethi  %hi(0), %g1
         │      neg    %o0, %g2
         │      add    %g1, 0x160, %g1
         │      ld     [ %l7 + %g1 ], %g1
         │      st     %g2, [ %g7 + %g1 ]
         │    ← return %i7 + 8
         │      mov    -1, %o0

Signed-off-by: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20181016.205555.1070918198627611771.davem@davemloft.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-18 11:16:38 -03:00
Kim Phillips
4e67b2a5df perf annotate: Fix parsing aarch64 branch instructions after objdump update
Starting with binutils 2.28, aarch64 objdump adds comments to the
disassembly output to show the alternative names of a condition code
[1].

It is assumed that commas in objdump comments could occur in other
arches now or in the future, so this fix is arch-independent.

The fix could have been done with arm64 specific jump__parse and
jump__scnprintf functions, but the jump__scnprintf instruction would
have to have its comment character be a literal, since the scnprintf
functions cannot receive a struct arch easily.

This inconvenience also applies to the generic jump__scnprintf, which is
why we add a raw_comment pointer to struct ins_operands, so the __parse
function assigns it to be re-used by its corresponding __scnprintf
function.

Example differences in 'perf annotate --stdio2' output on an aarch64
perf.data file:

BEFORE: → b.cs   ffff200008133d1c <unwind_frame+0x18c>  // b.hs, dffff7ecc47b
AFTER : ↓ b.cs   18c

BEFORE: → b.cc   ffff200008d8d9cc <get_alloc_profile+0x31c>  // b.lo, b.ul, dffff727295b
AFTER : ↓ b.cc   31c

The branch target labels 18c and 31c also now appear in the output:

BEFORE:        add    x26, x29, #0x80
AFTER : 18c:   add    x26, x29, #0x80

BEFORE:        add    x21, x21, #0x8
AFTER : 31c:   add    x21, x21, #0x8

The Fixes: tag below is added so stable branches will get the update; it
doesn't necessarily mean that commit was broken at the time, rather it
didn't withstand the aarch64 objdump update.

Tested no difference in output for sample x86_64, power arch perf.data files.

[1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=bb7eff5206e4795ac79c177a80fe9f4630aaf730

Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: b13bbeee5e ("perf annotate: Fix branch instruction with multiple operands")
Link: http://lkml.kernel.org/r/20180827125340.a2f7e291901d17cea05daba4@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-30 15:51:54 -03:00
Martin Liška
1dc27f6330 perf annotate: Properly interpret indirect call
The patch changes the parsing of:

	callq  *0x8(%rbx)

from:

  0.26 │     → callq  *8

to:

  0.26 │     → callq  *0x8(%rbx)

in this case an address is followed by a register, thus one can't parse
only the address.

Committer testing:

1) run 'perf record sleep 10'
2) before applying the patch, run:

     perf annotate --stdio2 > /tmp/before

3) after applying the patch, run:

     perf annotate --stdio2 > /tmp/after

4) diff /tmp/before /tmp/after:
  --- /tmp/before 2018-08-28 11:16:03.238384143 -0300
  +++ /tmp/after  2018-08-28 11:15:39.335341042 -0300
  @@ -13274,7 +13274,7 @@
                ↓ jle    128
                  hash_value = hash_table->hash_func (key);
                  mov    0x8(%rsp),%rdi
  -  0.91       → callq  *30
  +  0.91       → callq  *0x30(%r12)
                  mov    $0x2,%r8d
                  cmp    $0x2,%eax
                  node_hash = hash_table->hashes[node_index];
  @@ -13848,7 +13848,7 @@
                   mov    %r14,%rdi
                   sub    %rbx,%r13
                   mov    %r13,%rdx
  -              → callq  *38
  +              → callq  *0x38(%r15)
                   cmp    %rax,%r13
     1.91        ↓ je     240
            1b4:   mov    $0xffffffff,%r13d
  @@ -14026,7 +14026,7 @@
                   mov    %rcx,-0x500(%rbp)
                   mov    %r15,%rsi
                   mov    %r14,%rdi
  -              → callq  *38
  +              → callq  *0x38(%rax)
                   mov    -0x500(%rbp),%rcx
                   cmp    %rax,%rcx
                 ↓ jne    9b0
<SNIP tons of other such cases>

Signed-off-by: Martin Liška <mliska@suse.cz>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kim Phillips <kim.phillips@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/bd1f3932-be2b-85f9-7582-111ee0a43b07@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-30 14:49:22 -03:00
Jiri Olsa
2354ae9bdc perf tools: Get rid of dso__needs_decompress() call in symbol__disassemble()
There's no need to call dso__needs_decompress() twice in the function.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180817094813.15086-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-20 08:54:59 -03:00
Jiri Olsa
88c2119077 perf annotate: Add --percent-type option
Add --percent-type option to set annotation percent type from following
choices:

  global-period, local-period, global-hits, local-hits

Examples:

  $ perf annotate --percent-type period-local --stdio | head -1
   Percent         |      Source code ... es, percent: local period)
  $ perf annotate --percent-type hits-local --stdio | head -1
   Percent         |      Source code ... es, percent: local hits)
  $ perf annotate --percent-type hits-global --stdio | head -1
   Percent         |      Source code ... es, percent: global hits)
  $ perf annotate --percent-type period-global --stdio | head -1
   Percent         |      Source code ... es, percent: global period)

The local/global keywords set if the percentage is computed in the scope
of the function (local) or the whole data (global).

The period/hits keywords set the base the percentage is computed on -
the samples period or the number of samples (hits).

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-20-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:53 -03:00
Jiri Olsa
4c04868fbe perf annotate: Display percent type in stdio output
In following patches we will allow to switch percent type even for stdio
annotation outputs. Adding the percent type value into the annotation
outputs title.

  $ perf annotate --stdio
   Percent         |      Sou ... instructions:u } (2805 samples, percent: local period)
  --------------------------- ... ------------------------------------------------------
  ...

  $ perf annotate --stdio2
  Samples: 2K of events 'anon ...  count (approx.): 156525487, [percent: local period]
  safe_write.c() /usr/bin/yes
  Percent
  ...

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-19-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:53 -03:00
Jiri Olsa
addba8b66f perf annotate: Make local period the default percent type
Currently we display the percentages in annotation output based on
number of samples hits. Switching it to period based percentage by
default, because it corresponds more to the time spent on the line.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-18-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:52 -03:00
Jiri Olsa
4c650ddc2e perf annotate: Pass 'struct annotation_options' to map_symbol__annotation_dump()
Pass 'struct annotation_options' to map_symbol__annotation_dump(), to
carry on and pass the percent_type value.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-15-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:51 -03:00
Jiri Olsa
c849c12cf3 perf annotate: Pass struct annotation_options to symbol__calc_lines()
Pass struct annotation_options to symbol__calc_lines(), to carry on and
pass the percent_type value.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-14-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:50 -03:00
Jiri Olsa
796ca33d5c perf annotate: Add percent_type to struct annotation_options
It will be used to carry user selection of percent type for annotation
output.

Passing the percent_type to the annotation_line__print function as the
first step and making it default to current percentage type
(PERCENT_HITS_LOCAL) value.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-13-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:50 -03:00
Jiri Olsa
e58684df91 perf annotate: Add PERCENT_PERIOD_GLOBAL percent value
Adding and computing global period percent value for annotation line.
Storing it in struct annotation_data percent array under new
PERCENT_PERIOD_GLOBAL index.

At the moment it's not displayed, it's coming in following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-12-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:49 -03:00
Jiri Olsa
ab371169fb perf annotate: Add PERCENT_PERIOD_LOCAL percent value
Adding and computing local period percent value for annotation line.
Storing it in struct annotation_data percent array under new
PERCENT_PERIOD_LOCAL index.

At the moment it's not displayed, it's coming in following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-11-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:49 -03:00
Jiri Olsa
75a8c1ff28 perf annotate: Add PERCENT_HITS_GLOBAL percent value
Adding and computing global hits percent value for annotation line.
Storing it in struct annotation_data percent array under new
PERCENT_HITS_GLOBAL index.

At the moment it's not displayed, it's coming in following patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:48 -03:00
Jiri Olsa
6d9f0c2d5e perf annotate: Switch struct annotation_data::percent to array
So we can hold multiple percent values for annotation line.

The first member of this array is current local hits percent value
(PERCENT_HITS_LOCAL index), so no functional change is expected.

Adding annotation_data__percent function to return requested percent
value from struct annotation_data.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:48 -03:00
Jiri Olsa
2bcf73069b perf annotate: Loop group events directly in annotation__calc_percent()
We need to bring in 'struct hists' object and for that we need 'struct
perf_evsel' object in the scope.

Switching the group data loop with the evsel group loop.  It does the
same thing, but it brings evsel object, that we can use later get the
'struct hists' object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:47 -03:00
Jiri Olsa
48a1e4f238 perf annotate: Rename hist to sym_hist in annotation__calc_percent
We will need to bring in 'struct hists' variable in this scope, so it's
better we do this rename first.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:47 -03:00
Jiri Olsa
0440af74dc perf annotate: Rename local sample variables to data
Based on previous rename, changing also the local variable names to fit
properly.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:47 -03:00
Jiri Olsa
c2f938ba5a perf annotate: Rename struct annotation_line::samples* to data*
The name 'samples*' is little confusing because we have nested 'struct
sym_hist_entry' under annotation_line struct, which holds 'nr_samples'
as well.

Also the holding struct name is 'annotation_data' so the 'data' name
fits better.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:46 -03:00
Jiri Olsa
0683d13c1a perf annotate: Get rid of annotation__scnprintf_samples_period()
We have more current function tto get the title for annotation,
which is hists__scnprintf_title. They both have same output as
far as the annotation's header line goes.

They differ in counting of the nr_samples, hists__scnprintf_title
provides more accurate number based on the setup of the
symbol_conf.filter_relative variable.

Plus it also displays any uid/thread/dso/socket filters/zooms
if there are set any, which annotation__scnprintf_samples_period
does not.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:46 -03:00
Jiri Olsa
5ecf7d30eb perf annotate: Make annotation_line__max_percent static
There's no outside user of it.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180804130521.11408-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:45 -03:00
Jiri Olsa
7a3e71e0d8 perf annotate: Make symbol__annotate_fprintf2() local
There's no outside user of it.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lkml.kernel.org/r/20180804130521.11408-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-08-08 15:55:45 -03:00
Arnaldo Carvalho de Melo
8d628d26b9 perf annnotate: Make __symbol__inc_addr_samples handle src->histograms == NULL
Making it a bit more robust, this took place here when a sample appeared
right after:

  ffffffff8a925000 D __nosave_end

And before the next considered symbol, which, using kallsyms make us
over guess the size of __nosave_end, and then the sequence:

  hist_entry__inc_addr_samples ->
    symbol__inc_addr_samples ->
      symbol__hists ->
        annotated_source__alloc_histograms

Ends up not liking to allocate gigabytes of ram for annotation...

This will be alleviated by considering BSS symbols, which we should but
don't so far, and then we should investigate those samples further.

The testcase was to have:

   perf top -e cycles/call-graph=fp/,cache-misses/call-graph=dwarf/,instructions

Running for a while till it segfaulted trying to access NULL notes->src->histograms.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ndfjtpiop3tdcnyjgp320ra8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06 12:52:08 -03:00
Arnaldo Carvalho de Melo
f178fd2d49 perf annotate: Move objdump_path to struct annotation_options
One more step in grouping annotation options.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-sogzdhugoavm6fyw60jnb0vs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:53 -03:00
Arnaldo Carvalho de Melo
a47e843edc perf annotate: Move disassembler_style global to annotation_options
Continuing to group annotation specific stuff into a struct.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-p3cdhltj58jt0byjzg3g7obx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:53 -03:00
Arnaldo Carvalho de Melo
1eddd9e410 perf annotate: Adopt anotation options from symbol_conf
Continuing to group annotation options in an annotation specific struct.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-astei92tzxp4yccag5pxb2h7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:53 -03:00
Arnaldo Carvalho de Melo
380195e2b0 perf annotate: Pass annotation_options to symbol__annotate()
Now all callers to symbol__disassemble() can hand it the per-tool
annotation_options, which will allow us to remove lots of stuff
from symbol_options, the kitchen sink of perf configs, reducing its
size and getting annotation specific stuff grouped together.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-vpr7ys7ggvs2fzpg8wbjcw7e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:53 -03:00
Arnaldo Carvalho de Melo
982d410bc6 perf annotate stdio: Use annotation_options consistently
Accross all the routines, this way we can have eventually have a
consistent set of defaults for all UIs.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6qgtixurjgdk5u0n3rw78ges@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:52 -03:00
Arnaldo Carvalho de Melo
14c8dde170 perf annotate: Replace symbol__alloc_hists() with symbol__hists()
Its a bit shorter, so ditch the old symbol__alloc_hists() function.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-m7tienxk7dijh5ln62yln1m9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:52 -03:00
Arnaldo Carvalho de Melo
0693f7588a perf annotate: Stop using symbol_conf.nr_events global in symbol__hists()
Since now we have evsel->evlist->nr_entries in the single place calling
this function, use it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-9mgosbqa977h39j4i9ys8t75@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:52 -03:00
Arnaldo Carvalho de Melo
c6b635eece perf annotate: Introduce symbol__cycle_hists()
In this case we're wanting just notes->src->cycles_hist, allocating it if needed.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-pqj81aneunhftlntm66tmhz0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:51 -03:00
Arnaldo Carvalho de Melo
e8ea922a7e perf annotate: Introduce symbol__hists()
In this case we're wanting just notes->src->histograms, allocating it if needed.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4iatualjskia7sojmdb65cmm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:51 -03:00
Arnaldo Carvalho de Melo
e1a91a834d perf annotate: __symbol__inc_addr_samples() needs just annotated_source
It only operates on the histograms, so no need for the encompassing
'struct annotation'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-2se2v7rrjil0kwqywks04ey2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:51 -03:00
Arnaldo Carvalho de Melo
be3e26d99c perf annotate: Introduce annotated_source__alloc_histograms
So that we can call it independently, in contexts were we know we
already have notes->src allocated.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-f5fn7tr1asey6g013wavpn4c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:51 -03:00
Arnaldo Carvalho de Melo
ca39650309 perf annotate: Introduce constructor/destructor for annotated_source
More stuff will go in there, all the parts that are not needed when a
symbol had no samples and that were mistakenly added to 'struct
annotation'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-u4761kyzhixw9ydk6kib3f0o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:51 -03:00
Arnaldo Carvalho de Melo
116c626b9a perf annotate: Split allocation of annotated_source struct
So that we can allocate just the notes->src->cyc_hist, that, unlike
notes->src->histograms, is not per event, and in paths where we
need to lazily allocate notes->src->cyc_hist we don't have the
number of events handy to also allocate ->histograms.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-tsx7dhxzpi0criyx0sio3pz3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:51 -03:00
Arnaldo Carvalho de Melo
f40dd6d1b4 perf annotate: __symbol__acount_cycles doesn't need notes
It only operates on the notes->src->cyc_hist, just pass that to it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zd1cu4zwmu21k0cxlr83y6vr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:51 -03:00
Arnaldo Carvalho de Melo
e345f3bd9b perf annotate: Pass perf_evsel instead of just evsel->idx
The code gets shorter and we'll be able to use evsel->evlist in a
followup patch.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-t0s7vy19wq5kak74kavm8swf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:50 -03:00
Jin Yao
787e4da9f9 perf annotate: Show group event string for stdio
When we enable the group, for tui/stdio2, the output first line includes
the group event string. While for stdio, it will show only one event.

For example,

perf record -e cycles,branches ./div
perf annotate --group --stdio

    Percent |      Source code & Disassembly of div for cycles (44407 samples)
    ......

The first line doesn't include the event 'branches'.

With this patch, it will show the correct group even string.

perf annotate --group --stdio

    Percent |      Source code & Disassembly of div for cycles, branches (44407 samples)
    ......

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1526989115-14435-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-23 10:26:40 -03:00
Jin Yao
3e71fc0319 perf annotate: Create hotkey 'c' to show min/max cycles
In the 'perf annotate' view, a new hotkey 'c' is created for showing the
min/max cycles.

For example, when press 'c', the annotate view is:

  Percent│ IPC     Cycle(min/max)
         │
         │
         │                             Disassembly of section .text:
         │
         │                             000000000003aab0 <random@@GLIBC_2.2.5>:
    8.22 │3.92                           sub    $0x18,%rsp
         │3.92                           mov    $0x1,%esi
         │3.92                           xor    %eax,%eax
         │3.92                           cmpl   $0x0,argp_program_version_hook@@G
         │3.92             1(2/1)      ↓ je     20
         │                               lock   cmpxchg %esi,__abort_msg@@GLIBC_P
         │                             ↓ jne    29
         │                             ↓ jmp    43
         │1.10                     20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+
    8.93 │1.10             1(5/1)      ↓ je     43

When press 'c' again, the annotate view is switched back:

  Percent│ IPC Cycle
         │
         │
         │                Disassembly of section .text:
         │
         │                000000000003aab0 <random@@GLIBC_2.2.5>:
    8.22 │3.92              sub    $0x18,%rsp
         │3.92              mov    $0x1,%esi
         │3.92              xor    %eax,%eax
         │3.92              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x
         │3.92     1      ↓ je     20
         │                  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
         │                ↓ jne    29
         │                ↓ jmp    43
         │1.10        20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
    8.93 │1.10     1      ↓ je     43

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1526569118-14217-3-git-send-email-yao.jin@linux.intel.com
[ Rename all maxmin to minmax ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-19 06:42:49 -03:00
Jin Yao
48659ebf37 perf annotate: Record the min/max cycles
Currently perf has a feature to account cycles for LBRs

For example, on skylake:

  perf record -b ...
  perf report or perf annotate

And then browsing the annotate browser gives average cycle counts for
program blocks.

For some analysis it would be useful if we could know not only the
average cycles but also the min and max cycles.

This patch records the min and max cycles.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1526569118-14217-2-git-send-email-yao.jin@linux.intel.com
[ Switch from max/min to min/max ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-18 16:31:41 -03:00
Jin Yao
04d2600ab6 perf annotate: Display all available events on --stdio
When we perform the following command lines:

  $ perf record -e "{cycles,branches}" ./div
  $ perf annotate main --stdio

The output shows only the first event, "cycles" and the displaying
format is not correct.

   Percent         |      Source code & Disassembly of div for cycles (44550 samples)
  -----------------------------------------------------------------------------------
                   :
                   :
                   :
                   :            Disassembly of section .text:
                   :
                   :            00000000004004b0 <main>:
                   :            main():
                   :
                   :                    return i;
                   :            }
                   :
                   :            int main(void)
                   :            {
      0.00 :   4004b0:       push   %rbx
                   :                    int i;
                   :                    int flag;
                   :                    volatile double x = 1212121212, y = 121212;
                   :
                   :                    s_randseed = time(0);
      0.00 :   4004b1:       xor    %edi,%edi
                   :                    srand(s_randseed);
      0.00 :   4004b3:       mov    $0x77359400,%ebx
                   :
                   :                    return i;
                   :            }

The issue is that the value of the 'nr_percent' variable is hardcoded to
1.  This patch fixes it.

With this patch, the output is:

   Percent         |      Source code & Disassembly of div for cycles (44550 samples)
  -----------------------------------------------------------------------------------
                   :
                   :
                   :
                   :            Disassembly of section .text:
                   :
                   :            00000000004004b0 <main>:
                   :            main():
                   :
                   :                    return i;
                   :            }
                   :
                   :            int main(void)
                   :            {
      0.00    0.00 :   4004b0:       push   %rbx
                   :                    int i;
                   :                    int flag;
                   :                    volatile double x = 1212121212, y = 121212;
                   :
                   :                    s_randseed = time(0);
      0.00    0.00 :   4004b1:       xor    %edi,%edi
                   :                    srand(s_randseed);
      0.00    0.00 :   4004b3:       mov    $0x77359400,%ebx
                   :
                   :                    return i;
                   :            }

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: f681d593d1 ("perf annotate: Remove disasm__calc_percent() from disasm_line__print()")
Link: http://lkml.kernel.org/r/1525881435-4092-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-10 15:19:30 -03:00
Arnaldo Carvalho de Melo
43c4023152 perf annotate: Allow setting the offset level in .perfconfig
The default is 1 (jump_target):

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
  _raw_spin_lock_irqsave() /proc/kcore
    0.26        nop
    4.61        push   %rbx
   19.33        pushfq
    7.97        pop    %rax
    0.32        nop
    0.06        mov    %rax,%rbx
   14.63        cli
    0.06        nop
                xor    %eax,%eax
                mov    $0x1,%edx
   49.94        lock   cmpxchg %edx,(%rdi)
    0.16        test   %eax,%eax
              ↓ jne    2b
    2.66        mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
              → callq  *ffffffffb30eaed0
                mov    %rbx,%rax
                pop    %rbx
              ← retq
  #

But one can ask for showing offsets for call instructions by setting
this:

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
  _raw_spin_lock_irqsave() /proc/kcore
    0.26        nop
    4.61        push   %rbx
   19.33        pushfq
    7.97        pop    %rax
    0.32        nop
    0.06        mov    %rax,%rbx
   14.63        cli
    0.06        nop
                xor    %eax,%eax
                mov    $0x1,%edx
   49.94        lock   cmpxchg %edx,(%rdi)
    0.16        test   %eax,%eax
              ↓ jne    2b
    2.66        mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
          2d: → callq  *ffffffffb30eaed0
                mov    %rbx,%rax
                pop    %rbx
              ← retq
  #

Or using a big value to ask for all offsets to be shown:

  # cat ~/.perfconfig
  [annotate]

	offset_level = 100

	hide_src_code = true
  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
  _raw_spin_lock_irqsave() /proc/kcore
    0.26   0:   nop
    4.61   5:   push   %rbx
   19.33   6:   pushfq
    7.97   7:   pop    %rax
    0.32   8:   nop
    0.06   d:   mov    %rax,%rbx
   14.63  10:   cli
    0.06  11:   nop
          17:   xor    %eax,%eax
          19:   mov    $0x1,%edx
   49.94  1e:   lock   cmpxchg %edx,(%rdi)
    0.16  22:   test   %eax,%eax
          24: ↓ jne    2b
    2.66  26:   mov    %rbx,%rax
          29:   pop    %rbx
          2a: ← retq
          2b:   mov    %eax,%esi
          2d: → callq  *ffffffffb30eaed0
          32:   mov    %rbx,%rax
          35:   pop    %rbx
          36: ← retq
   #

This also affects the TUI, i.e. the default 'perf annotate' and 'perf
top/report' -> A hotkey -> annotate interfaces, when slang-devel is present
in the build, i.e.:

  # perf version --build-options | grep slang
              libslang: [ on  ]  # HAVE_SLANG_SUPPORT
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-venm6x5zrt40eu8hxdsmqxz6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-13 10:00:05 -03:00
Arnaldo Carvalho de Melo
592c10e217 perf annotate: Allow showing offsets in more than just jump targets
Jesper wanted to see offsets at callq sites when doing some performance
investigation related to retpolines, so save him some time by providing
an 'struct annotation_options' to control where offsets should appear:
just on jump targets? That + call instructions? All?

This puts in place the logic to show the offsets, now we need to wire
this up in the TUI browser (next patch) and on the 'perf annotate --stdio2"
interface, where we need a more general mechanism to setup the
'annotation_options' struct from the command line.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-m3jc9c3swobye9tj08gnh5i7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-12 10:32:39 -03:00
Arnaldo Carvalho de Melo
c0459a0925 perf annotate: Show group details on the title line
To match what is shown in the main 'perf report/top' title lines, i.e.
if a group is being shown, either a real group (recorded with "-e
'{a,b,c}') or a forced group (using 'perf report --group' for a
perf.data file recorded without {}) we will show multiple columns,
one per event, but we were failing to show the group details, so, for:

 # perf report --header-only | grep cmdline
 # cmdline : /home/acme/bin/perf record -e {cycles,instructions,cache-misses}
 # perf report --group

The first line was showing just "cycles", now it shows the correct line,
which is:

  Samples: 578  of events 'anon group { cycles, instructions, cache-misses }', 4000 Hz, Event count (approx.): 487421794
  syscall_return_via_sysret  /lib/modules/4.16.0-rc7/build/vmlinux
    0.22   2.97   0.00 │    ↓ jmp    6c
                       │      mov    %cr3,%rdi
    1.33  10.89   4.00 │    ↓ jmp    62
                       │      mov    %rdi,%rax
<SNIP>

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 6920e2854e ("perf annotate browser: Show extra title line with event information")
Link: https://lkml.kernel.org/n/tip-i41tqh17c2dabnyzjh99r1oz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-05 11:18:39 -03:00
Arnaldo Carvalho de Melo
520d3f01ea perf annotate stdio2: Print more descriptive event information header
To match the recently added event header information to --tui, e.g.:

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  Samples: 128  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 48617682
  _raw_spin_lock_irqsave() /proc/kcore
    0.78        nop
    7.03        push   %rbx
    3.12        pushfq
    6.25        pop    %rax
                nop
                mov    %rax,%rbx
    3.12        cli
                nop
                xor    %eax,%eax
                mov    $0x1,%edx
   79.69        lock   cmpxchg %edx,(%rdi)
                test   %eax,%eax
              ↓ jne    2b
                mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
              → callq  *ffffffffb30eaed0
                mov    %rbx,%rax
                pop    %rbx
              ← retq
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ujy46x7cldyhyxelyf2b9quy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03 16:05:13 -03:00
Arnaldo Carvalho de Melo
b213eac245 perf annotate: Introduce annotation__scnprintf_samples_period() method
To print a string using the total period (nr_events) and the number of
samples for a given annotation, i.e. for a given symbol, the counterpart
to hists__scnprintf_samples_period(), that is for all the samples in a
session (be it a live session, think 'perf top' or a perf.data file,
think 'perf report').

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196935
Link: https://lkml.kernel.org/n/tip-goj2wu4fxutc8vd46mw3yg14@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03 15:22:55 -03:00
Arnaldo Carvalho de Melo
980b68ec06 perf annotate: Use absolute addresses to calculate jump target offsets
These types of jumps were confusing the annotate browser:

entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux

entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux
  Percent│ffffffff81a00020:   swapgs
  <SNIP>
         │ffffffff81a00128: ↓ jae    ffffffff81a00139 <syscall_return_via_sysret+0x53>
  <SNIP>
         │ffffffff81a00155: → jmpq   *0x825d2d(%rip)   # ffffffff82225e88 <pv_cpu_ops+0xe8>

I.e. the syscall_return_via_sysret function is actually "inside" the
entry_SYSCALL_64 function, and the offsets in jumps like these (+0x53)
are relative to syscall_return_via_sysret, not to syscall_return_via_sysret.

Or this may be some artifact in how the assembler marks the start and
end of a function and how this ends up in the ELF symtab for vmlinux,
i.e. syscall_return_via_sysret() isn't "inside" entry_SYSCALL_64, but
just right after it.

From readelf -sw vmlinux:

 80267: ffffffff81a00020   315 NOTYPE  GLOBAL DEFAULT    1 entry_SYSCALL_64
   316: ffffffff81a000e6     0 NOTYPE  LOCAL  DEFAULT    1 syscall_return_via_sysret

 0xffffffff81a00020 + 315 > 0xffffffff81a000e6

So instead of looking for offsets after that last '+' sign, calculate
offsets for jump target addresses that are inside the function being
disassembled from the absolute address, 0xffffffff81a00139 in this case,
subtracting from it the objdump address for the start of the function
being disassembled, entry_SYSCALL_64() in this case.

So, before this patch:

entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux
Percent│       pop    %r10
       │       pop    %r9
       │       pop    %r8
       │       pop    %rax
       │       pop    %rsi
       │       pop    %rdx
       │       pop    %rsi
       │       mov    %rsp,%rdi
       │       mov    %gs:0x5004,%rsp
       │       pushq  0x28(%rdi)
       │       pushq  (%rdi)
       │       push   %rax
       │     ↑ jmp    6c
       │       mov    %cr3,%rdi
       │     ↑ jmp    62
       │       mov    %rdi,%rax
       │       and    $0x7ff,%rdi
       │       bt     %rdi,%gs:0x2219a
       │     ↑ jae    53
       │       btr    %rdi,%gs:0x2219a
       │       mov    %rax,%rdi
       │     ↑ jmp    5b

After:

entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux
  0.65 │     → jne    swapgs_restore_regs_and_return_to_usermode
       │       pop    %r10
       │       pop    %r9
       │       pop    %r8
       │       pop    %rax
       │       pop    %rsi
       │       pop    %rdx
       │       pop    %rsi
       │       mov    %rsp,%rdi
       │       mov    %gs:0x5004,%rsp
       │       pushq  0x28(%rdi)
       │       pushq  (%rdi)
       │       push   %rax
       │     ↓ jmp    132
       │       mov    %cr3,%rdi
       │    ┌──jmp    128
       │    │  mov    %rdi,%rax
       │    │  and    $0x7ff,%rdi
       │    │  bt     %rdi,%gs:0x2219a
       │    │↓ jae    119
       │    │  btr    %rdi,%gs:0x2219a
       │    │  mov    %rax,%rdi
       │    │↓ jmp    121
       │119:│  mov    %rax,%rdi
       │    │  bts    $0x3f,%rdi
       │121:│  or     $0x800,%rdi
       │128:└─→or     $0x1000,%rdi
       │       mov    %rdi,%cr3
       │132:   pop    %rax
       │       pop    %rdi
       │       pop    %rsp
       │     → jmpq   *0x825d2d(%rip)        # ffffffff82225e88 <pv_cpu_ops+0xe8>

With those at least navigating to the right destination, an improvement
for these cases seems to be to be to somehow mark those inner functions,
which in this case could be:

entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux
       │syscall_return_via_sysret:
       │       pop    %r15
       │       pop    %r14
       │       pop    %r13
       │       pop    %r12
       │       pop    %rbp
       │       pop    %rbx
       │       pop    %rsi
       │       pop    %r10
       │       pop    %r9
       │       pop    %r8
       │       pop    %rax
       │       pop    %rsi
       │       pop    %rdx
       │       pop    %rsi
       │       mov    %rsp,%rdi
       │       mov    %gs:0x5004,%rsp
       │       pushq  0x28(%rdi)
       │       pushq  (%rdi)
       │       push   %rax
       │     ↓ jmp    132
       │       mov    %cr3,%rdi
       │    ┌──jmp    128
       │    │  mov    %rdi,%rax
       │    │  and    $0x7ff,%rdi
       │    │  bt     %rdi,%gs:0x2219a
       │    │↓ jae    119
       │    │  btr    %rdi,%gs:0x2219a
       │    │  mov    %rax,%rdi
       │    │↓ jmp    121
       │119:│  mov    %rax,%rdi
       │    │  bts    $0x3f,%rdi
       │121:│  or     $0x800,%rdi
       │128:└─→or     $0x1000,%rdi
       │       mov    %rdi,%cr3
       │132:   pop    %rax
       │       pop    %rdi
       │       pop    %rsp
       │     → jmpq   *0x825d2d(%rip)        # ffffffff82225e88 <pv_cpu_ops+0xe8>

This all gets much better viewed if one uses 'perf report --ignore-vmlinux'
forcing the usage of /proc/kcore + /proc/kallsyms, when the above
actually gets down to:

  # perf report --ignore-vmlinux
  ## do '/64', will show the function names containing '64',
  ## navigate to /entry_SYSCALL_64_after_hwframe.annotation,
  ## press 'A' to annotate, then 'P' to print that annotation
  ## to a file
  ## From another xterm (or see on screen, this 'P' thing is for
  ## getting rid of those right side scroll bars/spaces):
  # cat /entry_SYSCALL_64_after_hwframe.annotation
  entry_SYSCALL_64_after_hwframe() /proc/kcore
  Event: cycles:ppp

  Percent
              Disassembly of section load0:

              ffffffff9aa00044 <load0>:
   11.97        push   %rax
    4.85        push   %rdi
                push   %rsi
    2.59        push   %rdx
    2.27        push   %rcx
    0.32        pushq  $0xffffffffffffffda
    1.29        push   %r8
                xor    %r8d,%r8d
    1.62        push   %r9
    0.65        xor    %r9d,%r9d
    1.62        push   %r10
                xor    %r10d,%r10d
    5.50        push   %r11
                xor    %r11d,%r11d
    3.56        push   %rbx
                xor    %ebx,%ebx
    4.21        push   %rbp
                xor    %ebp,%ebp
    2.59        push   %r12
    0.97        xor    %r12d,%r12d
    3.24        push   %r13
                xor    %r13d,%r13d
    2.27        push   %r14
                xor    %r14d,%r14d
    4.21        push   %r15
                xor    %r15d,%r15d
    0.97        mov    %rsp,%rdi
    5.50      → callq  do_syscall_64
   14.56        mov    0x58(%rsp),%rcx
    7.44        mov    0x80(%rsp),%r11
    0.32        cmp    %rcx,%r11
              → jne    swapgs_restore_regs_and_return_to_usermode
    0.32        shl    $0x10,%rcx
    0.32        sar    $0x10,%rcx
    3.24        cmp    %rcx,%r11
              → jne    swapgs_restore_regs_and_return_to_usermode
    2.27        cmpq   $0x33,0x88(%rsp)
    1.29      → jne    swapgs_restore_regs_and_return_to_usermode
                mov    0x30(%rsp),%r11
    8.74        cmp    %r11,0x90(%rsp)
              → jne    swapgs_restore_regs_and_return_to_usermode
    0.32        test   $0x10100,%r11
              → jne    swapgs_restore_regs_and_return_to_usermode
    0.32        cmpq   $0x2b,0xa0(%rsp)
    0.65      → jne    swapgs_restore_regs_and_return_to_usermode

I.e. using kallsyms makes the function start/end be done differently
than using what is in the vmlinux ELF symtab and actually the hits
goes to entry_SYSCALL_64_after_hwframe, which is a GLOBAL() after the
start of entry_SYSCALL_64:

  ENTRY(entry_SYSCALL_64)
          UNWIND_HINT_EMPTY
  <SNIP>
          pushq   $__USER_CS                      /* pt_regs->cs */
          pushq   %rcx                            /* pt_regs->ip */
  GLOBAL(entry_SYSCALL_64_after_hwframe)
          pushq   %rax                            /* pt_regs->orig_ax */

          PUSH_AND_CLEAR_REGS rax=$-ENOSYS

And it goes and ends at:

          cmpq    $__USER_DS, SS(%rsp)            /* SS must match SYSRET */
          jne     swapgs_restore_regs_and_return_to_usermode

          /*
           * We win! This label is here just for ease of understanding
           * perf profiles. Nothing jumps here.
           */
  syscall_return_via_sysret:
          /* rcx and r11 are already restored (see code above) */
          UNWIND_HINT_EMPTY
          POP_REGS pop_rdi=0 skip_r11rcx=1

So perhaps some people should really just play with '--ignore-vmlinux'
to force /proc/kcore + kallsyms.

One idea is to do both, i.e. have a vmlinux annotation and a
kcore+kallsyms one, when possible, and even show the patched location,
etc.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-r11knxv8voesav31xokjiuo6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-23 16:46:53 -03:00
Arnaldo Carvalho de Melo
c448234cfe perf annotate: Defer searching for comma in raw line till it is needed
That strchr() in jump__scnprintf() needs to be nuked somehow, as it,
IIRC is already done in jump__parse() and if needed at scnprintf() time,
should be stashed in the struct filled in parse() time.

For now jus defer it to just before where it is used.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-j0t5hagnphoz9xw07bh3ha3g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-23 16:46:19 -03:00
Arnaldo Carvalho de Melo
e4cc91b802 perf annotate: Support jumping from one function to another
For instance:

  entry_SYSCALL_64  /lib/modules/4.16.0-rc5-00086-gdf09348f78dc/build/vmlinux
    5.50 │     → callq  do_syscall_64
   14.56 │       mov    0x58(%rsp),%rcx
    7.44 │       mov    0x80(%rsp),%r11
    0.32 │       cmp    %rcx,%r11
         │     → jne    swapgs_restore_regs_and_return_to_usermode
    0.32 │       shl    $0x10,%rcx
    0.32 │       sar    $0x10,%rcx
    3.24 │       cmp    %rcx,%r11
         │     → jne    swapgs_restore_regs_and_return_to_usermode
    2.27 │       cmpq   $0x33,0x88(%rsp)
    1.29 │     → jne    swapgs_restore_regs_and_return_to_usermode
         │       mov    0x30(%rsp),%r11
    8.74 │       cmp    %r11,0x90(%rsp)
         │     → jne    swapgs_restore_regs_and_return_to_usermode
    0.32 │       test   $0x10100,%r11
         │     → jne    swapgs_restore_regs_and_return_to_usermode
    0.32 │       cmpq   $0x2b,0xa0(%rsp)
    0.65 │     → jne    swapgs_restore_regs_and_return_to_usermode

It'll behave just like a "call" instruction, i.e. press enter or right
arrow over one such line and the browser will navigate to the annotated
disassembly of that function, which when exited, via left arrow or esc,
will come back to the calling function.

Now to support jump to an offset on a different function...

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-78o508mqvr8inhj63ddtw7mo@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-23 16:46:18 -03:00
Arnaldo Carvalho de Melo
2eff061162 perf annotate: Add "_local" to jump/offset validation routines
Because they all really check if we can access data structures/visual
constructs where a "jump" instruction targets code in the same function,
i.e. things like:

  __pthread_mutex_lock  /usr/lib64/libpthread-2.26.so
  1.95 │       mov    __pthread_force_elision,%ecx
       │    ┌──test   %ecx,%ecx
  0.07 │    ├──je     60
       │    │  test   $0x300,%esi
       │    │↓ jne    60
       │    │  or     $0x100,%esi
       │    │  mov    %esi,0x10(%rdi)
       │ 42:│  mov    %esi,%edx
       │    │  lea    0x16(%r8),%rsi
       │    │  mov    %r8,%rdi
       │    │  and    $0x80,%edx
       │    │  add    $0x8,%rsp
       │    │→ jmpq   __lll_lock_elision
       │    │  nop
  0.29 │ 60:└─→and    $0x80,%esi
  0.07 │       mov    $0x1,%edi
  0.29 │       xor    %eax,%eax
  2.53 │       lock   cmpxchg %edi,(%r8)

And not things like that "jmpq __lll_lock_elision", that instead should behave
like a "call" instruction and "jump" to the disassembly of "___lll_lock_elision".

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-3cwx39u3h66dfw9xjrlt7ca2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-23 16:46:16 -03:00
Arnaldo Carvalho de Melo
751b1783da perf annotate: Mark jumps to outher functions with the call arrow
Things like this in _cpp_lex_token (gcc's cc1 program):

     cpp_named_operator2name@@Base+0xa72

Point to a place that is after the cpp_named_operator2name boundaries,
i.e.  in the ELF symbol table for cc1 cpp_named_operator2name is marked
as being 32-bytes long, but it in fact is much larger than that, so we
seem to need a symbols__find() routine that looks for >= current->start
and  < next_symbol->start, possibly just for C++ objects?

For now lets just make some progress by marking jumps to outside the
current function as call like.

Actual navigation will come next, with further understanding of how the
symbol searching and disassembly should be done.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-aiys0a0bsgm3e00hbi6fg7yy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 16:19:55 -03:00
Arnaldo Carvalho de Melo
85a84e4f81 perf annotate: Pass function descriptor to its instruction parsing routines
We need that to figure out if jumps have targets in a different
function.

E.g. _cpp_lex_token(), in /usr/libexec/gcc/x86_64-redhat-linux/5.3.1/cc1
has a line like this:

  jne    c469be <cpp_named_operator2name@@Base+0xa72>

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ris0ioziyp469pofpzix2atb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 16:19:41 -03:00
Arnaldo Carvalho de Melo
425859ff0d perf annotate: No need to calculate notes->start twice
Since we already set notes->start to map__rip_2objdump(map, sym->start)
in symbol__annotate2(), no need to calculate that address again in
symbol__calc_lines(), just use notes->start.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ycxlg8mm5ueuj21w6gi62l7g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 12:53:43 -03:00
Arnaldo Carvalho de Melo
d9bd766584 perf annotate browser: Add 'P' hotkey to dump annotation to file
Just like we have in the histograms browser used as the main screen for
'perf top --tui' and 'perf report --tui', to print the current
annotation to a file with a named composed by the symbol name and the
".annotation" suffix.

Here is one example of pressing 'A' on 'perf top' to live annotate a
kernel function and then press 'P' to dump that annotation, the
resulting file:

  # cat _raw_spin_lock_irqsave.annotation
  _raw_spin_lock_irqsave() /proc/kcore
  Event: cycles:ppp

    7.14        nop
   21.43        push   %rbx
    7.14        pushfq
                pop    %rax
                nop
                mov    %rax,%rbx
                cli
                nop
                xor    %eax,%eax
                mov    $0x1,%edx
   64.29        lock   cmpxchg %edx,(%rdi)
                test   %eax,%eax
              ↓ jne    2b
                mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
              → callq  queued_spin_lock_slowpath
                mov    %rbx,%rax
                pop    %rbx
              ← retq
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zzmnrwugb5vtk7bvg0rbx150@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 12:53:43 -03:00
Arnaldo Carvalho de Melo
864298f224 perf annotate: Add function header to --stdio2
# perf annotate --stdio2 _raw_spin_lock_irqsave
  _raw_spin_lock_irqsave() /lib/modules/4.16.0-rc4/build/vmlinux
  Event: anon group { cycles, instructions }

    0.00   3.17      → callq  __fentry__
    0.00   7.94        push   %rbx
    7.69  36.51      → callq  __page_file_index
                       mov    %rax,%rbx
    7.69   3.17      → callq  *ffffffff82225cd0
                       xor    %eax,%eax
                       mov    $0x1,%edx
   80.77  49.21        lock   cmpxchg %edx,(%rdi)
                       test   %eax,%eax
                     ↓ jne    2b
    3.85   0.00        mov    %rbx,%rax
                       pop    %rbx
                     ← retq
                 2b:   mov    %eax,%esi
                     → callq  queued_spin_lock_slowpath
                       mov    %rbx,%rax
                       pop    %rbx
                     ← retq
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-i86yfyzl8m194ioxgj1jo32f@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 12:53:41 -03:00
Arnaldo Carvalho de Melo
3563289208 perf annotate: Use the default annotation options for --stdio2
With an empty '[annotate]' section in ~/.perfconfig:

  # perf record -a --all-kernel -e '{cycles,instructions}:P' sleep 5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.243 MB perf.data (5513 samples) ]
  # perf annotate --stdio2 _raw_spin_lock | head -20

                     Disassembly of section .text:

                     ffffffff81868790 <_raw_spin_lock>:
                     _raw_spin_lock():
                     EXPORT_SYMBOL(_raw_spin_trylock_bh);
                     #endif

                     #ifndef CONFIG_INLINE_SPIN_LOCK
                     void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
                     {
                     → callq  __fentry__
                     atomic_cmpxchg():
                             return xadd(&v->counter, -i);
                     }

                     static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
                     {
  # perf annotate --stdio2 _raw_spin_lock | head -20
                     → callq  __fentry__
                       xor    %eax,%eax
                       mov    $0x1,%edx
   87.50 100.00        lock   cmpxchg %edx,(%rdi)
    6.25   0.00        test   %eax,%eax
                     ↓ jne    16
    6.25   0.00        repz   retq
                 16:   mov    %eax,%esi
                     ↑ jmpq   ffffffff810e96b0 <queued_spin_lock_slowpath>
  #
  # cat ~/.perfconfig
  [annotate]

    hide_src_code = false
    show_linenr = true
  # perf annotate --stdio2 _raw_spin_lock | head -20

                 3   Disassembly of section .text:

                 5   ffffffff81868790 <_raw_spin_lock>:
                 6   _raw_spin_lock():
                 143 EXPORT_SYMBOL(_raw_spin_trylock_bh);
                 144 #endif

                 146 #ifndef CONFIG_INLINE_SPIN_LOCK
                 147 void __lockfunc _raw_spin_lock(raw_spinlock_t *lock)
                 148 {
                     → callq  __fentry__
                 150 atomic_cmpxchg():
                 187         return xadd(&v->counter, -i);
                 188 }

                 190 static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
                 191 {
  #
  # cat ~/.perfconfig
  [annotate]

    hide_src_code = true
    show_total_period = true
  # perf annotate --stdio2 _raw_spin_lock | head -20
                               → callq  __fentry__
                                 xor    %eax,%eax
                                 mov    $0x1,%edx
      1411316      152339        lock   cmpxchg %edx,(%rdi)
       344694           0        test   %eax,%eax
                               ↓ jne    16
        80806           0        repz   retq
                           16:   mov    %eax,%esi
                               ↑ jmpq   ffffffff810e96b0 <queued_spin_lock_slowpath>
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-nu4rxg5zkdtgs1b2gc40p7v7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 12:53:41 -03:00
Arnaldo Carvalho de Melo
7f0b6fde31 perf annotate: Move the default annotate options to the library
One more thing that goes from the TUI code to be used more widely,
for instance it'll affect the default options used by:

  perf annotate --stdio2

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-0nsz0dm0akdbo30vgja2a10e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 12:53:40 -03:00
Arnaldo Carvalho de Melo
befd2a38a6 perf annotate: Introduce the --stdio2 output mode
This uses the TUI augmented formatting routines, modulo interactivity.

  # perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
  _raw_spin_lock_irqsave() /proc/kcore
  Event: cycles:ppp

  Percent

              Disassembly of section load0:

              ffffffff9a8734b0 <load0>:
                nop
                push   %rbx
   50.00        pushfq
                pop    %rax
                nop
                mov    %rax,%rbx
                cli
                nop
                xor    %eax,%eax
                mov    $0x1,%edx
   50.00        lock   cmpxchg %edx,(%rdi)
                test   %eax,%eax
              ↓ jne    2b
                mov    %rbx,%rax
                pop    %rbx
              ← retq
          2b:   mov    %eax,%esi
              → callq  queued_spin_lock_slowpath
                mov    %rbx,%rax
                pop    %rbx
              ← retq

Tested-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6cte5o8z84mbivbvqlg14uh1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-21 12:53:26 -03:00
Arnaldo Carvalho de Melo
c298304bd7 perf annotate: Use a ops table for annotation_line__write()
To simplify the passing of arguments, the --stdio2 code will have to set
all the fields with operations printing to stdout.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-pcs3c7vdy9ucygxflo4nl1o7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-20 15:36:18 -03:00