mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2025-09-04 20:19:47 +08:00
2cb15faaed
81 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
cad3b68954 |
perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()
'perf stat' has options to aggregate the counts in different modes like
per socket, per core etc. The function "aggr_printout" in
util/stat-display.c which is used to print the aggregates, has a check
for cpu in case of AGGR_NONE.
This check was originally using condition : "if (id.cpu.cpu > -1)". But
this got changed after commit
|
||
|
|
fa2edc07b4 |
perf stat: Rename to aggr_cpu_id.thread_idx
The aggr_cpu_id has a thread value but it's actually an index to the thread_map. To reduce possible confusion, rename it to thread_idx. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
87ae87fd6c |
perf stat: Use thread map index for shadow stat
When AGGR_THREAD is active, it aggregates the values for each thread. Previously it used cpu map index which is invalid for AGGR_THREAD so it had to use separate runtime stats with index 0. But it can just use the rt_stat with thread_map_index. Rename the first_shadow_map_idx() and make it return the thread index. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
66b76e30ee |
perf stat: Convert perf_stat_evsel.res_stats array
It uses only one member, no need to have it as an array. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
df936cadfb |
perf stat: Add JSON output option
CSV output is tricky to format and column layout changes are susceptible
to breaking parsers. New JSON-formatted output has variable names to
identify fields that are consistent and informative, making the output
parseable.
CSV output example:
1.20,msec,task-clock:u,1204272,100.00,0.697,CPUs utilized
0,,context-switches:u,1204272,100.00,0.000,/sec
0,,cpu-migrations:u,1204272,100.00,0.000,/sec
70,,page-faults:u,1204272,100.00,58.126,K/sec
JSON output example:
{"counter-value" : "3805.723968", "unit" : "msec", "event" :
"cpu-clock", "event-runtime" : 3805731510100.00, "pcnt-running"
: 100.00, "metric-value" : 4.007571, "metric-unit" : "CPUs utilized"}
{"counter-value" : "6166.000000", "unit" : "", "event" :
"context-switches", "event-runtime" : 3805723045100.00, "pcnt-running"
: 100.00, "metric-value" : 1.620191, "metric-unit" : "K/sec"}
{"counter-value" : "466.000000", "unit" : "", "event" :
"cpu-migrations", "event-runtime" : 3805727613100.00, "pcnt-running"
: 100.00, "metric-value" : 122.447136, "metric-unit" : "/sec"}
{"counter-value" : "208.000000", "unit" : "", "event" :
"page-faults", "event-runtime" : 3805726799100.00, "pcnt-running"
: 100.00, "metric-value" : 54.654516, "metric-unit" : "/sec"}
Also added documentation for JSON option.
There is some tidy up of CSV code including a potential memory over run
in the os.nfields set up. To facilitate this an AGGR_MAX value is added.
Committer notes:
Fixed up using PRIu64 to format u64 values, not %lu.
Committer testing:
⬢[acme@toolbox perf]$ perf stat -j sleep 1
{"counter-value" : "0.731750", "unit" : "msec", "event" : "task-clock:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000731, "metric-unit" : "CPUs utilized"}
{"counter-value" : "0.000000", "unit" : "", "event" : "context-switches:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
{"counter-value" : "0.000000", "unit" : "", "event" : "cpu-migrations:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 0.000000, "metric-unit" : "/sec"}
{"counter-value" : "75.000000", "unit" : "", "event" : "page-faults:u", "event-runtime" : 731750, "pcnt-running" : 100.00, "metric-value" : 102.494021, "metric-unit" : "K/sec"}
{"counter-value" : "578765.000000", "unit" : "", "event" : "cycles:u", "event-runtime" : 379366, "pcnt-running" : 49.00, "metric-value" : 0.790933, "metric-unit" : "GHz"}
{"counter-value" : "1298.000000", "unit" : "", "event" : "stalled-cycles-frontend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.224271, "metric-unit" : "frontend cycles idle"}
{"counter-value" : "21984.000000", "unit" : "", "event" : "stalled-cycles-backend:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 3.798433, "metric-unit" : "backend cycles idle"}
{"counter-value" : "468197.000000", "unit" : "", "event" : "instructions:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 0.808959, "metric-unit" : "insn per cycle"}
{"metric-value" : 0.046955, "metric-unit" : "stalled cycles per insn"}
{"counter-value" : "103335.000000", "unit" : "", "event" : "branches:u", "event-runtime" : 768020, "pcnt-running" : 100.00, "metric-value" : 141.216262, "metric-unit" : "M/sec"}
{"counter-value" : "2381.000000", "unit" : "", "event" : "branch-misses:u", "event-runtime" : 388654, "pcnt-running" : 50.00, "metric-value" : 2.304156, "metric-unit" : "of all branches"}
⬢[acme@toolbox perf]$
Signed-off-by: Claire Jensen <cjense@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alyssa Ross <hi@alyssa.is>
Cc: Claire Jensen <clairej735@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Like Xu <likexu@tencent.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220805200105.2020995-2-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
9a0b36266f |
perf stat: Add topdown metrics in the default perf stat on the hybrid machine
Topdown metrics are missed in the default perf stat on the hybrid machine,
add Topdown metrics in default perf stat for hybrid systems.
Currently, we support the perf metrics Topdown for the p-core PMU in the
perf stat default, the perf metrics Topdown support for e-core PMU will be
implemented later separately. Refactor the code adds two x86 specific
functions. Widen the size of the event name column by 7 chars, so that all
metrics after the "#" become aligned again.
The perf metrics topdown feature is supported on the cpu_core of ADL. The
dedicated perf metrics counter and the fixed counter 3 are used for the
topdown events. Adding the topdown metrics doesn't trigger multiplexing.
Before:
# ./perf stat -a true
Performance counter stats for 'system wide':
53.70 msec cpu-clock # 25.736 CPUs utilized
80 context-switches # 1.490 K/sec
24 cpu-migrations # 446.951 /sec
52 page-faults # 968.394 /sec
2,788,555 cpu_core/cycles/ # 51.931 M/sec
851,129 cpu_atom/cycles/ # 15.851 M/sec
2,974,030 cpu_core/instructions/ # 55.385 M/sec
416,919 cpu_atom/instructions/ # 7.764 M/sec
586,136 cpu_core/branches/ # 10.916 M/sec
79,872 cpu_atom/branches/ # 1.487 M/sec
14,220 cpu_core/branch-misses/ # 264.819 K/sec
7,691 cpu_atom/branch-misses/ # 143.229 K/sec
0.002086438 seconds time elapsed
After:
# ./perf stat -a true
Performance counter stats for 'system wide':
61.39 msec cpu-clock # 24.874 CPUs utilized
76 context-switches # 1.238 K/sec
24 cpu-migrations # 390.968 /sec
52 page-faults # 847.097 /sec
2,753,695 cpu_core/cycles/ # 44.859 M/sec
903,899 cpu_atom/cycles/ # 14.725 M/sec
2,927,529 cpu_core/instructions/ # 47.690 M/sec
428,498 cpu_atom/instructions/ # 6.980 M/sec
581,299 cpu_core/branches/ # 9.470 M/sec
83,409 cpu_atom/branches/ # 1.359 M/sec
13,641 cpu_core/branch-misses/ # 222.216 K/sec
8,008 cpu_atom/branch-misses/ # 130.453 K/sec
14,761,308 cpu_core/slots/ # 240.466 M/sec
3,288,625 cpu_core/topdown-retiring/ # 22.3% retiring
1,323,323 cpu_core/topdown-bad-spec/ # 9.0% bad speculation
5,477,470 cpu_core/topdown-fe-bound/ # 37.1% frontend bound
4,679,199 cpu_core/topdown-be-bound/ # 31.7% backend bound
646,194 cpu_core/topdown-heavy-ops/ # 4.4% heavy operations # 17.9% light operations
1,244,999 cpu_core/topdown-br-mispredict/ # 8.4% branch mispredict # 0.5% machine clears
3,891,800 cpu_core/topdown-fetch-lat/ # 26.4% fetch latency # 10.7% fetch bandwidth
1,879,034 cpu_core/topdown-mem-bound/ # 12.7% memory bound # 19.0% Core bound
0.002467839 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220721065706.2886112-6-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
0b9462d0ac |
perf stat: Make use of index clearer with perf_counts
Try to disambiguate further when perf_counts is being accessed it is with a cpu map index rather than a CPU. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lv Ruyi <lv.ruyi@zte.com.cn> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20220519032005.1273691-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
17b3867d97 |
Revert "perf stat: Support metrics with hybrid events"
This reverts commit
|
||
|
|
570c44a01b |
perf stat: Avoid printing cpus with no counters
perf_evlist's user_requested_cpus can contain CPUs not present in any evsel's cpus, for example uncore counters. Avoid printing the prefix and trailing \n until the first valid counter is encountered. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20220503041757.2365696-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
2c8e64514a |
perf stat: Merge event counts from all hybrid PMUs
For hybrid events, by default stat aggregates and reports the event counts
per pmu.
# ./perf stat -e cycles -a sleep 1
Performance counter stats for 'system wide':
14,066,877,268 cpu_core/cycles/
6,814,443,147 cpu_atom/cycles/
1.002760625 seconds time elapsed
Sometimes, it's also useful to aggregate event counts from all PMUs.
Create a new option '--hybrid-merge' to enable that behavior and report
the counts without PMUs.
# ./perf stat -e cycles -a --hybrid-merge sleep 1
Performance counter stats for 'system wide':
20,732,982,512 cycles
1.002776793 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-2-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
60344f1a9a |
perf stat: Support metrics with hybrid events
One metric such as 'Kernel_Utilization' may be from different PMUs and
consists of different events.
For core,
Kernel_Utilization = cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread
For atom,
Kernel_Utilization = cpu_clk_unhalted.core:k / cpu_clk_unhalted.core
The metric group string for core is:
'{cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.thread_p/metric-id=cpu_clk_unhalted.thread_p:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W#cpu_core'
The metric group string for atom is:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W#cpu_atom'
That means the group "{cpu_clk_unhalted.thread:k,cpu_clk_unhalted.thread}:W"
is from cpu_core PMU and the group "{cpu_clk_unhalted.core:k,cpu_clk_unhalted.core}"
is from cpu_atom PMU. And then next, check if the events in the group are
valid on that PMU. If one event is not valid on that PMU, the associated
group would be removed internally.
In this example, cpu_clk_unhalted.thread is valid on cpu_core and
cpu_clk_unhalted.core is valid on cpu_atom. So the checks for these two
groups are passed.
Before:
# ./perf stat -M Kernel_Utilization -a sleep 1
WARNING: events in group from different hybrid PMUs!
WARNING: grouped events cpus do not match, disabling group:
anon group { CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD, CPU_CLK_UNHALTED.THREAD }
Performance counter stats for 'system wide':
17,639,501 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 1.00 Kernel_Utilization
17,578,757 cpu_atom/CPU_CLK_UNHALTED.CORE:k/
1,005,350,226 ns duration_time
43,012,352 cpu_core/CPU_CLK_UNHALTED.THREAD_P:k/ # 0.99 Kernel_Utilization
17,608,010 cpu_atom/CPU_CLK_UNHALTED.THREAD_P:k/
43,608,755 cpu_core/CPU_CLK_UNHALTED.THREAD/
17,630,838 cpu_atom/CPU_CLK_UNHALTED.THREAD/
1,005,350,226 ns duration_time
1.005350226 seconds time elapsed
After:
# ./perf stat -M Kernel_Utilization -a sleep 1
Performance counter stats for 'system wide':
17,981,895 CPU_CLK_UNHALTED.CORE [cpu_atom] # 1.00 Kernel_Utilization
17,925,405 CPU_CLK_UNHALTED.CORE:k [cpu_atom]
1,004,811,366 ns duration_time
41,246,425 CPU_CLK_UNHALTED.THREAD_P:k [cpu_core] # 0.99 Kernel_Utilization
41,819,129 CPU_CLK_UNHALTED.THREAD [cpu_core]
1,004,811,366 ns duration_time
1.004811366 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
0df6ade711 |
perf evlist: Rename cpus to user_requested_cpus
evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps of all evsels. For non-task targets, cpus is set to be cpus requested from the command line, defaulting to all online cpus if no cpus are specified. For an uncore event, all_cpus may be just CPU 0 or every online CPU. This causes all_cpus to have fewer values than the cpus variable which is confusing given the 'all' in the name. To try to make the behavior clearer, rename cpus to user_requested_cpus and add comments on the two struct variables. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
b2b1aa73ad |
perf stat: Fix display of grouped aliased events
An event may have a number of uncore aliases that when added to the
evlist are consecutive.
If there are multiple uncore events in a group then
parse_events__set_leader_for_uncore_aliase will reorder the evlist so
that events on the same PMU are adjacent.
The collect_all_aliases function assumes that aliases are in blocks so
that only the first counter is printed and all others are marked merged.
The reordering for groups breaks the assumption and so all counts are
printed.
This change removes the assumption from collect_all_aliases
that the events are in blocks and instead processes the entire evlist.
Before:
```
$ perf stat -e '{UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE,UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE},duration_time' -a -A -- sleep 1
Performance counter stats for 'system wide':
CPU0 256,866 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 494,413 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 967 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,738 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 285,161 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 429,920 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 955 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,443 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 310,753 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 416,657 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,231 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,573 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 416,067 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 405,966 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,481 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,447 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 312,911 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 408,154 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,086 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,380 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 333,994 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 370,349 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,287 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,335 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 188,107 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 302,423 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 701 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,070 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 307,221 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 383,642 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,036 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,158 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 318,479 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 821,545 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,028 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 2,550 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 227,618 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 372,272 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 903 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,456 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 376,783 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 419,827 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,406 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,453 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 286,583 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 429,956 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 999 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,436 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 313,867 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 370,159 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,114 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,291 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 342,083 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 409,111 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,399 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,684 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 365,828 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 376,037 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,378 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,411 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 382,456 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 621,743 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,232 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,955 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 342,316 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 385,067 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,176 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,268 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 373,588 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 386,163 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,394 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,464 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 381,206 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 546,891 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,266 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,712 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 221,176 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 392,069 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 831 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,456 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 355,401 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 705,595 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,235 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 2,216 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 371,436 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 428,103 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,306 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,442 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 384,352 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 504,200 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,468 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,860 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 228,856 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 287,976 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 832 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,060 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 215,121 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 334,162 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 681 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,026 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 296,179 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 436,083 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,084 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,525 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 262,296 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 416,573 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 986 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,533 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 285,852 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 359,842 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,073 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,326 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 303,379 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 367,222 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,008 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,156 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 273,487 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 425,449 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 932 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,367 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 297,596 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 414,793 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,140 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,601 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 342,365 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 360,422 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,291 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,342 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 327,196 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 580,858 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,122 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 2,014 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 296,564 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 452,817 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,087 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,694 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 375,002 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 389,393 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,478 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,540 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 365,213 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 594,685 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,401 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 2,222 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 1,000,749,060 ns duration_time
1.000749060 seconds time elapsed
```
After:
```
Performance counter stats for 'system wide':
CPU0 20,547,434 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 45,202,862 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 82,001 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 159,688 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 1,000,464,828 ns duration_time
1.000464828 seconds time elapsed
```
Fixes:
|
||
|
|
6d18804b96 |
perf cpumap: Give CPUs their own type
A common problem is confusing CPU map indices with the CPU, by wrapping the CPU with a struct then this is avoided. This approach is similar to atomic_t. Committer notes: To make it build with BUILD_BPF_SKEL=1 these files needed the conversions to 'struct perf_cpu' usage: tools/perf/util/bpf_counter.c tools/perf/util/bpf_counter_cgroup.c tools/perf/util/bpf_ftrace.c Also perf_env__get_cpu() was removed back in "perf cpumap: Switch cpu_map__build_map to cpu function". Additionally these needed to be fixed for the ARM builds to complete: tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/pmu.c Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
ce37ab3eb2 |
perf stat: Correct first_shadow_cpu to return index
perf_stat__update_shadow_stats() and perf_stat__print_shadow_stats() use a cpu map index rather than a CPU, but first_shadow_cpu is returning the wrong value for this. Change first_shadow_cpu to first_shadow_cpu_map_idx to make things agree. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-48-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
7ea82fbee4 |
perf stat: Use perf_cpu_map__for_each_cpu()
Correct in print_counter() where an index was being used as a cpu. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-32-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
ab90caa7b2 |
perf stat: Rename aggr_data cpu to imply it's an index
Trying to make cpu maps less error prone. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-31-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
7365f105e3 |
perf stat-display: Avoid use of core for CPU
Correct use of cpumap index in print_no_aggr_metric(). Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-26-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
51b826fadf |
perf cpumap: Rename empty functions
Remove cpu_map from name as a cpu_map isn't used. Pass a const pointer rather than by value to avoid unnecessary copying. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-15-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
3ac23d199c |
perf cpumap: Simplify equal function name
Rename cpu_map__compare_aggr_cpu_id() to aggr_cpu_id__equal(), the cpu_map part of the name is misleading. Equal better describes the function than compare. Switch to const pointer rather than value as struct given the number of variables in aggr_cpu_id(). Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-14-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
88031a0de7 |
perf stat: Switch to cpu version of cpu_map__get()
Avoid possible bugs where the wrong index is passed with the cpu_map. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
a023283fad |
perf stat: Switch aggregation to use for_each loop
Tidy up the use of cpu and index to hopefully make the code less error prone. Avoid unused warnings with (void) which will be removed in a later patch. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Vineet Singh <vineet.singh@intel.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: zhengjun.xing@intel.com Link: https://lore.kernel.org/r/20220105061351.120843-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
01843ca019 |
perf stat: Correct aggregation CPU map
Switch the perf_cpu_map in aggr_update_shadow from
the evlist to the counter's cpu map, so the index is appropriate. This
addresses a problem where uncore counts, with a cpumap like:
$ cat /sys/devices/uncore_imc_0/cpumask
0,18
Don't aggregate counts in CPUs based on the index of those values in the
cpumap (0 and 1) but on the actual CPU (0 and 18). Thereby correcting
metric calculations in per-socket mode for counters without a full
cpumask.
On a SkylakeX with a tweaked DRAM_BW_Use metric, to remove unnecessary
scaling, this gives:
Before:
$ /perf stat --per-socket -M DRAM_BW_Use -I 1000
1.001102293 S0 1 27.01 MiB uncore_imc/cas_count_write/ # 103.00 DRAM_BW_Use
1.001102293 S0 1 30.22 MiB uncore_imc/cas_count_read/
1.001102293 S0 1 1,001,102,293 ns duration_time
1.001102293 S1 1 20.10 MiB uncore_imc/cas_count_write/ # 0.00 DRAM_BW_Use
1.001102293 S1 1 32.74 MiB uncore_imc/cas_count_read/
1.001102293 S1 0 <not counted> ns duration_time
2.003517973 S0 1 83.04 MiB uncore_imc/cas_count_write/ # 920.00 DRAM_BW_Use
2.003517973 S0 1 145.95 MiB uncore_imc/cas_count_read/
2.003517973 S0 1 1,002,415,680 ns duration_time
2.003517973 S1 1 302.45 MiB uncore_imc/cas_count_write/ # 0.00 DRAM_BW_Use
2.003517973 S1 1 290.99 MiB uncore_imc/cas_count_read/
2.003517973 S1 0 <not counted> ns duration_time
After:
$ perf stat --per-socket -M DRAM_BW_Use -I 1000
1.001080840 S0 1 24.96 MiB uncore_imc/cas_count_write/ # 54.00 DRAM_BW_Use
1.001080840 S0 1 33.64 MiB uncore_imc/cas_count_read/
1.001080840 S0 1 1,001,080,840 ns duration_time
1.001080840 S1 1 42.43 MiB uncore_imc/cas_count_write/ # 84.00 DRAM_BW_Use
1.001080840 S1 1 47.05 MiB uncore_imc/cas_count_read/
1.001080840 S1 0 <not counted> ns duration_time
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: zhengjun.xing@intel.com
Link: https://lore.kernel.org/r/20220105061351.120843-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
e0a7ef2a62 |
perf stat: Merge uncore events by default for hybrid platform
On a hybrid platform, by default 'perf stat' aggregates and reports the
event counts per PMU. For example,
# perf stat -e cycles -a true
Performance counter stats for 'system wide':
1,400,445 cpu_core/cycles/
680,881 cpu_atom/cycles/
0.001770773 seconds time elapsed
But for uncore events that's not a suitable method. Uncore has nothing
to do with hybrid. So for uncore events, we aggregate event counts from
all PMUs and report the counts without PMUs.
Before:
# perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
Performance counter stats for 'system wide':
2,058 uncore_arb_0/event=0x81,umask=0x1/
2,028 uncore_arb_1/event=0x81,umask=0x1/
0 uncore_arb_0/event=0x84,umask=0x1/
0 uncore_arb_1/event=0x84,umask=0x1/
0.000614498 seconds time elapsed
After:
# perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ -a true
Performance counter stats for 'system wide':
3,996 arb/event=0x81,umask=0x1/
0 arb/event=0x84,umask=0x1/
0.000630046 seconds time elapsed
Of course, we also keep the '--no-merge' working for uncore events.
# perf stat -e arb/event=0x81,umask=0x1/,arb/event=0x84,umask=0x1/ --no-merge true
Performance counter stats for 'system wide':
1,952 uncore_arb_0/event=0x81,umask=0x1/
1,921 uncore_arb_1/event=0x81,umask=0x1/
0 uncore_arb_0/event=0x84,umask=0x1/
0 uncore_arb_1/event=0x84,umask=0x1/
0.000575536 seconds time elapsed
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210707055652.962-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
493be70ac3 |
perf stat: Disable the NMI watchdog message on hybrid
If we run a single workload that only runs on big core, there is always
a ugly message about disabling the NMI watchdog because the atom is not
counted.
Before:
# ./perf stat true
Performance counter stats for 'true':
0.43 msec task-clock # 0.396 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
45 page-faults # 103.918 K/sec
639,634 cpu_core/cycles/ # 1.477 G/sec
<not counted> cpu_atom/cycles/ (0.00%)
643,498 cpu_core/instructions/ # 1.486 G/sec
<not counted> cpu_atom/instructions/ (0.00%)
123,715 cpu_core/branches/ # 285.694 M/sec
<not counted> cpu_atom/branches/ (0.00%)
4,094 cpu_core/branch-misses/ # 9.454 M/sec
<not counted> cpu_atom/branch-misses/ (0.00%)
0.001092407 seconds time elapsed
0.001144000 seconds user
0.000000000 seconds sys
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
# ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
Performance counter stats for 'true':
<not counted> cpu_atom/cycles/ (0.00%)
<not counted> msr/tsc/ (0.00%)
0.001904106 seconds time elapsed
0.001947000 seconds user
0.000000000 seconds sys
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
The events in group usually have to be from the same PMU. Try reorganizing the group.
Now we disable the NMI watchdog message on hybrid, otherwise there
are too many false positives.
After:
# ./perf stat true
Performance counter stats for 'true':
0.79 msec task-clock # 0.419 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
48 page-faults # 60.889 K/sec
777,692 cpu_core/cycles/ # 986.519 M/sec
<not counted> cpu_atom/cycles/ (0.00%)
669,147 cpu_core/instructions/ # 848.828 M/sec
<not counted> cpu_atom/instructions/ (0.00%)
128,635 cpu_core/branches/ # 163.176 M/sec
<not counted> cpu_atom/branches/ (0.00%)
4,089 cpu_core/branch-misses/ # 5.187 M/sec
<not counted> cpu_atom/branch-misses/ (0.00%)
0.001880649 seconds time elapsed
0.001935000 seconds user
0.000000000 seconds sys
# ./perf stat -e '{cpu_atom/cycles/,msr/tsc/}' true
Performance counter stats for 'true':
<not counted> cpu_atom/cycles/ (0.00%)
<not counted> msr/tsc/ (0.00%)
0.000963319 seconds time elapsed
0.000999000 seconds user
0.000000000 seconds sys
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210610034557.29766-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
ce09673636 |
Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes, since perf/urgent is already upstream. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
3cc84399e9 |
perf stat: Honor event config name on --no-merge
If user gave an event name explicitly, it should be displayed in the output as is. But with --no-merge option it adds a pmu name at the end so might confuse users. Actually this is true for hybrid pmus, I think we should do the same for others. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210602212241.2175005-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
07b747f99a |
perf stat: Use aggregated counts directly
The ps->res_stats is for repeated runs, so the interval code should not touch it. Actually the aggregated counts are available in the counter->counts->aggr, so we can (and should) use it directly IMHO. No functional change intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210423023833.1430520-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
92637cc729 |
perf stat: Filter out unmatched aggregation for hybrid event
perf-stat has supported some aggregation modes, such as --per-core,
--per-socket and etc. While for hybrid event, it may only available
on part of cpus. So for --per-core, we need to filter out the
unavailable cores, for --per-socket, filter out the unavailable
sockets, and so on.
Before:
# perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
Performance counter stats for 'system wide':
S0-D0-C0 2 479,530 cpu_core/cycles/
S0-D0-C4 2 175,007 cpu_core/cycles/
S0-D0-C8 2 166,240 cpu_core/cycles/
S0-D0-C12 2 704,673 cpu_core/cycles/
S0-D0-C16 2 865,835 cpu_core/cycles/
S0-D0-C20 2 2,958,461 cpu_core/cycles/
S0-D0-C24 2 163,988 cpu_core/cycles/
S0-D0-C28 2 164,729 cpu_core/cycles/
S0-D0-C32 0 <not counted> cpu_core/cycles/
S0-D0-C33 0 <not counted> cpu_core/cycles/
S0-D0-C34 0 <not counted> cpu_core/cycles/
S0-D0-C35 0 <not counted> cpu_core/cycles/
S0-D0-C36 0 <not counted> cpu_core/cycles/
S0-D0-C37 0 <not counted> cpu_core/cycles/
S0-D0-C38 0 <not counted> cpu_core/cycles/
S0-D0-C39 0 <not counted> cpu_core/cycles/
1.003597211 seconds time elapsed
After:
# perf stat --per-core -e cpu_core/cycles/ -a -- sleep 1
Performance counter stats for 'system wide':
S0-D0-C0 2 210,428 cpu_core/cycles/
S0-D0-C4 2 444,830 cpu_core/cycles/
S0-D0-C8 2 435,241 cpu_core/cycles/
S0-D0-C12 2 423,976 cpu_core/cycles/
S0-D0-C16 2 859,350 cpu_core/cycles/
S0-D0-C20 2 1,559,589 cpu_core/cycles/
S0-D0-C24 2 163,924 cpu_core/cycles/
S0-D0-C28 2 376,610 cpu_core/cycles/
1.003621290 seconds time elapsed
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Co-developed-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-16-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
12279429d8 |
perf stat: Uniquify hybrid event name
It would be useful to let user know the pmu which the event belongs to. perf-stat has supported '--no-merge' option and it can print the pmu name after the event name, such as: "cycles [cpu_core]" Now this option is enabled by default for hybrid platform but change the format to: "cpu_core/cycles/" If user configs the name, we still use the user specified name. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> ink: https://lore.kernel.org/r/20210427070139.25256-8-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
f07952b179 |
perf stat: Basic support for iostat in perf
Add basic flow for a new iostat mode in perf. Mode is intended to provide four I/O performance metrics per each PCIe root port: Inbound Read, Inbound Write, Outbound Read, Outbound Write. The actual code to compute the metrics and attribute it to root port is in follow-on patches. Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey V Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210419094147.15909-2-alexander.antonov@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
0bdad97801 |
perf stat: Align CSV output for summary mode
The 'perf stat' subcommand supports the request for a summary of the
interval counter readings. But the summary lines break the CSV output
so it's hard for scripts to parse the result.
Before:
# perf stat -x, -I1000 --interval-count 1 --summary
1.001323097,8013.48,msec,cpu-clock,8013483384,100.00,8.013,CPUs utilized
1.001323097,270,,context-switches,8013513297,100.00,0.034,K/sec
1.001323097,13,,cpu-migrations,8013530032,100.00,0.002,K/sec
1.001323097,184,,page-faults,8013546992,100.00,0.023,K/sec
1.001323097,20574191,,cycles,8013551506,100.00,0.003,GHz
1.001323097,10562267,,instructions,8013564958,100.00,0.51,insn per cycle
1.001323097,2019244,,branches,8013575673,100.00,0.252,M/sec
1.001323097,106152,,branch-misses,8013585776,100.00,5.26,of all branches
8013.48,msec,cpu-clock,8013483384,100.00,7.984,CPUs utilized
270,,context-switches,8013513297,100.00,0.034,K/sec
13,,cpu-migrations,8013530032,100.00,0.002,K/sec
184,,page-faults,8013546992,100.00,0.023,K/sec
20574191,,cycles,8013551506,100.00,0.003,GHz
10562267,,instructions,8013564958,100.00,0.51,insn per cycle
2019244,,branches,8013575673,100.00,0.252,M/sec
106152,,branch-misses,8013585776,100.00,5.26,of all branches
The summary line loses the timestamp column, which breaks the CSV
output.
We add a column at the original 'timestamp' position and it just says
'summary' for the summary line.
After:
# perf stat -x, -I1000 --interval-count 1 --summary
1.001196053,8012.72,msec,cpu-clock,8012722903,100.00,8.013,CPUs utilized
1.001196053,218,,context-switches,8012753271,100.00,0.027,K/sec
1.001196053,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
1.001196053,0,,page-faults,8012786257,100.00,0.000,K/sec
1.001196053,15004518,,cycles,8012790637,100.00,0.002,GHz
1.001196053,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
1.001196053,1590259,,branches,8012814766,100.00,0.198,M/sec
1.001196053,82601,,branch-misses,8012824365,100.00,5.19,of all branches
summary,8012.72,msec,cpu-clock,8012722903,100.00,7.986,CPUs utilized
summary,218,,context-switches,8012753271,100.00,0.027,K/sec
summary,9,,cpu-migrations,8012769767,100.00,0.001,K/sec
summary,0,,page-faults,8012786257,100.00,0.000,K/sec
summary,15004518,,cycles,8012790637,100.00,0.002,GHz
summary,7954691,,instructions,8012804027,100.00,0.53,insn per cycle
summary,1590259,,branches,8012814766,100.00,0.198,M/sec
summary,82601,,branch-misses,8012824365,100.00,5.19,of all branches
Now it's easy for script to analyse the summary lines.
Of course, we also consider not to break possible existing scripts which
can continue to use the broken CSV format by using a new '--no-csv-summary.'
option.
# perf stat -x, -I1000 --interval-count 1 --summary --no-csv-summary
1.001213261,8012.67,msec,cpu-clock,8012672327,100.00,8.013,CPUs utilized
1.001213261,197,,context-switches,8012703742,100.00,24.586,/sec
1.001213261,9,,cpu-migrations,8012720902,100.00,1.123,/sec
1.001213261,644,,page-faults,8012738266,100.00,80.373,/sec
1.001213261,18350698,,cycles,8012744109,100.00,0.002,GHz
1.001213261,12745021,,instructions,8012759001,100.00,0.69,insn per cycle
1.001213261,2458033,,branches,8012770864,100.00,306.768,K/sec
1.001213261,102107,,branch-misses,8012781751,100.00,4.15,of all branches
8012.67,msec,cpu-clock,8012672327,100.00,7.985,CPUs utilized
197,,context-switches,8012703742,100.00,24.586,/sec
9,,cpu-migrations,8012720902,100.00,1.123,/sec
644,,page-faults,8012738266,100.00,80.373,/sec
18350698,,cycles,8012744109,100.00,0.002,GHz
12745021,,instructions,8012759001,100.00,0.69,insn per cycle
2458033,,branches,8012770864,100.00,306.768,K/sec
102107,,branch-misses,8012781751,100.00,4.15,of all branches
This option can be enabled in perf config by setting the variable
'stat.no-csv-summary'.
# perf config stat.no-csv-summary=true
# perf config -l
stat.no-csv-summary=true
# perf stat -x, -I1000 --interval-count 1 --summary
1.001330198,8013.28,msec,cpu-clock,8013279201,100.00,8.013,CPUs utilized
1.001330198,205,,context-switches,8013308394,100.00,25.583,/sec
1.001330198,10,,cpu-migrations,8013324681,100.00,1.248,/sec
1.001330198,0,,page-faults,8013340926,100.00,0.000,/sec
1.001330198,8027742,,cycles,8013344503,100.00,0.001,GHz
1.001330198,2871717,,instructions,8013356501,100.00,0.36,insn per cycle
1.001330198,553564,,branches,8013366204,100.00,69.081,K/sec
1.001330198,54021,,branch-misses,8013375952,100.00,9.76,of all branches
8013.28,msec,cpu-clock,8013279201,100.00,7.985,CPUs utilized
205,,context-switches,8013308394,100.00,25.583,/sec
10,,cpu-migrations,8013324681,100.00,1.248,/sec
0,,page-faults,8013340926,100.00,0.000,/sec
8027742,,cycles,8013344503,100.00,0.001,GHz
2871717,,instructions,8013356501,100.00,0.36,insn per cycle
553564,,branches,8013366204,100.00,69.081,K/sec
54021,,branch-misses,8013375952,100.00,9.76,of all branches
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210319070156.20394-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
ded2e511a8 |
perf tools: Cast (struct timeval).tv_sec when printing
The musl-libc [1] defines (struct timeval).tv_sec as a 'long long' for arm and other architectures. The default build having a '-Wformat' flag, not casting the field when printing prevents from building perf. This patch casts the (struct timeval).tv_sec fields to the expected format. [1] git://git.musl-libc.org/musl Signed-off-by: Pierre Gondois <Pierre.Gondois@arm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Douglas.raillard@arm.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210224182410.5366-1-Pierre.Gondois@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
fa853c4b83 |
perf stat: Enable counting events for BPF programs
Introduce 'perf stat -b' option, which counts events for BPF programs, like:
[root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
1.487903822 115,200 ref-cycles
1.487903822 86,012 cycles
2.489147029 80,560 ref-cycles
2.489147029 73,784 cycles
3.490341825 60,720 ref-cycles
3.490341825 37,797 cycles
4.491540887 37,120 ref-cycles
4.491540887 31,963 cycles
The example above counts 'cycles' and 'ref-cycles' of BPF program of id
254. This is similar to bpftool-prog-profile command, but more
flexible.
'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF
programs (monitor-progs) to the target BPF program (target-prog). The
monitor-progs read perf_event before and after the target-prog, and
aggregate the difference in a BPF map. Then the user space reads data
from these maps.
A new 'struct bpf_counter' is introduced to provide a common interface
that uses BPF programs/maps to count perf events.
Committer notes:
Removed all but bpf_counter.h includes from evsel.h, not needed at all.
Also BPF map lookups for PERCPU_ARRAYs need to have as its value receive
buffer passed to the kernel libbpf_num_possible_cpus() entries, not
evsel__nr_cpus(evsel), as the former uses
/sys/devices/system/cpu/possible while the later uses
/sys/devices/system/cpu/online, which may be less than the 'possible'
number making the bpf map lookup overwrite memory and cause hard to
debug memory corruption.
We need to continue using evsel__nr_cpus(evsel) when accessing the
perf_counts array tho, not to overwrite another are of memory :-)
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
8d4852b468 |
perf stat aggregation: Add separate thread member
A separate field isn't strictly required. The core field could be re-used for thread IDs as a single field was used previously. But separating them will avoid confusion and catch potential errors where core IDs are read as thread IDs and vice versa. Also remove the placeholder id field which is now no longer used. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-13-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
b993381779 |
perf stat aggregation: Add separate core member
Add core as a separate member so that it doesn't have to be packed into the int value. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-12-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
ba2ee166d9 |
perf stat aggregation: Add separate die member
Add die as a separate member so that it doesn't have to be packed into the int value. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-11-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
1a270cb6b3 |
perf stat aggregation: Add separate socket member
Add socket as a separate member so that it doesn't have to be packed into the int value. When the socket ID was larger than 8 bits the output appeared corrupted or incomplete. For example, here on ThunderX2 'perf stat' reports a socket of -1 and an invalid die number: ./perf stat -a --per-die The socket id number is too big. Performance counter stats for 'system wide': S-1-D255 128 687.99 msec cpu-clock # 57.240 CPUs utilized ... S36-D0 128 842.34 msec cpu-clock # 70.081 CPUs utilized ... And with --per-core there is an entry with an invalid core ID: ./perf stat record -a --per-core The socket id number is too big. Performance counter stats for 'system wide': S-1-D255-C65535 128 671.04 msec cpu-clock # 54.112 CPUs utilized ... S36-D0-C0 4 28.27 msec cpu-clock # 2.279 CPUs utilized ... This fixes the "Session topology" self test on ThunderX2. After this fix the output contains the correct socket and die IDs and no longer prints a warning about the size of the socket ID: ./perf stat --per-die -a Performance counter stats for 'system wide': S36-D0 128 169,869.39 msec cpu-clock # 127.501 CPUs utilized ... S3612-D0 128 169,733.05 msec cpu-clock # 127.398 CPUs utilized Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-10-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
fcd83a35dd |
perf stat aggregation: Add separate node member
Add node as a separate member so that it doesn't have to be packed into the int value. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-9-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
ff5232956e |
perf stat aggregation: Start using cpu_aggr_id in map
Use the new cpu_aggr_id struct in the cpu map instead of int so that it can store more data. No functional changes. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-8-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
2760f5a14f |
perf stat: Replace aggregation ID with a struct
Replace all occurences of the usage of int with the new struct cpu_aggr_id. No functional changes. Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20201126141328.6509-5-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
7127372419 |
perf evlist: Use the right prefix for 'struct evlist' print methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/, go on completing this split. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
c0ee1d5ae8 |
perf stat: Use proper cpu for shadow stats
Currently perf stat shows some metrics (like IPC) for defined events.
But when no aggregation mode is used (-A option), it shows incorrect
values since it used a value from a different cpu.
Before:
$ perf stat -aA -e cycles,instructions sleep 1
Performance counter stats for 'system wide':
CPU0 116,057,380 cycles
CPU1 86,084,722 cycles
CPU2 99,423,125 cycles
CPU3 98,272,994 cycles
CPU0 53,369,217 instructions # 0.46 insn per cycle
CPU1 33,378,058 instructions # 0.29 insn per cycle
CPU2 58,150,086 instructions # 0.50 insn per cycle
CPU3 40,029,703 instructions # 0.34 insn per cycle
1.001816971 seconds time elapsed
So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
but it was 0.29 (= 33,378,058 / 116,057,380) and so on.
After:
$ perf stat -aA -e cycles,instructions sleep 1
Performance counter stats for 'system wide':
CPU0 109,621,384 cycles
CPU1 159,026,454 cycles
CPU2 99,460,366 cycles
CPU3 124,144,142 cycles
CPU0 44,396,706 instructions # 0.41 insn per cycle
CPU1 120,195,425 instructions # 0.76 insn per cycle
CPU2 44,763,978 instructions # 0.45 insn per cycle
CPU3 69,049,079 instructions # 0.56 insn per cycle
1.001910444 seconds time elapsed
Fixes:
|
||
|
|
7a16183316 |
perf stat: Remove dead code: no need to set os.evsel twice
No need to set os.evsel twice. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200910032632.511566-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
313146a844 |
perf stat: Fix out of bounds array access in the print_counters() evlist method
Fix a compile error on F32 and gcc version 10.1 on s390 in file
utils/stat-display.c. The error does not show up with make DEBUG=y. In
fact the issue shows up when using both compiler options -O6 and
-D_FORTIFY_SOURCE=2 (which are omitted with DEBUG=Y).
This is the offending call chain:
print_counter_aggr()
printout(config, -1, 0, ...) with 2nd parm id set to -1
aggr_printout(config, x, id --> -1, ...) which leads to this code:
case AGGR_NONE:
if (evsel->percore && !config->percore_show_thread) {
....
} else {
fprintf(config->output, "CPU%*d%s",
config->csv_output ? 0 : -7,
evsel__cpus(evsel)->map[id],
^^ id is -1 !!!!
config->csv_sep);
}
This is a compiler inlining issue which is detected on s390 but not on
other plattforms.
Output before:
# make util/stat-display.o
.....
util/stat-display.c: In function ‘perf_evlist__print_counters’:
util/stat-display.c:121:4: error: array subscript -1 is below array
bounds of ‘int[]’ [-Werror=array-bounds]
121 | fprintf(config->output, "CPU%*d%s",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
122 | config->csv_output ? 0 : -7,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
123 | evsel__cpus(evsel)->map[id],
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
124 | config->csv_sep);
| ~~~~~~~~~~~~~~~~
In file included from util/evsel.h:13,
from util/evlist.h:13,
from util/stat-display.c:9:
/root/linux/tools/lib/perf/include/internal/cpumap.h:10:7:
note: while referencing ‘map’
10 | int map[];
| ^~~
cc1: all warnings being treated as errors
mv: cannot stat 'util/.stat-display.o.tmp': No such file or directory
make[3]: *** [/root/linux/tools/build/Makefile.build:97: util/stat-display.o]
Error 1
make[2]: *** [Makefile.perf:716: util/stat-display.o] Error 2
make[1]: *** [Makefile.perf:231: sub-make] Error 2
make: *** [Makefile:110: util/stat-display.o] Error 2
[root@t35lp46 perf]#
Output after:
# make util/stat-display.o
.....
CC util/stat-display.o
[root@t35lp46 perf]#
Committer notes:
Removed the removal of {} enclosing the multiline else block, as pointed
out by Jiri Olsa.
Suggested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200825063304.77733-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
||
|
|
c0c652fc70 |
perf stat: Fix NULL pointer dereference
If config->aggr_map is NULL and config->aggr_get_id is not NULL,
the function print_aggr() will still calling arrg_update_shadow(),
which can result in accessing the invalid pointer.
Fixes:
|
||
|
|
c754c382c9 |
perf evsel: Rename perf_evsel__is_*() to evsel__is*()
As those are 'struct evsel' methods, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
8ab2e96d8f |
perf evsel: Rename *perf_evsel__*name() to *evsel__*name()
As they are 'struct evsel' methods or related routines, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
5eb88f0476 |
perf evsel: Rename perf_evsel__nr_cpus() to evsel__nr_cpus()
As it is a 'struct evsel' method, not part of tools/lib/perf/, aka libperf, to whom the perf_ prefix belongs. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
|
3351c6da89 |
perf tools: Enable Hz/hz prinitg for --metric-only option
Commit
|