mirror of
				git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
				synced 2025-09-04 20:19:47 +08:00 
			
		
		
		
	 2a09a84c72
			
		
	
	
		
	
	
	
	
		
			
This patch enables perf-diff with the "--stream" option.
"--stream": Enable hot streams comparison
Now let's look at an example.
perf record -b ...      Generate perf.data.old with branch data
perf record -b ...      Generate perf.data with branch data
perf diff --stream
[ Matched hot streams ]
hot chain pair 1:
            cycles: 1, hits: 27.77%                  cycles: 1, hits: 9.24%
        ---------------------------              --------------------------
                      main div.c:39                           main div.c:39
                      main div.c:44                           main div.c:44
hot chain pair 2:
           cycles: 34, hits: 20.06%                cycles: 27, hits: 16.98%
        ---------------------------              --------------------------
          __random_r random_r.c:360               __random_r random_r.c:360
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:380               __random_r random_r.c:380
          __random_r random_r.c:357               __random_r random_r.c:357
              __random random.c:293                   __random random.c:293
              __random random.c:293                   __random random.c:293
              __random random.c:291                   __random random.c:291
              __random random.c:291                   __random random.c:291
              __random random.c:291                   __random random.c:291
              __random random.c:288                   __random random.c:288
                     rand rand.c:27                          rand rand.c:27
                     rand rand.c:26                          rand rand.c:26
                           rand@plt                                rand@plt
                           rand@plt                                rand@plt
              compute_flag div.c:25                   compute_flag div.c:25
              compute_flag div.c:22                   compute_flag div.c:22
                      main div.c:40                           main div.c:40
                      main div.c:40                           main div.c:40
                      main div.c:39                           main div.c:39
hot chain pair 3:
             cycles: 9, hits: 4.48%                  cycles: 6, hits: 4.51%
        ---------------------------              --------------------------
          __random_r random_r.c:360               __random_r random_r.c:360
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:388               __random_r random_r.c:388
          __random_r random_r.c:380               __random_r random_r.c:380
[ Hot streams in old perf data only ]
hot chain 1:
            cycles: 18, hits: 6.75%
         --------------------------
          __random_r random_r.c:360
          __random_r random_r.c:388
          __random_r random_r.c:388
          __random_r random_r.c:380
          __random_r random_r.c:357
              __random random.c:293
              __random random.c:293
              __random random.c:291
              __random random.c:291
              __random random.c:291
              __random random.c:288
                     rand rand.c:27
                     rand rand.c:26
                           rand@plt
                           rand@plt
              compute_flag div.c:25
              compute_flag div.c:22
                      main div.c:40
hot chain 2:
            cycles: 29, hits: 2.78%
         --------------------------
              compute_flag div.c:22
                      main div.c:40
                      main div.c:40
                      main div.c:39
[ Hot streams in new perf data only ]
hot chain 1:
                                                     cycles: 4, hits: 4.54%
                                                 --------------------------
                                                              main div.c:42
                                                      compute_flag div.c:28
hot chain 2:
                                                     cycles: 5, hits: 3.51%
                                                 --------------------------
                                                              main div.c:39
                                                              main div.c:44
                                                              main div.c:42
                                                      compute_flag div.c:28
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-8-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
		
	
			
		
			
				
	
	
		
			306 lines
		
	
	
		
			9.1 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
| perf-diff(1)
 | |
| ============
 | |
| 
 | |
| NAME
 | |
| ----
 | |
| perf-diff - Read perf.data files and display the differential profile
 | |
| 
 | |
| SYNOPSIS
 | |
| --------
 | |
| [verse]
 | |
| 'perf diff' [baseline file] [data file1] [[data file2] ... ]
 | |
| 
 | |
| DESCRIPTION
 | |
| -----------
 | |
| This command displays the performance difference amongst two or more perf.data
 | |
| files captured via perf record.
 | |
| 
 | |
| If no parameters are passed it will assume perf.data.old and perf.data.
 | |
| 
 | |
| The differential profile is displayed only for events matching both
 | |
| specified perf.data files.
 | |
| 
 | |
| If no parameters are passed the samples will be sorted by dso and symbol.
 | |
| As the perf.data files could come from different binaries, the symbol addresses
 | |
| could vary. So perf diff is based on the comparison of the file and
 | |
| symbol names.
 | |
| 
 | |
| OPTIONS
 | |
| -------
 | |
| -D::
 | |
| --dump-raw-trace::
 | |
|         Dump raw trace in ASCII.
 | |
| 
 | |
| --kallsyms=<file>::
 | |
|         kallsyms pathname
 | |
| 
 | |
| -m::
 | |
| --modules::
 | |
|         Load module symbols. WARNING: use only with -k and LIVE kernel
 | |
| 
 | |
| -d::
 | |
| --dsos=::
 | |
| 	Only consider symbols in these dsos. CSV that understands
 | |
| 	file://filename entries.  This option will affect the percentage
 | |
| 	of the Baseline/Delta column.  See --percentage for more info.
 | |
| 
 | |
| -C::
 | |
| --comms=::
 | |
| 	Only consider symbols in these comms. CSV that understands
 | |
| 	file://filename entries.  This option will affect the percentage
 | |
| 	of the Baseline/Delta column.  See --percentage for more info.
 | |
| 
 | |
| -S::
 | |
| --symbols=::
 | |
| 	Only consider these symbols. CSV that understands
 | |
| 	file://filename entries.  This option will affect the percentage
 | |
| 	of the Baseline/Delta column.  See --percentage for more info.
 | |
| 
 | |
| -s::
 | |
| --sort=::
 | |
| 	Sort by key(s): pid, comm, dso, symbol, cpu, parent, srcline.
 | |
| 	Please see description of --sort in the perf-report man page.
 | |
| 
 | |
| -t::
 | |
| --field-separator=::
 | |
| 
 | |
| 	Use a special separator character and don't pad with spaces, replacing
 | |
| 	all occurrences of this separator in symbol names (and other output)
 | |
| 	with a '.' character, which thus becomes the only invalid separator.
 | |
| 
 | |
| -v::
 | |
| --verbose::
 | |
| 	Be verbose, for instance, show the raw counts in addition to the
 | |
| 	diff.
 | |
| 
 | |
| -q::
 | |
| --quiet::
 | |
| 	Do not show any message.  (Suppress -v)
 | |
| 
 | |
| -f::
 | |
| --force::
 | |
|         Don't do ownership validation.
 | |
| 
 | |
| --symfs=<directory>::
 | |
|         Look for files with symbols relative to this directory.
 | |
| 
 | |
| -b::
 | |
| --baseline-only::
 | |
|         Show only items with match in baseline.
 | |
| 
 | |
| -c::
 | |
| --compute::
 | |
|         Differential computation selection - delta, ratio, wdiff, cycles,
 | |
|         delta-abs (default is delta-abs).  Default can be changed using
 | |
|         diff.compute config option.  See COMPARISON METHODS section for
 | |
|         more info.
 | |
| 
 | |
| --cycles-hist::
 | |
| 	Report a histogram and the standard deviation for cycles data.
 | |
| 	It can help us to judge if the reported cycles data is noisy or
 | |
| 	not. This option should be used with '-c cycles'.
 | |
| 
 | |
| -p::
 | |
| --period::
 | |
|         Show period values for both compared hist entries.
 | |
| 
 | |
| -F::
 | |
| --formula::
 | |
|         Show formula for given computation.
 | |
| 
 | |
| -o::
 | |
| --order::
 | |
|        Specify compute sorting column number.  0 means sorting by baseline
 | |
|        overhead and 1 (default) means sorting by computed value of column 1
 | |
|        (data from the first file other than the baseline).  Values more than 1
 | |
|        can be used only if enough data files are provided.
 | |
|        The default value can be set using the diff.order config option.
 | |
| 
 | |
| --percentage::
 | |
| 	Determine how to display the overhead percentage of filtered entries.
 | |
| 	Filters can be applied by --comms, --dsos and/or --symbols options.
 | |
| 
 | |
| 	"relative" means it's relative to filtered entries only so that the
 | |
| 	sum of shown entries will be always 100%.  "absolute" means it retains
 | |
| 	the original value before and after the filter is applied.
 | |
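The difference between the two --percentage modes can be illustrated with a small calculation. This is only a sketch of the arithmetic described above, not perf's implementation; the function name `overhead_percentages` and the sample periods are hypothetical.

```python
def overhead_percentages(periods, keep, mode="relative"):
    """Return {name: percent} for the entries that survive the filter.

    periods: {entry name: period} for every entry in one data file
    keep:    names left after filtering (as with --comms/--dsos/--symbols)
    mode:    "relative" divides by the filtered total,
             "absolute" divides by the unfiltered total
    """
    kept = {name: p for name, p in periods.items() if name in keep}
    denom = sum(kept.values()) if mode == "relative" else sum(periods.values())
    if denom == 0:
        # Nothing to divide by; report zero overhead for every kept entry.
        return {name: 0.0 for name in kept}
    return {name: 100.0 * p / denom for name, p in kept.items()}


# Hypothetical periods; filtering keeps two of the three symbols.
periods = {"main": 600, "rand": 300, "compute_flag": 100}
keep = {"main", "rand"}
relative = overhead_percentages(periods, keep, "relative")
absolute = overhead_percentages(periods, keep, "absolute")
print(relative)
print(absolute)
```

In relative mode the kept entries sum to 100%, while in absolute mode each entry keeps the percentage it had before filtering.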
| 
 | |
| --time::
 | |
| 	Analyze samples within given time window. It supports time
 | |
| 	percent with multiple time ranges. Time string is 'a%/n,b%/m,...'
 | |
| 	or 'a%-b%,c%-d%,...'.
 | |
| 
 | |
| 	For example:
 | |
| 
 | |
| 	Select the second 10% time slice to diff:
 | |
| 
 | |
| 	  perf diff --time 10%/2
 | |
| 
 | |
| 	Select from 0% to 10% time slice to diff:
 | |
| 
 | |
| 	  perf diff --time 0%-10%
 | |
| 
 | |
| 	Select the first and the second 10% time slices to diff:
 | |
| 
 | |
| 	  perf diff --time 10%/1,10%/2
 | |
| 
 | |
| 	Select from 0% to 10% and 30% to 40% slices to diff:
 | |
| 
 | |
| 	  perf diff --time 0%-10%,30%-40%
 | |
| 
 | |
| 	It also supports analyzing samples within a given time window
 | |
| 	<start>,<stop>. Times have the format seconds.nanoseconds. If 'start'
 | |
| 	is not given (i.e. time string is ',x.y') then analysis starts at
 | |
| 	the beginning of the file. If stop time is not given (i.e. time
 | |
| 	string is 'x.y,') then analysis goes to the end of the file.
 | |
| 	Multiple ranges can be separated by spaces, which requires the argument
 | |
| 	to be quoted e.g. --time "1234.567,1234.789 1235,"
 | |
| 	Time string is 'a1.b1,c1.d1:a2.b2,c2.d2'. Use ':' to separate timestamps
 | |
| 	for different perf.data files.
 | |
| 
 | |
| 	For example, we get the timestamp information from 'perf script'.
 | |
| 
 | |
| 	  perf script -i perf.data.old
 | |
| 	    mgen 13940 [000]  3946.361400: ...
 | |
| 
 | |
| 	  perf script -i perf.data
 | |
| 	    mgen 13940 [000]  3971.150589: ...
 | |
| 
 | |
| 	  perf diff --time 3946.361400,:3971.150589,
 | |
| 
 | |
| 	It analyzes the perf.data.old from the timestamp 3946.361400 to
 | |
| 	the end of perf.data.old and analyzes the perf.data from the
 | |
| 	timestamp 3971.150589 to the end of perf.data.
 | |
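The percent forms of the --time string can be read as fractions of the recorded run. The sketch below converts 'a%/n' and 'a%-b%' items into (start, end) fractions, following the examples above; the helper name `percent_slices` is hypothetical, and how perf maps these fractions onto sample timestamps is not shown here.

```python
def percent_slices(spec):
    """Convert 'a%/n,...' or 'a%-b%,...' into a list of (start, end) fractions.

    'a%/n'  means the n-th slice of width a% (n starts at 1).
    'a%-b%' means the span from a% to b% of the run.
    """
    ranges = []
    for part in spec.split(","):
        if "/" in part:
            size, index = part.split("/")
            width = float(size.rstrip("%")) / 100.0
            n = int(index)
            ranges.append(((n - 1) * width, n * width))
        else:
            start, end = part.split("-")
            ranges.append((float(start.rstrip("%")) / 100.0,
                           float(end.rstrip("%")) / 100.0))
    return ranges


print(percent_slices("10%/2"))           # second 10% slice
print(percent_slices("0%-10%,30%-40%"))  # two explicit spans
```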
| 
 | |
| --cpu:: Only diff samples for the list of CPUs provided. Multiple CPUs can
 | |
| 	be provided as a comma-separated list with no space: 0,1. Ranges of
 | |
| 	CPUs are specified with -: 0-2. Default is to report samples on all
 | |
| 	CPUs.
 | |
| 
 | |
| --pid=::
 | |
| 	Only diff samples for given process ID (comma separated list).
 | |
| 
 | |
| --tid=::
 | |
| 	Only diff samples for given thread ID (comma separated list).
 | |
| 
 | |
| --stream::
 | |
| 	Enable hot streams comparison. A stream is a callchain that is
 | |
| 	aggregated from the branch records in the samples.
 | |
| 
 | |
| COMPARISON
 | |
| ----------
 | |
| The comparison is governed by the baseline file. The baseline perf.data
 | |
| file is iterated for samples. All other perf.data files specified on
 | |
| the command line are searched for the baseline sample pair. If the pair
 | |
| is found, the specified computation is made and the result is displayed.
 | |
| 
 | |
| All samples from non-baseline perf.data files, that do not match any
 | |
| baseline entry, are displayed with empty space within baseline column
 | |
| and possible computation results (delta) in their related column.
 | |
| 
 | |
| Example files samples:
 | |
| - file A with samples f1, f2, f3, f4,    f6
 | |
| - file B with samples     f2,     f4, f5
 | |
| - file C with samples f1, f2,         f5
 | |
| 
 | |
| Example output:
 | |
|   x - computation takes place for pair
 | |
|   b - baseline sample percentage
 | |
| 
 | |
| - perf diff A B C
 | |
| 
 | |
|   baseline/A compute/B compute/C  samples
 | |
|   ---------------------------------------
 | |
|   b                    x          f1
 | |
|   b          x         x          f2
 | |
|   b                               f3
 | |
|   b          x                    f4
 | |
|   b                               f6
 | |
|              x         x          f5
 | |
| 
 | |
| - perf diff B A C
 | |
| 
 | |
|   baseline/B compute/A compute/C  samples
 | |
|   ---------------------------------------
 | |
|   b          x         x          f2
 | |
|   b          x                    f4
 | |
|   b                    x          f5
 | |
|              x         x          f1
 | |
|              x                    f3
 | |
|              x                    f6
 | |
| 
 | |
| - perf diff C B A
 | |
| 
 | |
|   baseline/C compute/B compute/A  samples
 | |
|   ---------------------------------------
 | |
|   b                    x          f1
 | |
|   b          x         x          f2
 | |
|   b          x                    f5
 | |
|                        x          f3
 | |
|              x         x          f4
 | |
|                        x          f6
 | |
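The tables above can be reproduced with a short sketch of the matching rule: baseline samples are listed first, then samples that appear only in the other files. The function name `diff_layout` is hypothetical, and listing the unmatched samples in name order is an assumption made to match the tables; real perf diff output order depends on the sort keys and computed values.

```python
def diff_layout(files, order):
    """Return rows of (baseline mark, [compute marks], sample).

    files: {file name: list of sample names}
    order: file names as given on the command line; the first is the baseline
    """
    baseline, *others = order
    base = files[baseline]

    def marks_for(sample):
        # 'x' where a computation takes place for the pair, blank otherwise.
        return ["x" if sample in files[name] else " " for name in others]

    rows = [("b", marks_for(sample), sample) for sample in base]
    # Samples found only in non-baseline files get an empty baseline column.
    extra = sorted({s for name in others for s in files[name]} - set(base))
    rows += [(" ", marks_for(sample), sample) for sample in extra]
    return rows


files = {
    "A": ["f1", "f2", "f3", "f4", "f6"],
    "B": ["f2", "f4", "f5"],
    "C": ["f1", "f2", "f5"],
}
for base_mark, marks, sample in diff_layout(files, ["A", "B", "C"]):
    print(base_mark, *marks, sample)
```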
| 
 | |
| COMPARISON METHODS
 | |
| ------------------
 | |
| delta
 | |
| ~~~~~
 | |
| If specified the 'Delta' column is displayed with value 'd' computed as:
 | |
| 
 | |
|   d = A->period_percent - B->period_percent
 | |
| 
 | |
| with:
 | |
|   - A/B being matching hist entry from data/baseline file specified
 | |
|     (or perf.data/perf.data.old) respectively.
 | |
| 
 | |
|   - period_percent being the % of the hist entry period value within
 | |
|     single data file
 | |
| 
 | |
|   - with filtering by -C, -d and/or -S, period_percent might be changed
 | |
|     relative to how entries are filtered.  Use --percentage=absolute to
 | |
|     prevent such fluctuation.
 | |
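As a worked example of the delta formula, the sketch below computes period_percent for one entry in each file and subtracts them. The function name `delta_percent` and the numbers are hypothetical; A is the data file entry and B the baseline entry, as defined above.

```python
def delta_percent(a_period, a_total, b_period, b_total):
    """d = A->period_percent - B->period_percent, in percentage points.

    a_period/a_total: entry period and total period in the data file (A)
    b_period/b_total: entry period and total period in the baseline file (B)
    """
    a_percent = 100.0 * a_period / a_total
    b_percent = 100.0 * b_period / b_total
    return a_percent - b_percent


# Hypothetical entry: 25% of the data file, 20% of the baseline.
d = delta_percent(250, 1000, 200, 1000)
print(f"{d:+.2f}%")
```

A positive value means the entry takes a larger share of the data file than of the baseline.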
| 
 | |
| delta-abs
 | |
| ~~~~~~~~~
 | |
| Same as the 'delta' method, but sorts the results by absolute value.
 | |
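The effect of sorting by absolute value can be sketched as follows. The helper name `rank_by_abs_delta` and the delta values are hypothetical, and placing the largest change first is an assumption about the display order.

```python
def rank_by_abs_delta(deltas):
    """Order (name, delta) pairs by the magnitude of the delta, largest first."""
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical deltas in percentage points.
deltas = {"main": -18.53, "rand": 3.20, "compute_flag": 0.40}
ranked = rank_by_abs_delta(deltas)
for name, value in ranked:
    print(f"{value:+7.2f}%  {name}")
```

With this ordering a large regression and a large improvement both appear near the top, regardless of sign.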
| 
 | |
| ratio
 | |
| ~~~~~
 | |
| If specified the 'Ratio' column is displayed with value 'r' computed as:
 | |
| 
 | |
|   r = A->period / B->period
 | |
| 
 | |
| with:
 | |
|   - A/B being matching hist entry from data/baseline file specified
 | |
|     (or perf.data/perf.data.old) respectively.
 | |
| 
 | |
|   - period being the hist entry period value
 | |
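The ratio formula is a plain quotient of the two period values. In the sketch below the function name `period_ratio` is hypothetical, and returning None for a zero baseline period is a choice made for this example only, not a statement about how perf handles that case.

```python
def period_ratio(a_period, b_period):
    """r = A->period / B->period, with A from the data file and B from the baseline."""
    if b_period == 0:
        # Assumption for this sketch: no ratio is reported without a baseline period.
        return None
    return a_period / b_period


# Hypothetical periods: the entry grew by half relative to the baseline.
r = period_ratio(1500, 1000)
print(r)
```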
| 
 | |
| wdiff:WEIGHT-B,WEIGHT-A
 | |
| ~~~~~~~~~~~~~~~~~~~~~~~
 | |
| If specified the 'Weighted diff' column is displayed with value 'd' computed as:
 | |
| 
 | |
|    d = A->period * WEIGHT-A - B->period * WEIGHT-B
 | |
| 
 | |
|   - A/B being matching hist entry from data/baseline file specified
 | |
|     (or perf.data/perf.data.old) respectively.
 | |
| 
 | |
|   - period being the hist entry period value
 | |
| 
 | |
|   - WEIGHT-A/WEIGHT-B being user supplied weights in the '-c' option
 | |
|     behind ':' separator like '-c wdiff:1,2'.
 | |
|     - WEIGHT-A being the weight of the data file
 | |
|     - WEIGHT-B being the weight of the baseline data file
 | |
| 
 | |
| cycles
 | |
| ~~~~~~
 | |
| If specified the '[Program Block Range] Cycles Diff' column is displayed.
 | |
| It displays the cycles difference of the same program basic block between
 | |
| two perf.data files. The program basic block is the code between two branches.
 | |
| 
 | |
| '[Program Block Range]' indicates the range of a program basic block.
 | |
| The source line is reported if it can be found; otherwise symbol+offset
 | |
| is used instead.
 | |
| 
 | |
| SEE ALSO
 | |
| --------
 | |
| linkperf:perf-record[1], linkperf:perf-report[1]
 |