mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
linux/kernel/sched
Chen Yu db6cc3f4ac Revert "sched/numa: add statistics of numa balance task"
This reverts commit ad6b26b6a0.

The reverted commit introduced per-memcg/task NUMA balance statistics, but
it also introduced a NULL pointer dereference through the following
race condition: after a swap task candidate is chosen, the candidate can
exit, which sets its mm_struct pointer to NULL.  Later, when the actual
task swap is performed, p->mm is dereferenced while it is NULL.

CPU0                                   CPU1
:
...
task_numa_migrate
     task_numa_find_cpu
      task_numa_compare
        # a normal task p is chosen
        env->best_task = p

                                          # p exits:
                                          exit_signals(p);
                                             p->flags |= PF_EXITING
                                          exit_mm
                                             p->mm = NULL;

      migrate_swap_stop
        __migrate_swap_task(arg->src_task, arg->dst_cpu)
         count_memcg_event_mm(p->mm, NUMA_TASK_SWAP) # p->mm is NULL

Preventing this would require holding task_lock() and checking the
PF_EXITING flag before touching p->mm.  After discussion, the conclusion
was that taking a lock is not worthwhile just to update statistics.  Revert
the change and rely on the tracepoint for this purpose.
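
For illustration only, the guard that was considered and rejected can be
sketched as a small userspace model.  This is not kernel code and not the
reverted patch: struct model_task, struct model_mm, MODEL_PF_EXITING and
both functions are hypothetical names, and a pthread mutex stands in for
the lock that task_lock() takes.  The exit path is also simplified, since
it sets the flag and clears mm in one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PF_EXITING task flag. */
#define MODEL_PF_EXITING 0x4u

/* Hypothetical stand-in for mm_struct, holding one statistics counter. */
struct model_mm {
	unsigned long numa_task_swap;
};

/* Hypothetical stand-in for task_struct. */
struct model_task {
	pthread_mutex_t alloc_lock;	/* models the lock behind task_lock() */
	unsigned int flags;
	struct model_mm *mm;
};

/* Models task exit: mark the task as exiting and drop its mm. */
static void model_task_exit(struct model_task *p)
{
	assert(p != NULL);
	pthread_mutex_lock(&p->alloc_lock);
	p->flags |= MODEL_PF_EXITING;
	p->mm = NULL;
	pthread_mutex_unlock(&p->alloc_lock);
}

/*
 * Models the guarded statistics update: count the swap only if the task
 * is not exiting and still has an mm.  Returns true if the event was
 * counted, false if it was skipped.
 */
static bool model_count_numa_swap(struct model_task *p)
{
	bool counted = false;

	assert(p != NULL);
	pthread_mutex_lock(&p->alloc_lock);
	if (!(p->flags & MODEL_PF_EXITING) && p->mm != NULL) {
		p->mm->numa_task_swap++;
		counted = true;
	}
	pthread_mutex_unlock(&p->alloc_lock);
	return counted;
}
```

In this model, a call to model_count_numa_swap() made after
model_task_exit() skips the update instead of dereferencing a NULL mm.
The cost is a lock acquisition on every swap, which is the overhead the
discussion judged not worthwhile for statistics alone.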

Link: https://lkml.kernel.org/r/20250704135620.685752-1-yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/20250708064917.BBD13C4CEED@smtp.kernel.org
Fixes: ad6b26b6a0 ("sched/numa: add statistics of numa balance task")
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reported-by: Jirka Hladky <jhladky@redhat.com>
Closes: https://lore.kernel.org/all/CAE4VaGBLJxpd=NeRJXpSCuw=REhC5LWJpC29kDy-Zh2ZDyzQZA@mail.gmail.com/
Reported-by: Srikanth Aithal <Srikanth.Aithal@amd.com>
Reported-by: Suneeth D <Suneeth.D@amd.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Hladky <jhladky@redhat.com>
Cc: Libo Chen <libo.chen@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 21:07:56 -07:00
autogroup.c sched_ext: Fixes for v6.14-rc2 2025-02-14 11:14:24 -08:00
autogroup.h
build_policy.c sched_ext: Move built-in idle CPU selection policy to a separate file 2025-01-27 12:43:43 -10:00
build_utility.c sched/debug: Make CONFIG_SCHED_DEBUG functionality unconditional 2025-03-19 22:20:53 +01:00
clock.c
completion.c
core_sched.c sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() 2025-03-19 22:20:53 +01:00
core.c Revert "sched/numa: add statistics of numa balance task" 2025-07-09 21:07:56 -07:00
cpuacct.c
cpudeadline.c
cpudeadline.h
cpufreq_schedutil.c cpufreq/sched: schedutil: Add helper for governor checks 2025-05-07 21:17:56 +02:00
cpufreq.c
cpupri.c
cpupri.h
cputime.c sched/clock: Don't define sched_clock_irqtime as static key 2025-03-10 14:22:58 +01:00
deadline.c sched/deadline: Fix dl_server runtime calculation formula 2025-07-04 10:35:56 +02:00
debug.c Revert "sched/numa: add statistics of numa balance task" 2025-07-09 21:07:56 -07:00
ext_idle.c sched_ext: idle: Skip cross-node search with !CONFIG_NUMA 2025-06-03 08:22:27 -10:00
ext_idle.h sched_ext: idle: Explicitly pass allowed cpumask to scx_select_cpu_dfl() 2025-04-07 07:13:52 -10:00
ext.c sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group() 2025-06-17 08:19:55 -10:00
ext.h sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group() 2025-06-17 08:19:55 -10:00
fair.c - The 2 patch series "zram: support algorithm-specific parameters" from 2025-06-02 16:00:26 -07:00
features.h sched/fair: Untangle NEXT_BUDDY and pick_next_task() 2024-12-09 11:48:13 +01:00
idle.c sched_ext: idle: Refresh idle masks during idle-to-idle transitions 2025-01-10 12:40:42 -10:00
isolation.c sched/isolation: Make use of more than one housekeeping cpu 2025-04-08 20:55:55 +02:00
loadavg.c
Makefile tracing: Disable branch profiling in noinstr code 2025-03-22 09:49:26 +01:00
membarrier.c
pelt.c sched/fair: Use the new cfs_rq.h_nr_runnable 2024-12-09 11:48:11 +01:00
pelt.h sched: Move update_other_load_avgs() to kernel/sched/pelt.c 2024-09-11 20:00:21 -10:00
psi.c treewide, timers: Rename from_timer() to timer_container_of() 2025-06-08 09:07:37 +02:00
rt.c sched/rt: Fix race in push_rt_task 2025-04-08 20:55:55 +02:00
sched-pelt.h
sched.h sched_ext: Changes for v6.16 2025-05-27 21:12:50 -07:00
smp.h
stats.c docs: Update Schedstat version to 17 2024-12-20 15:31:18 +01:00
stats.h sched/debug: Change SCHED_WARN_ON() to WARN_ON_ONCE() 2025-03-19 22:20:53 +01:00
stop_task.c
swait.c
syscalls.c sched: Bypass bandwitdh checks with runtime disabled RT_GROUP_SCHED 2025-04-08 20:55:54 +02:00
topology.c Power management updates for 6.16-rc1 2025-05-27 16:48:47 -07:00
wait_bit.c sched/wait: Remove unused bit_wait_io_timeout 2024-10-07 09:28:41 +02:00
wait.c