linux

mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2026-03-22 07:27:12 +08:00

Author	SHA1	Message	Date
Maninder Singh	292ded180b	kasan: remove unnecessary sync argument from start_report() commit `7ce0ea19d5` ("kasan: switch kunit tests to console tracepoints") removed use of sync variable, thus removing that extra argument also. Link: https://lkml.kernel.org/r/20260122041556.341868-1-maninder1.s@samsung.com Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:52 -08:00
zenghongling	57fdfd6423	mm/pagewalk: use min() to simplify the code Use the min() macro to simplify the function and improve its readability. [akpm@linux-foundation.org: add newline, per Lorenzo] Link: https://lkml.kernel.org/r/20260120094932.183697-1-zenghongling@kylinos.cn Signed-off-by: zenghongling <zenghongling@kylinos.cn> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Hongling Zeng <zenghongling@kylinos.cn> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:52 -08:00
Lorenzo Stoakes	17fd82c3ab	mm/vma: add and use vma_assert_stabilised() Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot be changed underneath us. This will be the case if EITHER the VMA lock or the mmap lock is held. In order to do so, we introduce a new assert vma_assert_stabilised() - this will make a lockdep assert if lockdep is enabled AND the VMA is read-locked. Currently lockdep tracking for VMA write locks is not implemented, so it suffices to check in this case that we have either an mmap read or write semaphore held. Note that because the VMA lock uses the non-standard vmlock_dep_map naming convention, we cannot use lockdep_assert_is_write_held() so have to open code this ourselves via lockdep-asserting that lock_is_held_type(&vma->vmlock_dep_map, 0). We have to be careful here - for instance when merging a VMA, we use the mmap write lock to stabilise the examination of adjacent VMAs which might be simultaneously VMA read-locked whilst being faulted in. If we were to assert VMA read lock using lockdep we would encounter an incorrect lockdep assert. Also, we have to be careful about asserting mmap locks are held - if we try to address the above issue by first checking whether mmap lock is held and if so asserting it via lockdep, we may find that we were raced by another thread acquiring an mmap read lock simultaneously that either we don't own (and thus can be released any time - so we are not stable) or was indeed released since we last checked. So to deal with these complexities we end up with either a precise (if lockdep is enabled) or imprecise (if not) approach - in the first instance we assert the lock is held using lockdep and thus whether we own it. If we do own it, then the check is complete, otherwise we must check for the VMA read lock being held (VMA write lock implies mmap write lock so the mmap lock suffices for this). If lockdep is not enabled we simply check if the mmap lock is held and risk a false negative (i.e. not asserting when we should do). There are a couple places in the kernel where we already do this stabliisation check - the anon_vma_name() helper in mm/madvise.c and vma_flag_set_atomic() in include/linux/mm.h, which we update to use vma_assert_stabilised(). This change abstracts these into vma_assert_stabilised(), uses lockdep if possible, and avoids a duplicate check of whether the mmap lock is held. This is also self-documenting and lays the foundations for further VMA stability checks in the code. The only functional change here is adding the lockdep check. Link: https://lkml.kernel.org/r/6c9e64bb2b56ddb6f806fde9237f8a00cb3a776b.1769198904.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:51 -08:00
Lorenzo Stoakes	22f7639f2f	mm/vma: improve and document __is_vma_write_locked() We don't actually need to return an output parameter providing mm sequence number, rather we can separate that out into another function - __vma_raw_mm_seqnum() - and have any callers which need to obtain that invoke that instead. The access to the raw sequence number requires that we hold the exclusive mmap lock such that we know we can't race vma_end_write_all(), so move the assert to __vma_raw_mm_seqnum() to make this requirement clear. Also while we're here, convert all of the VM_BUG_ON_VMA()'s to VM_WARN_ON_ONCE_VMA()'s in line with the convention that we do not invoke oopses when we can avoid it. [lorenzo.stoakes@oracle.com: minor tweaks, per Vlastimil] Link: https://lkml.kernel.org/r/3fa89c13-232d-4eee-86cc-96caa75c2c67@lucifer.local Link: https://lkml.kernel.org/r/ef6c415c2d2c03f529dca124ccaed66bc2f60edc.1769198904.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:51 -08:00
Lorenzo Stoakes	e28e575af9	mm/vma: introduce helper struct + thread through exclusive lock fns It is confusing to have __vma_start_exclude_readers() return 0, 1 or an error (but only when waiting for readers in TASK_KILLABLE state), and having the return value be stored in a stack variable called 'locked' is further confusion. More generally, we are doing a lot of rather finnicky things during the acquisition of a state in which readers are excluded and moving out of this state, including tracking whether we are detached or not or whether an error occurred. We are implementing logic in __vma_start_exclude_readers() that effectively acts as if 'if one caller calls us do X, if another then do Y', which is very confusing from a control flow perspective. Introducing the shared helper object state helps us avoid this, as we can now handle the 'an error arose but we're detached' condition correctly in both callers - a warning if not detaching, and treating the situation as if no error arose in the case of a VMA detaching. This also acts to help document what's going on and allows us to add some more logical debug asserts. Also update vma_mark_detached() to add a guard clause for the likely 'already detached' state (given we hold the mmap write lock), and add a comment about ephemeral VMA read lock reference count increments to clarify why we are entering/exiting an exclusive locked state here. Finally, separate vma_mark_detached() into its fast-path component and make it inline, then place the slow path for excluding readers in mmap_lock.c. No functional change intended. [akpm@linux-foundation.org: fix function naming in comments, add comment per Vlastimil per Lorenzo] Link: https://lkml.kernel.org/r/7d3084d596c84da10dd374130a5055deba6439c0.1769198904.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/7d3084d596c84da10dd374130a5055deba6439c0.1769198904.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:51 -08:00
Lorenzo Stoakes	28f590f35d	mm/vma: clean up __vma_enter/exit_locked() These functions are very confusing indeed. 'Entering' a lock could be interpreted as acquiring it, but this is not what these functions are interacting with. Equally they don't indicate at all what kind of lock we are 'entering' or 'exiting'. Finally they are misleading as we invoke these functions when we already hold a write lock to detach a VMA. These functions are explicitly simply 'entering' and 'exiting' a state in which we hold the EXCLUSIVE lock in order that we can either mark the VMA as being write-locked, or mark the VMA detached. Rename the functions accordingly, and also update __vma_end_exclude_readers() to return detached state with a __must_check directive, as it is simply clumsy to pass an output pointer here to detached state and inconsistent vs. __vma_start_exclude_readers(). Finally, remove the unnecessary 'inline' directives. No functional change intended. Link: https://lkml.kernel.org/r/33273be9389712347d69987c408ca7436f0c1b22.1769198904.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:50 -08:00
Lorenzo Stoakes	e5aeb75dc4	mm/vma: de-duplicate __vma_enter_locked() error path We're doing precisely the same thing that __vma_exit_locked() does, so de-duplicate this code and keep the refcount primitive in one place. No functional change intended. Link: https://lkml.kernel.org/r/c9759b593f6a158e984fa87abe2c3cbd368ef825.1769198904.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:50 -08:00
Lorenzo Stoakes	1f2e7efc3e	mm/vma: add+use vma lockdep acquire/release defines The code is littered with inscrutable and duplicative lockdep incantations, replace these with defines which explain what is going on and add commentary to explain what we're doing. If lockdep is disabled these become no-ops. We must use defines so _RET_IP_ remains meaningful. These are self-documenting and aid readability of the code. Additionally, instead of using the confusing rwsem_*() form for something that is emphatically not an rwsem, we instead explicitly use lock_[acquired, release]_shared/exclusive() lockdep invocations since we are doing something rather custom here and these make more sense to use. No functional change intended. Link: https://lkml.kernel.org/r/fdae72441949ecf3b4a0ed3510da803e881bb153.1769198904.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:50 -08:00
Lorenzo Stoakes	180355d4cf	mm/vma: rename is_vma_write_only(), separate out shared refcount put The is_vma_writer_only() function is misnamed - this isn't determining if there is only a write lock, as it checks for the presence of the VM_REFCNT_EXCLUDE_READERS_FLAG. Really, it is checking to see whether readers are excluded, with a possibility of a false positive in the case of a detachment (there we expect the vma->vm_refcnt to eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it to eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1). Rename the function accordingly. Relatedly, we use a __refcount_dec_and_test() primitive directly in vma_refcount_put(), using the old value to determine what the reference count ought to be after the operation is complete (ignoring racing reference count adjustments). Wrap this into a __vma_refcount_put_return() function, which we can then utilise in vma_mark_detached() and thus keep the refcount primitive usage abstracted. This function, as the name implies, returns the value after the reference count has been updated. This reduces duplication in the two invocations of this function. Also adjust comments, removing duplicative comments covered elsewhere and adding more to aid understanding. No functional change intended. Link: https://lkml.kernel.org/r/32053580bff460eb1092ef780b526cefeb748bad.1769198904.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:50 -08:00
Lorenzo Stoakes	ef4c0cea1e	mm/vma: document possible vma->vm_refcnt values and reference comment The possible vma->vm_refcnt values are confusing and vague, explain in detail what these can be in a comment describing the vma->vm_refcnt field and reference this comment in various places that read/write this field. No functional change intended. [akpm@linux-foundation.org: fix typo, per Suren] Link: https://lkml.kernel.org/r/d462e7678c6cc7461f94e5b26c776547d80a67e8.1769198904.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:49 -08:00
Lorenzo Stoakes	25faccd699	mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Patch series "mm: add and use vma_assert_stabilised() helper", v4. This series first introduces a series of refactorings, intended to significantly improve readability and abstraction of the code. Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot be changed underneath us. This will be the case if EITHER the VMA lock or the mmap lock is held. We already open-code this in two places - anon_vma_name() in mm/madvise.c and vma_flag_set_atomic() in include/linux/mm.h. This series adds vma_assert_stablised() which abstract this can be used in these callsites instead. This implementation uses lockdep where possible - that is VMA read locks - which correctly track read lock acquisition/release via: vma_start_read() -> rwsem_acquire_read() vma_start_read_locked() -> vma_start_read_locked_nested() -> rwsem_acquire_read() And: vma_end_read() -> vma_refcount_put() -> rwsem_release() We don't track the VMA locks using lockdep for VMA write locks, however these are predicated upon mmap write locks whose lockdep state we do track, and additionally vma_assert_stabillised() asserts this check if VMA read lock is not held, so we get lockdep coverage in this case also. We also add extensive comments to describe what we're doing. There's some tricky stuff around mmap locking and stabilisation races that we have to be careful of that I describe in the patch introducing vma_assert_stabilised(). This change also lays the foundation for future series to add this assert in further places where we wish to make it clear that we rely upon a stabilised VMA. The motivation for this change was precisely this. This patch (of 10): The VMA_LOCK_OFFSET value encodes a flag which vma->vm_refcnt is set to in order to indicate that a VMA is in the process of having VMA read-locks excluded in __vma_enter_locked() (that is, first checking if there are any VMA read locks held, and if there are, waiting on them to be released). This happens when a VMA write lock is being established, or a VMA is being marked detached and discovers that the VMA reference count is elevated due to read-locks temporarily elevating the reference count only to discover a VMA write lock is in place. The naming does not convey any of this, so rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG (with a sensible new prefix to differentiate from the newly introduced VMA_*_BIT flags). Also rename VMA_REF_LIMIT to VM_REFCNT_LIMIT to make this consistent also. Update comments to reflect this. No functional change intended. Link: https://lkml.kernel.org/r/817bd763e5fe35f23e01347996f9007e6eb88460.1769198904.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:49 -08:00
Taeyang Kim	b94c317903	mm: update kernel-doc for __swap_cache_clear_shadow() The kernel-doc comment referred to swap_cache_clear_shadow(), but the actual function name is __swap_cache_clear_shadow(). Update the comment to match the function name. Link: https://lkml.kernel.org/r/20260117101428.113154-1-maainnewkin59@gmail.com Signed-off-by: Taeyang Kim <maainnewkin59@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:47 -08:00
SeongJae Park	cc1db8dff8	mm/damon: rename min_sz_region of damon_ctx to min_region_sz 'min_sz_region' field of 'struct damon_ctx' represents the minimum size of each DAMON region for the context. 'struct damos_access_pattern' has a field of the same name. It confuses readers and makes 'grep' less optimal for them. Rename it to 'min_region_sz'. Link: https://lkml.kernel.org/r/20260117175256.82826-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:47 -08:00
SeongJae Park	dfb1b0c9dc	mm/damon: rename DAMON_MIN_REGION to DAMON_MIN_REGION_SZ The macro is for the default minimum size of each DAMON region. There was a case that a reader was confused if it is the minimum number of total DAMON regions, which is set on damon_attrs->min_nr_regions. Make the name more explicit. Link: https://lkml.kernel.org/r/20260117175256.82826-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:46 -08:00
SeongJae Park	52c5d3ee8a	mm/damon/core: rename damos_filter_out() to damos_core_filter_out() DAMOS filters are processed on the core layer and operations layer, depending on their types. damos_filter_out() in core.c, which is for only core layer handled filters, can confuse the fact. Rename it to damos_core_filter_out(), to be more explicit about the fact. Link: https://lkml.kernel.org/r/20260117175256.82826-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:46 -08:00
SeongJae Park	ebc4734ad2	mm/damon/core: process damon_call_control requests on a local list kdamond_call() handles damon_call() requests on the ->call_controls list of damon_ctx, which is shared with damon_call() callers. To protect the list from concurrent accesses while letting the callback function independent of the call_controls_lock, the function does complicated locking operations. For each damon_call_control object on the list, the function removes the control object from the list under locking, invoke the callback of the control object without locking, and then puts the control object back to the list if needed, under locking. It is complicated, and can contend the locks more frequently with other DAMON API caller threads as the number of concurrent callback requests increases. Contention overhead is not a big deal, but the increased race opportunity can make headaches. Simplify the locking sequence by moving all damon_call_control objects from the shared list to a local list at once under the single lock protection, processing the callback requests without locking, and adding back repeat mode controls to the shared list again at once again, again under the single lock protection. This change makes the number of locking in kdamond_call() be always two, regardless of the number of the queued requests. Link: https://lkml.kernel.org/r/20260117175256.82826-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:46 -08:00
SeongJae Park	69714a74c1	mm/damon/core: cancel damos_walk() before damon_ctx->kdamond reset damos_walk() request is canceled after damon_ctx->kdamond is reset. This can make weird situations where damon_is_running() returns false but the DAMON context has the damos_walk() request linked. There was a similar situation for damon_call() requests handling [1], which _was_ able to cause a racy use-after-free bug. Unlike the case of damon_call(), because damos_walk() is always synchronously handled and allows only single request at time, there is no such problematic race cases. But, keeping it as is could stem another subtle race condition bug in future. Avoid that by cancelling the requests before the ->kdamond reset. Note that this change also makes all damon_ctx dependent resource cleanups consistently done before the damon_ctx->kdamond reset. Link: https://lkml.kernel.org/r/20260117175256.82826-4-sj@kernel.org Link: https://lore.kernel.org/20251230014532.47563-1-sj@kernel.org [1] Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:45 -08:00
SeongJae Park	1736047a4e	mm/damon/core: cleanup targets and regions at once on kdamond termination When kdamond terminates, it destroys the regions of the context first, and the targets of the context just before the kdamond main function returns. Because regions are linked inside targets, doing them separately is only inefficient and looks weird. A more serious problem is that the cleanup of the targets is done after damon_ctx->kdamond reset, which is the event that lets DAMON API callers know the kdamond is no longer actively running. That is, some DAMON targets could still exist while kdamond is not running. There are no real problems from this, but this implicit fact could cause subtle racy issues in future. Destroy targets and regions at one. Adding contexts on how the code has evolved in the way. Doing only regions destruction was because putting pids of the targets were done on DAMON API callers. Commit `7114bc5e01` ("mm/damon/core: add cleanup_target() ops callback") moved the role to be done via operations set on each target destruction. Hence it removed the reason to do only regions cleanup. Commit `3a69f16357` ("mm/damon/core: destroy targets when kdamond_fn() finish") therefore further destructed targets on kdamond termination time. It was still separated from regions destruction because damon_operations->cleanup() may do additional targets cleanup. Placing the targets destruction after damon_ctx->kdamond reset was just an unnecessary decision of the commit. The previous commit removed damon_operations->cleanup(), so there is no more reason to do destructions of regions and targets separately. Link: https://lkml.kernel.org/r/20260117175256.82826-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:45 -08:00
SeongJae Park	50962b16c0	mm/damon: remove damon_operations->cleanup() Patch series "mm/damon: cleanup kdamond, damon_call(), damos filter and DAMON_MIN_REGION". Do miscellaneous code cleanups for improving readability. First three patches cleanup kdamond termination process, by removing unused operation set cleanup callback (patch 1) and moving damon_ctx specific resource cleanups on kdamond termination to synchronization-easy place (patches 2 and 3). Next two patches touch damon_call() infrastructure, by refactoring kdamond_call() function to do less and simpler locking operations (patch 4), and documenting when dealloc_on_free does work (patch 5). Final three patches rename things for clear uses of those. Those rename damos_filter_out() to be more explicit about the fact that it is only for core-handled filters (patch 6), DAMON_MIN_REGION macro to be more explicit it is not about number of regions but size of each region (patch 7), and damon_ctx->min_sz_region to be different from damos_access_patern->min_sz_region (patch 8), so that those are not confusing and easy to grep. This patch (of 8): damon_operations->cleanup() was added for a case that an operation set implementation requires additional cleanups. But no such implementation exists at the moment. Remove it. Link: https://lkml.kernel.org/r/20260117175256.82826-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260117175256.82826-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:45 -08:00
Kefeng Wang	ae85e56108	mm: hugetlb_cma: mark hugetlb_cma{_only} as __ro_after_init hugetlb_cma and hugetlb_cma_only are initialized once during init and never changed. Link: https://lkml.kernel.org/r/20260112150954.1802953-6-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jane Chu <jane.chu@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:43 -08:00
Kefeng Wang	d925730734	mm: hugetlb_cma: optimize hugetlb_cma_alloc_frozen_folio() Check hugetlb_cma_size which helps to avoid unnecessary gfp check or nodemask traversal. Link: https://lkml.kernel.org/r/20260112150954.1802953-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jane Chu <jane.chu@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:43 -08:00
Kefeng Wang	5a74b9f1dc	mm: hugetlb: optimize replace_free_hugepage_folios() If no free hugepage folios are available, there is no need to perform any replacement operations. Additionally, gigantic folios should not be replaced under any circumstances. Therefore, we only check for the presence of non-gigantic folios, also adding the gigantic folio check to avoid accidental replacement. To optimize performance, we skip unnecessary iterations over pfn for compound pages and high-order buddy pages to save processing time. A simple test on machine with 114G free memory, allocate 120 * 1G HugeTLB folios(104 successfully returned), time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages Before: 0m0.602s After: 0m0.431s [wangkefeng.wang@huawei.com: v2] Link: https://lkml.kernel.org/r/20260114135512.2159799-1-wangkefeng.wang@huawei.com [akpm@linux-foundation.org: use single-return-point style, tweak comment] Link: https://lkml.kernel.org/r/20260112150954.1802953-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jane Chu <jane.chu@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:43 -08:00
Kefeng Wang	9a8e0c31b3	mm: page_alloc: optimize pfn_range_valid_contig() The alloc_contig_pages() spends a significant amount of time within pfn_range_valid_contig(). - set_max_huge_pages - 99.98% alloc_pool_huge_folio only_alloc_fresh_hugetlb_folio.isra.0 - alloc_contig_frozen_pages_noprof - 87.00% pfn_range_valid_contig pfn_to_online_page - 12.91% alloc_contig_frozen_range_noprof 4.51% replace_free_hugepage_folios - 4.02% prep_new_page prep_compound_page - 2.98% undo_isolate_page_range - 2.79% unset_migratetype_isolate - 2.75% __move_freepages_block_isolate 2.71% __move_freepages_block - 0.98% start_isolate_page_range 0.66% set_migratetype_isolate To optimize this process, use the new helper page_is_unmovable() to avoid more unnecessary iterations for compound pages, such as THP not on LRU, and high-order buddy pages, which significantly improving the efficiency of contiguous memory allocation. A simple test on machine with 114G free memory, allocate 120 * 1G HugeTLB folios(104 successfully returned), time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages Before: 0m3.605s After: 0m0.602s Link: https://lkml.kernel.org/r/20260112150954.1802953-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jane Chu <jane.chu@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:42 -08:00
Kefeng Wang	c83109e95c	mm: page_isolation: introduce page_is_unmovable() Patch series "mm: accelerate gigantic folio allocation". Optimize pfn_range_valid_contig() and replace_free_hugepage_folios() in alloc_contig_frozen_pages() to speed up gigantic folio allocation. The allocation time for 120*1G folios drops from 3.605s to 0.431s. This patch (of 5): Factor out the check if a page is unmovable into a new helper, and will be reused in the following patch. No functional change intended, the minor changes are as follows, 1) Avoid unnecessary calls by checking CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION 2) Directly call PageCompound since PageTransCompound may be dropped 3) Using folio_test_hugetlb() Link: https://lkml.kernel.org/r/20260112150954.1802953-1-wangkefeng.wang@huawei.com Link: https://lkml.kernel.org/r/20260112150954.1802953-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jane Chu <jane.chu@oracle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:42 -08:00
Kevin Brodsky	3a64d5b82e	sparc/mm: export symbols for lazy_mmu_mode KUnit tests The lazy_mmu_mode KUnit tests call lazy_mmu_mode_{enable,disable}. These tests may be built as a module, and because of inlining this means that arch_{enter,flush,leave}_lazy_mmu_mode need to be exported. [akpm@linux-foundation.org: remove mm/tests/lazy_mmu_mode_kunit.c comment, per Kevin] Link: https://lkml.kernel.org/r/20251218100541.2667405-1-kevin.brodsky@arm.com Fixes: `ee628d9cc8` ("mm: add basic tests for lazy_mmu") Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: Andreas Larsson <andreas@gaisler.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:40 -08:00
Marco Crivellari	ed0a826ce3	mm: add WQ_PERCPU to alloc_workqueue users This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") The refactoring is going to alter the default behavior of alloc_workqueue() to be unbound by default. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn't explicitly specify WQ_UNBOUND must now use WQ_PERCPU. For more details see the Link tag below. In order to keep alloc_workqueue() behavior identical, explicitly request WQ_PERCPU. [akpm@linux-foundation.org: fix mm/slub.c] [akpm@linux-foundation.org: fix kmem_cache_init_late() properly, per Sebastian] Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ Link: https://lkml.kernel.org/r/20260113114630.152942-4-marco.crivellari@suse.com Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Suggested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai jiangshan <jiangshanlai@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Marco Elver <elver@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:39 -08:00
Marco Crivellari	73b2162126	mm: replace use of system_wq with system_percpu_wq This patch continues the effort to refactor workqueue APIs, which has begun with the changes introducing new workqueues and a new alloc_workqueue flag: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") The point of the refactoring is to eventually alter the default behavior of workqueues to become unbound by default so that their workload placement is optimized by the scheduler. Before that to happen, workqueue users must be converted to the better named new workqueues with no intended behaviour changes: system_wq -> system_percpu_wq system_unbound_wq -> system_dfl_wq This way the old obsolete workqueues (system_wq, system_unbound_wq) can be removed in the future. Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ Link: https://lkml.kernel.org/r/20260113114630.152942-3-marco.crivellari@suse.com Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Suggested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai jiangshan <jiangshanlai@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Marco Elver <elver@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:39 -08:00
Marco Crivellari	0bcbd7cf65	mm: replace use of system_unbound_wq with system_dfl_wq Patch series "Replace wq users and add WQ_PERCPU to alloc_workqueue() users", v2. This series continues the effort to refactor the Workqueue API. No behavior changes are introduced by this series. === Recent changes to the WQ API === The following, address the recent changes in the Workqueue API: - commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") - commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") The old workqueues will be removed in a future release cycle and unbound will become the implicit default. === Introduced Changes by this series === 1) [P 1-2] Replace use of system_wq and system_unbound_wq Workqueue users converted to the better named new workqueues: system_wq -> system_percpu_wq system_unbound_wq -> system_dfl_wq This way the old obsolete workqueues (system_wq, system_unbound_wq) can be removed in the future. 2) [P 3] add WQ_PERCPU to remaining alloc_workqueue() users With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. WQ_UNBOUND will be removed in future. For more information: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ This patch (of 3): This patch continues the effort to refactor workqueue APIs, which has begun with the changes introducing new workqueues and a new alloc_workqueue flag: commit `128ea9f6cc` ("workqueue: Add system_percpu_wq and system_dfl_wq") commit `930c2ea566` ("workqueue: Add new WQ_PERCPU flag") The point of the refactoring is to eventually alter the default behavior of workqueues to become unbound by default so that their workload placement is optimized by the scheduler. Before that to happen, workqueue users must be converted to the better named new workqueues with no intended behaviour changes: system_wq -> system_percpu_wq system_unbound_wq -> system_dfl_wq This way the old obsolete workqueues (system_wq, system_unbound_wq) can be removed in the future. Link: https://lkml.kernel.org/r/20260113114630.152942-1-marco.crivellari@suse.com Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/ Link: https://lkml.kernel.org/r/20260113114630.152942-2-marco.crivellari@suse.com Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Suggested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai jiangshan <jiangshanlai@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Marco Elver <elver@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:39 -08:00
Joshua Hahn	824b8c96c4	mm/hugetlb: enforce brace style Documentation/process/coding-style.rst explicitly notes that if only one branch of a conditional statement is a single statement, braces should be used in both branches. Enforce this in mm/hugetlb.c. While add it, fix the indentation for vma_end_reservation. No functional change intended. Link: https://lkml.kernel.org/r/20260116192717.1600049-2-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:39 -08:00
Joshua Hahn	a1c655f554	mm/hugetlb: remove unnecessary if condition if (map_chg) is always true, since it is nested in another if statement which checks for map_chg == MAP_CHG_NEEDED, which is equal to 1. if (unlikely(map_chg == MAP_CHG_NEEDED && retval == 0)) { ... if (map_chg) { ... } } Remove the check, un-indent, and collapse the function call for readability. No functional change intended. Link: https://lkml.kernel.org/r/20260116192717.1600049-1-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:38 -08:00
William Tambe	94350fe6ca	mm/highmem: fix __kmap_to_page() build error This changes fixes following build error which is a miss from `ef6e06b2ef` ("highmem: fix kmap_to_page() for kmap_local_page() addresses"). mm/highmem.c:184:66: error: 'pteval' undeclared (first use in this function); did you mean 'pte_val'? 184 \| idx = arch_kmap_local_map_idx(i, pte_pfn(pteval)); In __kmap_to_page(), pteval is used but does not exist in the function. (akpm: affects xtensa only) Link: https://lkml.kernel.org/r/SJ0PR07MB86317E00EC0C59DA60935FDCD18DA@SJ0PR07MB8631.namprd07.prod.outlook.com Fixes: `ef6e06b2ef` ("highmem: fix kmap_to_page() for kmap_local_page() addresses") Signed-off-by: William Tambe <williamt@cadence.com> Reviewed-by: Max Filippov <jcmvbkbc@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:38 -08:00
Jiayuan Chen	a45088376d	mm/vmscan: add tracepoint and reason for kswapd_failures reset Currently, kswapd_failures is reset in multiple places (kswapd, direct reclaim, PCP freeing, memory-tiers), but there's no way to trace when and why it was reset, making it difficult to debug memory reclaim issues. This patch: 1. Introduce kswapd_clear_hopeless() as a wrapper function to centralize kswapd_failures reset logic. 2. Introduce kswapd_test_hopeless() to encapsulate hopeless node checks, replacing all open-coded kswapd_failures comparisons. 3. Add kswapd_clear_hopeless_reason enum to distinguish reset sources: - KSWAPD_CLEAR_HOPELESS_KSWAPD: reset from kswapd context - KSWAPD_CLEAR_HOPELESS_DIRECT: reset from direct reclaim - KSWAPD_CLEAR_HOPELESS_PCP: reset from PCP page freeing - KSWAPD_CLEAR_HOPELESS_OTHER: reset from other paths 4. Add tracepoints for better observability: - mm_vmscan_kswapd_clear_hopeless: traces each reset with reason - mm_vmscan_kswapd_reclaim_fail: traces each kswapd reclaim failure Test results: $ trace-cmd record -e vmscan:mm_vmscan_kswapd_clear_hopeless -e vmscan:mm_vmscan_kswapd_reclaim_fail $ # generate memory pressure $ trace-cmd report cpus=4 kswapd0-71 [000] 27.216563: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1 kswapd0-71 [000] 27.217169: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2 kswapd0-71 [000] 27.217764: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3 kswapd0-71 [000] 27.218353: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4 kswapd0-71 [000] 27.218993: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5 kswapd0-71 [000] 27.219744: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6 kswapd0-71 [000] 27.220488: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7 kswapd0-71 [000] 27.221206: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8 kswapd0-71 [000] 27.221806: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9 kswapd0-71 [000] 27.222634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10 kswapd0-71 [000] 27.223286: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11 kswapd0-71 [000] 27.223894: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12 kswapd0-71 [000] 27.224712: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13 kswapd0-71 [000] 27.225424: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14 kswapd0-71 [000] 27.226082: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15 kswapd0-71 [000] 27.226810: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16 kswapd1-72 [002] 27.386869: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=1 kswapd1-72 [002] 27.387435: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=2 kswapd1-72 [002] 27.388016: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=3 kswapd1-72 [002] 27.388586: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=4 kswapd1-72 [002] 27.389155: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=5 kswapd1-72 [002] 27.389723: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=6 kswapd1-72 [002] 27.390292: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=7 kswapd1-72 [002] 27.392364: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=8 kswapd1-72 [002] 27.392934: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=9 kswapd1-72 [002] 27.393504: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=10 kswapd1-72 [002] 27.394073: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=11 kswapd1-72 [002] 27.394899: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=12 kswapd1-72 [002] 27.395472: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=13 kswapd1-72 [002] 27.396055: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=14 kswapd1-72 [002] 27.396628: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=15 kswapd1-72 [002] 27.397199: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=16 kworker/u18:0-40 [002] 27.410151: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=DIRECT kswapd0-71 [000] 27.439454: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1 kswapd0-71 [000] 27.440048: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2 kswapd0-71 [000] 27.440634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3 kswapd0-71 [000] 27.441211: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4 kswapd0-71 [000] 27.441787: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5 kswapd0-71 [000] 27.442363: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6 kswapd0-71 [000] 27.443030: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7 kswapd0-71 [000] 27.443725: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8 kswapd0-71 [000] 27.444315: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9 kswapd0-71 [000] 27.444898: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10 kswapd0-71 [000] 27.445476: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11 kswapd0-71 [000] 27.446053: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12 kswapd0-71 [000] 27.446646: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13 kswapd0-71 [000] 27.447230: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14 kswapd0-71 [000] 27.447812: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15 kswapd0-71 [000] 27.448391: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16 ann-423 [003] 28.028285: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=PCP Link: https://lkml.kernel.org/r/20260120024402.387576-3-jiayuan.chen@linux.dev Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> [tracing] Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:38 -08:00
Jiayuan Chen	dc9fe9b705	mm/vmscan: mitigate spurious kswapd_failures reset from direct reclaim Patch series "mm/vmscan: add tracepoint and reason for kswapd_failures reset", v4. Currently, kswapd_failures is reset in multiple places (kswapd, direct reclaim, PCP freeing, memory-tiers), but there's no way to trace when and why it was reset, making it difficult to debug memory reclaim issues. This patch: 1. Introduce kswapd_clear_hopeless() as a wrapper function to centralize kswapd_failures reset logic. 2. Introduce kswapd_test_hopeless() to encapsulate hopeless node checks, replacing all open-coded kswapd_failures comparisons. 3. Add kswapd_clear_hopeless_reason enum to distinguish reset sources: - KSWAPD_CLEAR_HOPELESS_KSWAPD: reset from kswapd context - KSWAPD_CLEAR_HOPELESS_DIRECT: reset from direct reclaim - KSWAPD_CLEAR_HOPELESS_PCP: reset from PCP page freeing - KSWAPD_CLEAR_HOPELESS_OTHER: reset from other paths 4. Add tracepoints for better observability: - mm_vmscan_kswapd_clear_hopeless: traces each reset with reason - mm_vmscan_kswapd_reclaim_fail: traces each kswapd reclaim failure Test results: $ trace-cmd record -e vmscan:mm_vmscan_kswapd_clear_hopeless -e vmscan:mm_vmscan_kswapd_reclaim_fail $ # generate memory pressure $ trace-cmd report cpus=4 kswapd0-71 [000] 27.216563: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1 kswapd0-71 [000] 27.217169: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2 kswapd0-71 [000] 27.217764: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3 kswapd0-71 [000] 27.218353: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4 kswapd0-71 [000] 27.218993: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5 kswapd0-71 [000] 27.219744: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6 kswapd0-71 [000] 27.220488: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7 kswapd0-71 [000] 27.221206: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8 kswapd0-71 [000] 27.221806: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9 kswapd0-71 [000] 27.222634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10 kswapd0-71 [000] 27.223286: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11 kswapd0-71 [000] 27.223894: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12 kswapd0-71 [000] 27.224712: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13 kswapd0-71 [000] 27.225424: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14 kswapd0-71 [000] 27.226082: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15 kswapd0-71 [000] 27.226810: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16 kswapd1-72 [002] 27.386869: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=1 kswapd1-72 [002] 27.387435: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=2 kswapd1-72 [002] 27.388016: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=3 kswapd1-72 [002] 27.388586: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=4 kswapd1-72 [002] 27.389155: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=5 kswapd1-72 [002] 27.389723: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=6 kswapd1-72 [002] 27.390292: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=7 kswapd1-72 [002] 27.392364: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=8 kswapd1-72 [002] 27.392934: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=9 kswapd1-72 [002] 27.393504: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=10 kswapd1-72 [002] 27.394073: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=11 kswapd1-72 [002] 27.394899: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=12 kswapd1-72 [002] 27.395472: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=13 kswapd1-72 [002] 27.396055: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=14 kswapd1-72 [002] 27.396628: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=15 kswapd1-72 [002] 27.397199: mm_vmscan_kswapd_reclaim_fail: nid=1 failures=16 kworker/u18:0-40 [002] 27.410151: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=DIRECT kswapd0-71 [000] 27.439454: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=1 kswapd0-71 [000] 27.440048: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=2 kswapd0-71 [000] 27.440634: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=3 kswapd0-71 [000] 27.441211: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=4 kswapd0-71 [000] 27.441787: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=5 kswapd0-71 [000] 27.442363: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=6 kswapd0-71 [000] 27.443030: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=7 kswapd0-71 [000] 27.443725: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=8 kswapd0-71 [000] 27.444315: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=9 kswapd0-71 [000] 27.444898: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=10 kswapd0-71 [000] 27.445476: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=11 kswapd0-71 [000] 27.446053: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=12 kswapd0-71 [000] 27.446646: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=13 kswapd0-71 [000] 27.447230: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=14 kswapd0-71 [000] 27.447812: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=15 kswapd0-71 [000] 27.448391: mm_vmscan_kswapd_reclaim_fail: nid=0 failures=16 ann-423 [003] 28.028285: mm_vmscan_kswapd_clear_hopeless: nid=0 reason=PCP This patch (of 2): When kswapd fails to reclaim memory, kswapd_failures is incremented. Once it reaches MAX_RECLAIM_RETRIES, kswapd stops running to avoid futile reclaim attempts. However, any successful direct reclaim unconditionally resets kswapd_failures to 0, which can cause problems. We observed an issue in production on a multi-NUMA system where a process allocated large amounts of anonymous pages on a single NUMA node, causing its watermark to drop below high and evicting most file pages: $ numastat -m Per-node system memory usage (in MBs): Node 0 Node 1 Total --------------- --------------- --------------- MemTotal 128222.19 127983.91 256206.11 MemFree 1414.48 1432.80 2847.29 MemUsed 126807.71 126551.11 252358.82 SwapCached 0.00 0.00 0.00 Active 29017.91 25554.57 54572.48 Inactive 92749.06 95377.00 188126.06 Active(anon) 28998.96 23356.47 52355.43 Inactive(anon) 92685.27 87466.11 180151.39 Active(file) 18.95 2198.10 2217.05 Inactive(file) 63.79 7910.89 7974.68 With swap disabled, only file pages can be reclaimed. When kswapd is woken (e.g., via wake_all_kswapds()), it runs continuously but cannot raise free memory above the high watermark since reclaimable file pages are insufficient. Normally, kswapd would eventually stop after kswapd_failures reaches MAX_RECLAIM_RETRIES. However, containers on this machine have memory.high set in their cgroup. Business processes continuously trigger the high limit, causing frequent direct reclaim that keeps resetting kswapd_failures to 0. This prevents kswapd from ever stopping. The key insight is that direct reclaim triggered by cgroup memory.high performs aggressive scanning to throttle the allocating process. With sufficiently aggressive scanning, even hot pages will eventually be reclaimed, making direct reclaim "successful" at freeing some memory. However, this success does not mean the node has reached a balanced state - the freed memory may still be insufficient to bring free pages above the high watermark. Unconditionally resetting kswapd_failures in this case keeps kswapd alive indefinitely. The result is that kswapd runs endlessly. Unlike direct reclaim which only reclaims from the allocating cgroup, kswapd scans the entire node's memory. This causes hot file pages from all workloads on the node to be evicted, not just those from the cgroup triggering memory.high. These pages constantly refault, generating sustained heavy IO READ pressure across the entire system. Fix this by only resetting kswapd_failures when the node is actually balanced. This allows both kswapd and direct reclaim to clear kswapd_failures upon successful reclaim, but only when the reclaim actually resolves the memory pressure (i.e., the node becomes balanced). Link: https://lkml.kernel.org/r/20260120024402.387576-1-jiayuan.chen@linux.dev Link: https://lkml.kernel.org/r/20260120024402.387576-2-jiayuan.chen@linux.dev Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:38 -08:00
Mathieu Desnoyers	5898aa8f9a	mm: fix OOM killer inaccuracy on large many-core systems Use the precise, albeit slower, precise RSS counter sums for the OOM killer task selection and console dumps. The approximated value is too imprecise on large many-core systems. The following rss tracking issues were noted by Sweet Tea Dorminy [1], which lead to picking wrong tasks as OOM kill target: Recently, several internal services had an RSS usage regression as part of a kernel upgrade. Previously, they were on a pre-6.2 kernel and were able to read RSS statistics in a backup watchdog process to monitor and decide if they'd overrun their memory budget. Now, however, a representative service with five threads, expected to use about a hundred MB of memory, on a 250-cpu machine had memory usage tens of megabytes different from the expected amount -- this constituted a significant percentage of inaccuracy, causing the watchdog to act. This was a result of commit `f1a7941243` ("mm: convert mm's rss stats into percpu_counter") [1]. Previously, the memory error was bounded by 64*nr_threads pages, a very livable megabyte. Now, however, as a result of scheduler decisions moving the threads around the CPUs, the memory error could be as large as a gigabyte. This is a really tremendous inaccuracy for any few-threaded program on a large machine and impedes monitoring significantly. These stat counters are also used to make OOM killing decisions, so this additional inaccuracy could make a big difference in OOM situations -- either resulting in the wrong process being killed, or in less memory being returned from an OOM-kill than expected. Here is a (possibly incomplete) list of the prior approaches that were used or proposed, along with their downside: 1) Per-thread rss tracking: large error on many-thread processes. 2) Per-CPU counters: up to 12% slower for short-lived processes and 9% increased system time in make test workloads [1]. Moreover, the inaccuracy increases with O(n^2) with the number of CPUs. 3) Per-NUMA-node counters: requires atomics on fast-path (overhead), error is high with systems that have lots of NUMA nodes (32 times the number of NUMA nodes). commit `82241a83cd` ("mm: fix the inaccurate memory statistics issue for users") introduced get_mm_counter_sum() for precise proc memory status queries for some proc files. The simple fix proposed here is to do the precise per-cpu counters sum every time a counter value needs to be read. This applies to the OOM killer task selection, oom task console dumps (printk). This change increases the latency introduced when the OOM killer executes in favor of doing a more precise OOM target task selection. Effectively, the OOM killer iterates on all tasks, for all relevant page types, for which the precise sum iterates on all possible CPUs. As a reference, here is the execution time of the OOM killer before/after the change: AMD EPYC 9654 96-Core (2 sockets) Within a KVM, configured with 256 logical cpus. \| before \| after \| ----------------------------------\|----------\|----------\| nr_processes=40 \| 0.3 ms \| 0.5 ms \| nr_processes=10000 \| 3.0 ms \| 80.0 ms \| Link: https://lkml.kernel.org/r/20260114143642.47333-1-mathieu.desnoyers@efficios.com Fixes: `f1a7941243` ("mm: convert mm's rss stats into percpu_counter") Link: https://lore.kernel.org/lkml/20250331223516.7810-2-sweettea-kernel@dorminy.me/ # [1] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Martin Liu <liumartin@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Liam R . Howlett" <liam.howlett@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Yu Zhao <yuzhao@google.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Aboorva Devarajan <aboorvad@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:37 -08:00
Manish Kumar	d468d8f86d	mm: drop filename from page_alloc.c header comment The file name in the header comment is redundant and not useful, as the location is already known from the path. Remove it to align with kernel coding style. No functional change. Link: https://lkml.kernel.org/r/20260115193100.116109-1-manish1588@gmail.com Signed-off-by: Manish Kumar <manish1588@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:37 -08:00
David Hildenbrand (Red Hat)	1421758055	mm: rename CONFIG_MEMORY_BALLOON -> CONFIG_BALLOON Let's make it consistent with the naming of the files but also with the naming of CONFIG_BALLOON_MIGRATION. While at it, add a "/* CONFIG_BALLOON */". Link: https://lkml.kernel.org/r/20260119230133.3551867-24-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:36 -08:00
David Hildenbrand (Red Hat)	cd8e95d80b	mm: rename CONFIG_BALLOON_COMPACTION to CONFIG_BALLOON_MIGRATION While compaction depends on migration, the other direction is not the case. So let's make it clearer that this is all about migration of balloon pages. Adjust all comments/docs in the core to talk about "migration" instead of "compaction". While at it add some "/* CONFIG_BALLOON_MIGRATION */". Link: https://lkml.kernel.org/r/20260119230133.3551867-23-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:36 -08:00
David Hildenbrand (Red Hat)	7cf3318a25	mm/kconfig: make BALLOON_COMPACTION depend on MIGRATION Migration support for balloon memory depends on MIGRATION not COMPACTION. Compaction is simply another user of page migration. The last dependency on compaction.c was effectively removed with commit `3d388584d5` ("mm: convert "movable" flag in page->mapping to a page flag"). Ever since, everything for handling movable_ops page migration resides in core migration code. So let's change the dependency and adjust the description + help text. We'll rename BALLOON_COMPACTION separately next. Link: https://lkml.kernel.org/r/20260119230133.3551867-22-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:35 -08:00
David Hildenbrand (Red Hat)	25b48b4cdf	mm: rename balloon_compaction.(c\|h) to balloon.(c\|h) Even without CONFIG_BALLOON_COMPACTION this infrastructure implements basic list and page management for a memory balloon. Link: https://lkml.kernel.org/r/20260119230133.3551867-21-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:35 -08:00
David Hildenbrand (Red Hat)	a3db9e136c	mm/vmscan: drop inclusion of balloon_compaction.h Before commit `b1123ea6d3` ("mm: balloon: use general non-lru movable page feature"), the include was required because of isolated_balloon_page(). It's no longer required, so let's remove it. Link: https://lkml.kernel.org/r/20260119230133.3551867-20-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:35 -08:00
David Hildenbrand (Red Hat)	eee00d0414	mm/balloon_compaction: mark remaining functions for having proper kerneldoc Looks like all we are missing for proper kerneldoc is another "*". Link: https://lkml.kernel.org/r/20260119230133.3551867-18-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:34 -08:00
David Hildenbrand (Red Hat)	631eb22826	mm/balloon_compaction: assert that the balloon_pages_lock is held Let's add some sanity checks for holding the balloon_pages_lock when we're effectively inflating/deflating a page. Link: https://lkml.kernel.org/r/20260119230133.3551867-17-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:34 -08:00
David Hildenbrand (Red Hat)	03d6a2f684	mm/balloon_compaction: move internal helpers to balloon_compaction.c Let's move the helpers that are not required by drivers anymore. While at it, drop the doc of balloon_page_device() as it is trivial. [david@kernel.org: move balloon_page_device() under CONFIG_BALLOON_COMPACTION] Link: https://lkml.kernel.org/r/27f0adf1-54c1-4d99-8b7f-fd45574e7f41@kernel.org Link: https://lkml.kernel.org/r/20260119230133.3551867-16-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:34 -08:00
David Hildenbrand (Red Hat)	9d792ef33e	mm/balloon_compaction: fold balloon_mapping_gfp_mask() into balloon_page_alloc() Let's just remove balloon_mapping_gfp_mask(). Link: https://lkml.kernel.org/r/20260119230133.3551867-15-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:34 -08:00
David Hildenbrand (Red Hat)	0fa3e9a48b	mm/balloon_compaction: remove balloon_page_push/pop() Let's remove these helpers as they are unused now. Link: https://lkml.kernel.org/r/20260119230133.3551867-14-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:33 -08:00
David Hildenbrand (Red Hat)	ddc50a97be	mm/balloon_compaction: make balloon_mops static There is no need to expose this anymore, so let's just make it static. Link: https://lkml.kernel.org/r/20260119230133.3551867-11-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:33 -08:00
David Hildenbrand (Red Hat)	a3fafdd389	mm/balloon_compaction: remove dependency on page lock Let's stop using the page lock in balloon code and instead use only the balloon_device_lock. As soon as we set the PG_movable_ops flag, we might now get isolation callbacks for that page as we are no longer holding the page lock. In there, we'll simply synchronize using the balloon_device_lock. So in balloon_page_isolate() lookup the balloon_dev_info through page->private under balloon_device_lock. It's crucial that we update page->private under the balloon_device_lock, so the isolation callback can properly deal with concurrent deflation. Consequently, make sure that balloon_page_finalize() is called under balloon_device_lock as we remove a page from the list and clear page->private. balloon_page_insert() is already called with the balloon_device_lock held. Note that the core will still lock the pages, for example in isolate_movable_ops_page(). The lock is there still relevant for handling the PageMovableOpsIsolated flag, but that can be later changed to use an atomic test-and-set instead, or moved into the movable_ops backends. Link: https://lkml.kernel.org/r/20260119230133.3551867-10-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:32 -08:00
David Hildenbrand (Red Hat)	8202313e3d	mm/balloon_compaction: use a device-independent balloon (list) lock In order to remove the dependency on the page lock for balloon pages, we need a lock that is independent of the page. It's crucial that we can handle the scenario where balloon deflation (clearing page->private) can race with page isolation (using page->private to obtain the balloon_dev_info where the lock currently resides). The current lock in balloon_dev_info is therefore not suitable. Fortunately, we never really have more than a single balloon device per VM, so we can just keep it simple and use a static lock to protect all balloon devices. Based on this change we will remove the dependency on the page lock next. Link: https://lkml.kernel.org/r/20260119230133.3551867-9-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:32 -08:00
David Hildenbrand (Red Hat)	a00de9ba30	mm/balloon_compaction: centralize adjust_managed_page_count() handling Let's centralize it, by allowing for the driver to enable this handling through a new flag (bool for now) in the balloon device info. Note that we now adjust the counter when adding/removing a page into the balloon list: when removing a page to deflate it, it will now happen before the driver communicated with hypervisor, not afterwards. This shouldn't make a difference in practice. Link: https://lkml.kernel.org/r/20260119230133.3551867-7-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:32 -08:00
David Hildenbrand (Red Hat)	1258460bd3	mm/balloon_compaction: centralize basic page migration handling Let's update the balloon page references, the balloon page list, the BALLOON_MIGRATE counter and the isolated-pages counter in balloon_page_migrate(), after letting the balloon->migratepage() callback deal with the actual inflation+deflation. Note that we now perform the balloon list modifications outside of any implementation-specific locks: which is fine, there is nothing special about these page actions that the lock would be protecting. The old page is already no longer in the list (isolated) and the new page is not yet in the list. Let's use -ENOENT to communicate the special "inflation of new page failed after already deflating the old page" to balloon_page_migrate() so it can handle it accordingly. While at it, rename balloon->b_dev_info to make it match the other functions. Also, drop the comment above balloon_page_migrate(), which seems unnecessary. Link: https://lkml.kernel.org/r/20260119230133.3551867-6-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2026-01-31 14:22:31 -08:00

... 2 3 4 5 6 ...

25960 Commits