From 1d3f9bb4c8af70304d19c22e30f5d16a2d589bb5 Mon Sep 17 00:00:00 2001 From: Joshua Hahn Date: Fri, 16 Jan 2026 15:40:36 -0500 Subject: [PATCH 1/3] mm/hugetlb: restore failed global reservations to subpool Commit a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool") fixed an underflow error for hstate->resv_huge_pages caused by incorrectly attributing globally requested pages to the subpool's reservation. Unfortunately, this fix also introduced the opposite problem, which would leave spool->used_hpages elevated if the globally requested pages could not be acquired. This is because while a subpool's reserve pages only accounts for what is requested and allocated from the subpool, its "used" counter keeps track of what is consumed in total, both from the subpool and globally. Thus, we need to adjust spool->used_hpages in the other direction, and make sure that globally requested pages are uncharged from the subpool's used counter. Each failed allocation attempt increments the used_hpages counter by how many pages were requested from the global pool. Ultimately, this renders the subpool unusable, as used_hpages approaches the max limit. The issue can be reproduced as follows: 1. Allocate 4 hugetlb pages 2. Create a hugetlb mount with max=4, min=2 3. Consume 2 pages globally 4. Request 3 pages from the subpool (2 from subpool + 1 from global) 4.1 hugepage_subpool_get_pages(spool, 3) succeeds. used_hpages += 3 4.2 hugetlb_acct_memory(h, 1) fails: no global pages left used_hpages -= 2 5. Subpool now has used_hpages = 1, despite not being able to successfully allocate any hugepages. It believes it can now only allocate 3 more hugepages, not 4. With each failed allocation attempt incrementing the used counter, the subpool eventually reaches a point where its used counter equals its max counter. At that point, any future allocations that try to allocate hugeTLB pages from the subpool will fail, despite the subpool not having any of its hugeTLB pages consumed by any user. Once this happens, there is no way to make the subpool usable again, since there is no way to decrement the used counter as no process is really consuming the hugeTLB pages. The underflow issue that the original commit fixes still remains fixed as well. Without this fix, used_hpages would keep on leaking if hugetlb_acct_memory() fails. Link: https://lkml.kernel.org/r/20260116204037.2270096-1-joshua.hahnjy@gmail.com Fixes: a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool") Signed-off-by: Joshua Hahn Acked-by: Usama Arif Cc: David Hildenbrand Cc: "Liam R. Howlett" Cc: Lorenzo Stoakes Cc: Ma Wupeng Cc: Michal Hocko Cc: Mike Rapoport Cc: Muchun Song Cc: Oscar Salvador Cc: Shakeel Butt Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Waiman Long Cc: Signed-off-by: Andrew Morton --- mm/hugetlb.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a1832da0f623..77e45dd50ba2 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6717,6 +6717,15 @@ out_put_pages: */ hugetlb_acct_memory(h, -gbl_resv); } + /* Restore used_hpages for pages that failed global reservation */ + if (gbl_reserve && spool) { + unsigned long flags; + + spin_lock_irqsave(&spool->lock, flags); + if (spool->max_hpages != -1) + spool->used_hpages -= gbl_reserve; + unlock_or_release_subpool(spool, flags); + } out_uncharge_cgroup: hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h), chg * pages_per_huge_page(h), h_cg); From 338ad1e84d15078a9ae46d7dd7466329ae0bfa61 Mon Sep 17 00:00:00 2001 From: Harry Yoo Date: Mon, 9 Feb 2026 15:26:39 +0900 Subject: [PATCH 2/3] mm/page_alloc: skip debug_check_no_{obj,locks}_freed with FPI_TRYLOCK When CONFIG_DEBUG_OBJECTS_FREE is enabled, debug_check_no_{obj,locks}_freed() functions are called. Since both of them spin on a lock, they are not safe to be called if the FPI_TRYLOCK flag is specified. This leads to a lockdep splat: ================================ WARNING: inconsistent lock state 6.19.0-rc5-slab-for-next+ #326 Tainted: G N -------------------------------- inconsistent {INITIAL USE} -> {IN-NMI} usage. kunit_try_catch/9046 [HC2[2]:SC0[0]:HE0:SE1] takes: ffffffff84ed6bf8 (&obj_hash[i].lock){-.-.}-{2:2}, at: __debug_check_no_obj_freed+0xe0/0x300 {INITIAL USE} state was registered at: lock_acquire+0xd9/0x2f0 _raw_spin_lock_irqsave+0x4c/0x80 __debug_object_init+0x9d/0x1f0 debug_object_init+0x34/0x50 __init_work+0x28/0x40 init_cgroup_housekeeping+0x151/0x210 init_cgroup_root+0x3d/0x140 cgroup_init_early+0x30/0x240 start_kernel+0x3e/0xcd0 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0xf3/0x140 common_startup_64+0x13e/0x148 irq event stamp: 2998 hardirqs last enabled at (2997): [] exc_nmi+0x11a/0x240 hardirqs last disabled at (2998): [] sysvec_irq_work+0x11/0x110 softirqs last enabled at (1416): [] __irq_exit_rcu+0x132/0x1c0 softirqs last disabled at (1303): [] __irq_exit_rcu+0x132/0x1c0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&obj_hash[i].lock); lock(&obj_hash[i].lock); *** DEADLOCK *** Rename free_pages_prepare() to __free_pages_prepare(), add an fpi_t parameter, and skip those checks if FPI_TRYLOCK is set. To keep the fpi_t definition in mm/page_alloc.c, add a wrapper function free_pages_prepare() that always passes FPI_NONE and use it in mm/compaction.c. Link: https://lkml.kernel.org/r/20260209062639.16577-1-harry.yoo@oracle.com Fixes: 8c57b687e833 ("mm, bpf: Introduce free_pages_nolock()") Signed-off-by: Harry Yoo Reviewed-by: Vlastimil Babka Acked-by: Zi Yan Cc: Alexei Starovoitov Cc: Brendan Jackman Cc: Johannes Weiner Cc: Michal Hocko Cc: Sebastian Andrzej Siewior Cc: Shakeel Butt Cc: Suren Baghdasaryan Cc: Signed-off-by: Andrew Morton --- mm/page_alloc.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cbf758e27aa2..1513084d7507 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1340,8 +1340,8 @@ static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) #endif /* CONFIG_MEM_ALLOC_PROFILING */ -__always_inline bool free_pages_prepare(struct page *page, - unsigned int order) +__always_inline bool __free_pages_prepare(struct page *page, + unsigned int order, fpi_t fpi_flags) { int bad = 0; bool skip_kasan_poison = should_skip_kasan_poison(page); @@ -1434,7 +1434,7 @@ __always_inline bool free_pages_prepare(struct page *page, page_table_check_free(page, order); pgalloc_tag_sub(page, 1 << order); - if (!PageHighMem(page)) { + if (!PageHighMem(page) && !(fpi_flags & FPI_TRYLOCK)) { debug_check_no_locks_freed(page_address(page), PAGE_SIZE << order); debug_check_no_obj_freed(page_address(page), @@ -1473,6 +1473,11 @@ __always_inline bool free_pages_prepare(struct page *page, return true; } +bool free_pages_prepare(struct page *page, unsigned int order) +{ + return __free_pages_prepare(page, order, FPI_NONE); +} + /* * Frees a number of pages from the PCP lists * Assumes all pages on list are in same zone. @@ -1606,7 +1611,7 @@ static void __free_pages_ok(struct page *page, unsigned int order, unsigned long pfn = page_to_pfn(page); struct zone *zone = page_zone(page); - if (free_pages_prepare(page, order)) + if (__free_pages_prepare(page, order, fpi_flags)) free_one_page(zone, page, pfn, order, fpi_flags); } @@ -2970,7 +2975,7 @@ static void __free_frozen_pages(struct page *page, unsigned int order, return; } - if (!free_pages_prepare(page, order)) + if (!__free_pages_prepare(page, order, fpi_flags)) return; /* @@ -3027,7 +3032,7 @@ void free_unref_folios(struct folio_batch *folios) unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); - if (!free_pages_prepare(&folio->page, order)) + if (!__free_pages_prepare(&folio->page, order, FPI_NONE)) continue; /* * Free orders not handled on the PCP directly to the From 61dc9f776705d6db6847c101b98fa4f0e9eb6fa3 Mon Sep 17 00:00:00 2001 From: Andrii Nakryiko Date: Tue, 10 Feb 2026 11:27:38 -0800 Subject: [PATCH 3/3] procfs: fix possible double mmput() in do_procmap_query() When user provides incorrectly sized buffer for build ID for PROCMAP_QUERY we return with -ENAMETOOLONG error. After recent changes this condition happens later, after we unlocked mmap_lock/per-VMA lock and did mmput(), so original goto out is now wrong and will double-mmput() mm_struct. Fix by jumping further to clean up only vm_file and name_buf. Link: https://lkml.kernel.org/r/20260210192738.3041609-1-andrii@kernel.org Fixes: b5cbacd7f86f ("procfs: avoid fetching build ID while holding VMA lock") Signed-off-by: Andrii Nakryiko Reported-by: Ruikai Peng Reported-by: Thomas Gleixner Tested-by: Thomas Gleixner Reviewed-by: Shakeel Butt Reported-by: syzbot+237b5b985b78c1da9600@syzkaller.appspotmail.com Cc: Ruikai Peng Closes: https://lkml.kernel.org/r/CAFD3drOJANTZPuyiqMdqpiRwOKnHwv5QgMNZghCDr-WxdiHvMg@mail.gmail.com Closes: https://lore.kernel.org/all/698aaf3c.050a0220.3b3015.0088.GAE@google.com/T/#u Cc: Signed-off-by: Andrew Morton --- fs/proc/task_mmu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 26188a4ad1ab..2f55efc36816 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -780,7 +780,7 @@ static int do_procmap_query(struct mm_struct *mm, void __user *uarg) } else { if (karg.build_id_size < build_id_sz) { err = -ENAMETOOLONG; - goto out; + goto out_file; } karg.build_id_size = build_id_sz; } @@ -808,6 +808,7 @@ static int do_procmap_query(struct mm_struct *mm, void __user *uarg) out: query_vma_teardown(&lock_ctx); mmput(mm); +out_file: if (vm_file) fput(vm_file); kfree(name_buf);