mirror of
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
synced 2026-03-22 07:27:12 +08:00
mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations
Since commitcc638f329e("mm, thp: tweak reclaim/compaction effort of local-only and all-node allocations"), THP page fault allocations have settled on the following scheme (from the commit log): 1. local node only THP allocation with no reclaim, just compaction. 2. for madvised VMA's or when synchronous compaction is enabled always - THP allocation from any node with effort determined by global defrag setting and VMA madvise 3. fallback to base pages on any node Recent customer reports however revealed we have a gap in step 1 above. What we have seen is excessive reclaim due to THP page faults on a NUMA node that's close to its high watermark, while other nodes have plenty of free memory. The problem with step 1 is that it promises no reclaim after the compaction attempt, however reclaim is only avoided for certain compaction outcomes (deferred, or skipped due to insufficient free base pages), and not e.g. when compaction is actually performed but fails (we did see compact_fail vmstat counter increasing). THP page faults can therefore exhibit a zone_reclaim_mode-like behavior, which is not the intention. Thus add a check for __GFP_THISNODE that corresponds to this exact situation and prevents continuing with reclaim/compaction once the initial compaction attempt isn't successful in allocating the page. Note that commitcc638f329ehas not introduced this over-reclaim possibility; it appears to exist in some form since commit2f0799a0ff("mm, thp: restore node-local hugepage allocations"). Followup commitsb39d0ee263("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") andcc638f329ehave moved in the right direction, but left the abovementioned gap. Link: https://lkml.kernel.org/r/20251219-costly-noretry-thisnode-fix-v1-1-e1085a4a0c34@suse.cz Fixes:2f0799a0ff("mm, thp: restore node-local hugepage allocations") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This commit is contained in:
committed by
Andrew Morton
parent
ed60c8e280
commit
9c9828d3ea
@@ -4818,6 +4818,20 @@ restart:
|
||||
compact_result == COMPACT_DEFERRED)
|
||||
goto nopage;
|
||||
|
||||
/*
|
||||
* THP page faults may attempt local node only first,
|
||||
* but are then allowed to only compact, not reclaim,
|
||||
* see alloc_pages_mpol().
|
||||
*
|
||||
* Compaction can fail for other reasons than those
|
||||
* checked above and we don't want such THP allocations
|
||||
* to put reclaim pressure on a single node in a
|
||||
* situation where other nodes might have plenty of
|
||||
* available memory.
|
||||
*/
|
||||
if (gfp_mask & __GFP_THISNODE)
|
||||
goto nopage;
|
||||
|
||||
/*
|
||||
* Looks like reclaim/compaction is worth trying, but
|
||||
* sync compaction could be very expensive, so keep
|
||||
|
||||
Reference in New Issue
Block a user