mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
linux/fs/xfs/libxfs
Dave Chinner db6a227416 xfs: catch stale AGF/AGF metadata
There is a race condition that can trigger in dmflakey fstests that
can result in asserts in xfs_ialloc_read_agi() and
xfs_alloc_read_agf() firing. The asserts look like this:

 XFS: Assertion failed: pag->pagf_freeblks == be32_to_cpu(agf->agf_freeblks), file: fs/xfs/libxfs/xfs_alloc.c, line: 3440
.....
 Call Trace:
  <TASK>
  xfs_alloc_read_agf+0x2ad/0x3a0
  xfs_alloc_fix_freelist+0x280/0x720
  xfs_alloc_vextent_prepare_ag+0x42/0x120
  xfs_alloc_vextent_iterate_ags+0x67/0x260
  xfs_alloc_vextent_start_ag+0xe4/0x1c0
  xfs_bmapi_allocate+0x6fe/0xc90
  xfs_bmapi_convert_delalloc+0x338/0x560
  xfs_map_blocks+0x354/0x580
  iomap_writepages+0x52b/0xa70
  xfs_vm_writepages+0xd7/0x100
  do_writepages+0xe1/0x2c0
  __writeback_single_inode+0x44/0x340
  writeback_sb_inodes+0x2d0/0x570
  __writeback_inodes_wb+0x9c/0xf0
  wb_writeback+0x139/0x2d0
  wb_workfn+0x23e/0x4c0
  process_scheduled_works+0x1d4/0x400
  worker_thread+0x234/0x2e0
  kthread+0x147/0x170
  ret_from_fork+0x3e/0x50
  ret_from_fork_asm+0x1a/0x30

I've seen the AGI variant from scrub running on the filesystem
after unmount failed due to systemd interference:

 XFS: Assertion failed: pag->pagi_freecount == be32_to_cpu(agi->agi_freecount) || xfs_is_shutdown(pag->pag_mount), file: fs/xfs/libxfs/xfs_ialloc.c, line: 2804
.....
 Call Trace:
  <TASK>
  xfs_ialloc_read_agi+0xee/0x150
  xchk_perag_drain_and_lock+0x7d/0x240
  xchk_ag_init+0x34/0x90
  xchk_inode_xref+0x7b/0x220
  xchk_inode+0x14d/0x180
  xfs_scrub_metadata+0x2e2/0x510
  xfs_ioc_scrub_metadata+0x62/0xb0
  xfs_file_ioctl+0x446/0xbf0
  __se_sys_ioctl+0x6f/0xc0
  __x64_sys_ioctl+0x1d/0x30
  x64_sys_call+0x1879/0x2ee0
  do_syscall_64+0x68/0x130
  ? exc_page_fault+0x62/0xc0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Essentially, it is the same problem. When _flakey_drop_and_remount()
loads the drop-writes table, it makes all writes silently fail. Writes
are reported to the fs as completed successfully, but they are not
issued to the backing store. The filesystem sees the successful
write completion and marks the metadata buffer clean and removes it
from the AIL.

If this happens at the same time as memory pressure is occurring,
the now-clean AGF and/or AGI buffers can be reclaimed from memory.

Shortly afterwards, but before _flakey_drop_and_remount() runs
unmount, background writeback is kicked and it tries to allocate
blocks for the dirty pages in memory. This then tries to access the
AGF buffer we just turfed out of memory. It's not found, so it gets
read in from disk.

This is all fine, except for the fact that the last writeback of the
AGF did not actually reach disk. The AGF on disk is stale compared
to the in-memory state held by the perag, and so they don't match
and the assert fires.

Then other operations on that inode hang because the task was killed
whilst holding inode locks, e.g.:

 Workqueue: xfs-conv/dm-12 xfs_end_io
 Call Trace:
  <TASK>
  __schedule+0x650/0xb10
  schedule+0x6d/0xf0
  schedule_preempt_disabled+0x15/0x30
  rwsem_down_write_slowpath+0x31a/0x5f0
  down_write+0x43/0x60
  xfs_ilock+0x1a8/0x210
  xfs_trans_alloc_inode+0x9c/0x240
  xfs_iomap_write_unwritten+0xe3/0x300
  xfs_end_ioend+0x90/0x130
  xfs_end_io+0xce/0x100
  process_scheduled_works+0x1d4/0x400
  worker_thread+0x234/0x2e0
  kthread+0x147/0x170
  ret_from_fork+0x3e/0x50
  ret_from_fork_asm+0x1a/0x30
  </TASK>

and it's all downhill from there.

Memory pressure is one way to trigger this, another is to run "echo
3 > /proc/sys/vm/drop_caches" randomly while tests are running.

Regardless of how it is triggered, this effectively takes down the
system once umount hangs, because it holds sb->s_umount exclusively
and now every sync(1) call gets stuck on it.

Fix this by replacing the asserts with a corruption detection check
and a shutdown.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-06-27 14:13:34 +02:00
xfs_ag_resv.c xfs: allow inode-based btrees to reserve space in the data device 2024-12-23 13:06:03 -08:00
xfs_ag_resv.h xfs: get rid of xfs_ag_resv_rmapbt_alloc 2024-07-04 14:36:13 +05:30
xfs_ag.c xfs: remove the flags argument to xfs_buf_get_uncached 2025-03-18 14:47:45 +01:00
xfs_ag.h xfs: move the min and max group block numbers to xfs_group 2024-11-05 13:38:44 -08:00
xfs_alloc_btree.c xfs: add a generic group pointer to the btree cursor 2024-11-05 13:38:29 -08:00
xfs_alloc_btree.h xfs: standardize the btree maxrecs function parameters 2024-09-01 08:58:20 -07:00
xfs_alloc.c xfs: catch stale AGF/AGF metadata 2025-06-27 14:13:34 +02:00
xfs_alloc.h xfs: support logging EFIs for realtime extents 2024-11-05 13:38:42 -08:00
xfs_attr_leaf.c xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split 2024-10-07 08:00:11 +02:00
xfs_attr_leaf.h xfs: return bool from xfs_attr3_leaf_add 2024-10-07 08:00:11 +02:00
xfs_attr_remote.c xfs: minor cleanups of xfs_attr3_rmt_blocks 2024-05-02 07:48:37 -07:00
xfs_attr_remote.h xfs: create a helper to compute the blockcount of a max sized remote value 2024-05-02 07:48:36 -07:00
xfs_attr_sf.h xfs: pass the attr value to put_listent when possible 2024-04-23 07:47:00 -07:00
xfs_attr.c xfs: prepare to reuse the dquot pointer space in struct xfs_inode 2024-12-23 13:06:03 -08:00
xfs_attr.h xfs: fix xfs_init_attr_trans not handling explicit operation codes 2024-05-27 15:55:52 +05:30
xfs_bit.c
xfs_bit.h
xfs_bmap_btree.c xfs: tidy up xfs_bmap_broot_realloc a bit 2024-12-23 13:06:02 -08:00
xfs_bmap_btree.h xfs: make xfs_iroot_realloc a bmap btree function 2024-12-23 13:06:02 -08:00
xfs_bmap.c xfs: allow block allocator to take an alignment hint 2025-05-07 14:25:31 -07:00
xfs_bmap.h xfs: allow block allocator to take an alignment hint 2025-05-07 14:25:31 -07:00
xfs_btree_mem.c xfs: create a shadow rmap btree during realtime rmap repair 2024-12-23 13:06:09 -08:00
xfs_btree_mem.h xfs: launder in-memory btree buffers before transaction commit 2024-02-22 12:43:36 -08:00
xfs_btree_staging.c xfs: online repair of the realtime rmap btree 2024-12-23 13:06:09 -08:00
xfs_btree_staging.h xfs: don't override bc_ops for staging btrees 2024-02-22 12:37:35 -08:00
xfs_btree.c xfs: introduce realtime refcount btree ondisk definitions 2024-12-23 13:06:10 -08:00
xfs_btree.h xfs: wire up realtime refcount btree cursors 2024-12-23 13:06:12 -08:00
xfs_cksum.h
xfs_da_btree.c xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split 2024-10-07 08:00:11 +02:00
xfs_da_btree.h xfs: create attr log item opcodes and formats for parent pointers 2024-04-23 07:46:57 -07:00
xfs_da_format.h xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function 2024-05-02 07:48:36 -07:00
xfs_defer.c xfs: support logging EFIs for realtime extents 2024-11-05 13:38:42 -08:00
xfs_defer.h xfs: add a realtime flag to the refcount update log redo items 2024-12-23 13:06:11 -08:00
xfs_dir2_block.c xfs: validate explicit directory block buffer owners 2024-04-15 14:58:52 -07:00
xfs_dir2_data.c xfs: don't walk off the end of a directory data block 2024-07-01 09:32:29 +05:30
xfs_dir2_leaf.c xfs: validate explicit directory free block owners 2024-04-15 14:58:52 -07:00
xfs_dir2_node.c xfs: validate explicit directory free block owners 2024-04-15 14:58:52 -07:00
xfs_dir2_priv.h xfs: don't walk off the end of a directory data block 2024-07-01 09:32:29 +05:30
xfs_dir2_sf.c xfs: convert remaining kmem_free() to kfree() 2024-02-13 18:07:34 +05:30
xfs_dir2.c xfs/libxfs: replace kmalloc() and memcpy() with kmemdup() 2025-01-13 14:58:04 +01:00
xfs_dir2.h xfs: mark xfs_dir_isempty static 2025-01-13 14:55:06 +01:00
xfs_dquot_buf.c xfs: use metadir for quota inodes 2024-11-05 13:38:45 -08:00
xfs_errortag.h xfs: allow inode-based btrees to reserve space in the data device 2024-12-23 13:06:03 -08:00
xfs_exchmaps.c xfs: realtime rmap btree transaction reservations 2024-12-23 13:06:04 -08:00
xfs_exchmaps.h xfs: use atomic extent swapping to fix user file fork data 2024-04-15 14:58:53 -07:00
xfs_format.h xfs: support zone gaps 2025-03-03 08:17:09 -07:00
xfs_fs.h xfs: enable fsmap reporting for internal RT devices 2025-03-03 08:17:08 -07:00
xfs_group.c xfs: add group based bno conversion helpers 2024-11-05 13:38:29 -08:00
xfs_group.h xfs: support zone gaps 2025-03-03 08:17:09 -07:00
xfs_health.h xfs: report realtime refcount btree corruption errors to the health system 2024-12-23 13:06:14 -08:00
xfs_ialloc_btree.c xfs: return a 64-bit block count from xfs_btree_count_blocks 2024-12-12 17:45:09 -08:00
xfs_ialloc_btree.h xfs: standardize the btree maxrecs function parameters 2024-09-01 08:58:20 -07:00
xfs_ialloc.c xfs: catch stale AGF/AGF metadata 2025-06-27 14:13:34 +02:00
xfs_ialloc.h xfs: pass the icreate args object to xfs_dialloc 2024-09-01 08:58:19 -07:00
xfs_iext_tree.c xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS 2024-02-13 18:07:34 +05:30
xfs_inode_buf.c xfs: kill XBF_UNMAPPED 2025-03-10 14:29:44 +01:00
xfs_inode_buf.h xfs: enforce metadata inode flag 2024-11-05 13:38:31 -08:00
xfs_inode_fork.c xfs: wire up a new metafile type for the realtime refcount 2024-12-23 13:06:12 -08:00
xfs_inode_fork.h xfs: make xfs_iroot_realloc a bmap btree function 2024-12-23 13:06:02 -08:00
xfs_inode_util.c xfs: define the zoned on-disk format 2025-03-03 08:16:45 -07:00
xfs_inode_util.h xfs: hoist inode free function to libxfs 2024-07-02 11:36:59 -07:00
xfs_log_format.h xfs: define the zoned on-disk format 2025-03-03 08:16:45 -07:00
xfs_log_recover.h xfs: add a realtime flag to the refcount update log redo items 2024-12-23 13:06:11 -08:00
xfs_log_rlimit.c xfs: commit CoW-based atomic writes atomically 2025-05-07 14:25:32 -07:00
xfs_metadir.c xfs: allow inode-based btrees to reserve space in the data device 2024-12-23 13:06:03 -08:00
xfs_metadir.h xfs: read and write metadata inode directory tree 2024-11-05 13:38:31 -08:00
xfs_metafile.c xfs: reduce metafile reservations 2025-03-03 08:16:43 -07:00
xfs_metafile.h xfs: make metabtree reservations global 2025-03-03 08:16:43 -07:00
xfs_ondisk.h xfs: define the zoned on-disk format 2025-03-03 08:16:45 -07:00
xfs_parent.c xfs: add raw parent pointer apis to support repair 2024-04-23 07:47:04 -07:00
xfs_parent.h xfs: add raw parent pointer apis to support repair 2024-04-23 07:47:04 -07:00
xfs_quota_defs.h xfs: use metadir for quota inodes 2024-11-05 13:38:45 -08:00
xfs_refcount_btree.c xfs: add a generic group pointer to the btree cursor 2024-11-05 13:38:29 -08:00
xfs_refcount_btree.h xfs: standardize the btree maxrecs function parameters 2024-09-01 08:58:20 -07:00
xfs_refcount.c xfs: recover CoW leftovers in the realtime volume 2024-12-23 13:06:13 -08:00
xfs_refcount.h xfs: recover CoW leftovers in the realtime volume 2024-12-23 13:06:13 -08:00
xfs_rmap_btree.c xfs: add a generic group pointer to the btree cursor 2024-11-05 13:38:29 -08:00
xfs_rmap_btree.h xfs: standardize the btree maxrecs function parameters 2024-09-01 08:58:20 -07:00
xfs_rmap.c xfs: update rmap to allow cow staging extents in the rt rmap 2024-12-23 13:06:12 -08:00
xfs_rmap.h xfs: add a realtime flag to the rmap update log redo items 2024-12-23 13:06:04 -08:00
xfs_rtbitmap.c xfs: define the zoned on-disk format 2025-03-03 08:16:45 -07:00
xfs_rtbitmap.h xfs: online repair of realtime bitmaps for a realtime group 2024-12-23 13:06:08 -08:00
xfs_rtgroup.c xfs: define the zoned on-disk format 2025-03-03 08:16:45 -07:00
xfs_rtgroup.h xfs: support zone gaps 2025-03-03 08:17:09 -07:00
xfs_rtrefcount_btree.c xfs: report realtime refcount btree corruption errors to the health system 2024-12-23 13:06:14 -08:00
xfs_rtrefcount_btree.h xfs: create routine to allocate and initialize a realtime refcount btree inode 2024-12-23 13:06:12 -08:00
xfs_rtrmap_btree.c xfs: add a xfs_rtrmap_highest_rgbno helper 2025-03-03 08:16:45 -07:00
xfs_rtrmap_btree.h xfs: add a xfs_rtrmap_highest_rgbno helper 2025-03-03 08:16:45 -07:00
xfs_sb.c xfs: Remove duplicate xfs_rtbitmap.h header 2025-03-12 10:00:43 +01:00
xfs_sb.h xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t 2024-11-05 13:38:44 -08:00
xfs_shared.h xfs: introduce realtime refcount btree ondisk definitions 2024-12-23 13:06:10 -08:00
xfs_symlink_remote.c xfs: return from xfs_symlink_verify early on V4 filesystems 2024-12-12 17:45:13 -08:00
xfs_symlink_remote.h xfs: pass the owner to xfs_symlink_write_target 2024-04-15 14:58:57 -07:00
xfs_trans_inode.c xfs: switch to multigrain timestamps 2024-10-10 10:20:52 +02:00
xfs_trans_resv.c xfs: allow sysadmins to specify a maximum atomic write limit at mount time 2025-05-07 14:25:33 -07:00
xfs_trans_resv.h xfs: allow sysadmins to specify a maximum atomic write limit at mount time 2025-05-07 14:25:33 -07:00
xfs_trans_space.c xfs: Add parent pointers to rename 2024-04-23 07:46:59 -07:00
xfs_trans_space.h xfs: realtime rmap btree transaction reservations 2024-12-23 13:06:04 -08:00
xfs_types.c xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t 2024-11-05 13:38:44 -08:00
xfs_types.h xfs: add support for zoned space reservations 2025-03-03 08:17:07 -07:00
xfs_zones.c xfs: support zone gaps 2025-03-03 08:17:09 -07:00
xfs_zones.h xfs: parse and validate hardware zone information 2025-03-03 08:16:46 -07:00