2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

322 Commits

Author SHA1 Message Date
Andreas Gruenbacher
53d6913295 gfs2: Instantiate glocks ouside of glock state engine
Instantiate glocks outside of the glock state engine: there is no real
reason for instantiating them inside the glock state engine; it only
complicates the code.

Instead, instantiate them in gfs2_glock_wait() and gfs2_glock_async_wait()
using the new gfs2_glock_holder_ready() helper.  On top of that, the only
other place that acquires a glock without using gfs2_glock_wait() or
gfs2_glock_async_wait() is gfs2_upgrade_iopen_glock(), so call
gfs2_glock_holder_ready() there as well.

If a dinode has a pending truncate, the glock-specific instantiate function
for inodes wakes up the truncate function in the quota daemon.  Waiting for
the completion of the truncate was previously done by the glock state
engine, but we now need to wait in inode_go_instantiate().

This also means that gfs2_instantiate() will now no longer return any
"special" error codes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 16:53:22 +02:00
Andreas Gruenbacher
ebdc416c9c gfs2: Mark the remaining process-independent glock holders as GL_NOPID
Add the GL_NOPID flag for the remaining glock holders which are not
associated with the current process.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-06-29 13:07:54 +02:00
Linus Torvalds
3d198e42ce gfs2 fixes
* To avoid deadlocks, actively cancel dlm locking requests when we give
   up on them.  Further dlm operations on the same lock will return
   -EBUSY until the cancel has been completed, so in that case, wait and
   repeat.  (This is rare.)
 * Lock inversion fixes in gfs2_inode_lookup() and gfs2_create_inode().
 * Some more fallout from the gfs2 mmap + page fault deadlock fixes
   (merge c03098d4b9).
 * Various other minor bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmJGCAsUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrWcg//TEDazop2y7rGMFsMBXI7HPyBu4uD
 BwoclS5IfjoQbBTtkl7cWmQViMk8s3EFGxdEBorfGmMEq65I/krHi4JXG2GETdui
 ORoi8NH1sW9H2GJXmwtE2wYZlJBZtdntoBGdPXWFvt1hLajf6WGpy/CR1Wd4rYak
 8AHQxtd98OtsA6LAPlWl2UaXS4m7rhEt0Iy83mqWtbBOvZsULczuraazawnoQ/m4
 Wf5pvb+73hpwTVUkruH0+If+vi/HF0WVv1nZVyMwrSh3mpvkrsZSkbN0fd0veAhD
 b5XGI1dD5+YPxAOdwDKqnqy8/E3gRekybmpcd48BXoxF4EX/AlLX/Zn9qnrAhY6M
 qEbGzC2UqLIrPe/KjzQ8+0aKPCY5FB1VqoRMAHC/bj7mlmNgGtHxQUXdDmC4LIi6
 GOLpnueI1KtA7Hb4HCgX0BLxSqUEhUuGssBkNIqGet1cRwmM33pt1J4CG4TDLBt/
 VZiERnN3qktSlmukvd3oLSZso4fVbg7PyFTl8YMgiLDNfgcZI9RY5qwIJYrOaucr
 KTNfR6lAL2slFPIVcLwmgJt+axogk6GnCkfDVMX2VLJnMQYqJnDYn6fVG9jngSB+
 F4UBZ/alzhpel08r8xtxjADFJzA+weG1I2jnikSLKlgVN+uiQTBrhqyWdtxtEqFM
 31Nd7piiSVQEvrM=
 =xMYz
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.17-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:

 - To avoid deadlocks, actively cancel dlm locking requests when we give
   up on them.

   Further dlm operations on the same lock will return -EBUSY until the
   cancel has been completed, so in that case, wait and repeat. (This is
   rare.)

 - Lock inversion fixes in gfs2_inode_lookup() and gfs2_create_inode().

 - Some more fallout from the gfs2 mmap + page fault deadlock fixes
   (merged in commit c03098d4b9: "Merge tag 'gfs2-v5.15-rc5-mmap-fault'").

 - Various other minor bug fixes and cleanups.

* tag 'gfs2-v5.17-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Make sure FITRIM minlen is rounded up to fs block size
  gfs2: Make sure not to return short direct writes
  gfs2: Remove dead code in gfs2_file_read_iter
  gfs2: Fix gfs2_file_buffered_write endless loop workaround
  gfs2: Minor retry logic cleanup
  gfs2: Disable page faults during lockless buffered reads
  gfs2: Fix should_fault_in_pages() logic
  gfs2: Remove return value for gfs2_indirect_init
  gfs2: Initialize gh_error in gfs2_glock_nq
  gfs2: Make use of list_is_first
  gfs2: Switch lock order of inode and iopen glock
  gfs2: cancel timed-out glock requests
  gfs2: Expect -EBUSY after canceling dlm locking requests
  gfs2: gfs2_setattr_size error path fix
  gfs2: assign rgrp glock before compute_bitstructs
2022-03-31 15:57:50 -07:00
Muchun Song
fd60b28842 fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>		[ext4]
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:03 -07:00
Andreas Gruenbacher
7336905a89 gfs2: gfs2_setattr_size error path fix
When gfs2_setattr_size() fails, it calls gfs2_rs_delete(ip, NULL) to get
rid of any reservations the inode may have.  Instead, it should pass in
the inode's write count as the second parameter to allow
gfs2_rs_delete() to figure out if the inode has any writers left.

In a next step, there are two instances of gfs2_rs_delete(ip, NULL) left
where we know that there can be no other users of the inode.  Replace
those with gfs2_rs_deltree(&ip->i_res) to avoid the unnecessary write
count check.

With that, gfs2_rs_delete() is only called with the inode's actual write
count, so get rid of the second parameter.

Fixes: a097dc7e24 ("GFS2: Make rgrp reservations part of the gfs2_inode structure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-02-15 15:01:40 +01:00
Andreas Gruenbacher
8d567162ef gfs2: Remove redundant check for GLF_INSTANTIATE_NEEDED
If the GLF_INSTANTIATE_NEEDED flag isn't set, gfs2_instantiate() is a
no-op.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-04 20:02:26 +01:00
Bob Peterson
49462e2be1 gfs2: release iopen glock early in evict
Before this patch, evict would clear the iopen glock's gl_object after
releasing the inode glock.  In the meantime, another process could reuse
the same block and thus glocks for a new inode.  It would lock the inode
glock (exclusively), and then the iopen glock (shared).  The shared
locking mode doesn't provide any ordering against the evict, so by the
time the iopen glock is reused, evict may not have gotten to setting
gl_object to NULL.

Fix that by releasing the iopen glock before the inode glock in
gfs2_evict_inode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>gl_object
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-11-06 10:25:31 +01:00
Bob Peterson
ec1d398dd7 gfs2: Eliminate GIF_INVALID flag
With the addition of the new GLF_INSTANTIATE_NEEDED flag, the
GIF_INVALID flag is now redundant. This patch removes it.
Since inode_instantiate is only called when instantiation is needed,
the check in inode_instantiate is removed too.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:19 +02:00
Bob Peterson
f2e70d8f2f gfs2: fix GL_SKIP node_scope problems
Before this patch, when a glock was locked, the very first holder on the
queue would unlock the lockref and call the go_instantiate glops function
(if one existed), unless GL_SKIP was specified. When we introduced the new
node-scope concept, we allowed multiple holders to lock glocks in EX mode
and share the lock.

But node-scope introduced a new problem: if the first holder has GL_SKIP
and the next one does NOT, since it is not the first holder on the queue,
the go_instantiate op was not called. Eventually the GL_SKIP holder may
call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was
still a window of time in which another non-GL_SKIP holder assumes the
instantiate function had been called by the first holder. In the case of
rgrp glocks, this led to a NULL pointer dereference on the buffer_heads.

This patch tries to fix the problem by introducing two new glock flags:

GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function
needs to be called to "fill in" or "read in" the object before it is
referenced.

GLF_INSTANTIATE_IN_PROG which is used to determine when a process is
in the process of reading in the object. Whenever a function needs to
reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if
set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate"
function.

As before, the gl_lockref spin_lock is unlocked during the IO operation,
which may take a relatively long amount of time to complete. While
unlocked, if another process determines go_instantiate is still needed,
it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate
glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared,
it needs to check GLF_INSTANTIATE_NEEDED again because the other process's
go_instantiate operation may not have been successful.

Functions that previously called the instantiate sub-functions now call
directly into gfs2_instantiate so the new bits are managed properly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-25 08:42:19 +02:00
Bob Peterson
ba3ca2bcf4 gfs2: nit: gfs2_drop_inode shouldn't return bool
Today, gfs2_drop_inode can return "false" for an int value.
I'm sure this was just an oversight. Change to int value.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-08-20 09:03:46 -05:00
Bob Peterson
70c11ba8f2 gfs2: Don't release and reacquire local statfs bh
Before this patch, several functions in gfs2 related to the updating
of the statfs file used a newly acquired/read buffer_head for the
local statfs file. This is completely unnecessary, because other nodes
should never update it. Recreating the buffer is a waste of time.

This patch allows gfs2 to read in the local statefs buffer_head at
mount time and keep it around until unmount time.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-08-20 09:03:46 -05:00
Bob Peterson
a28dc123fa gfs2: init system threads before freeze lock
Patch 96b1454f2e ("gfs2: move freeze glock outside the make_fs_rw and _ro
functions") changed the gfs2 mount sequence so that it holds the freeze
lock before calling gfs2_make_fs_rw. Before this patch, gfs2_make_fs_rw
called init_threads to initialize the quotad and logd threads. That is a
problem if the system needs to withdraw due to IO errors early in the
mount sequence, for example, while initializing the system statfs inode:

1. An IO error causes the statfs glock to not sync properly after
   recovery, and leaves items on the ail list.
2. The leftover items on the ail list causes its do_xmote call to fail,
   which makes it want to withdraw. But since the glock code cannot
   withdraw (because the withdraw sequence uses glocks) it relies upon
   the logd daemon to initiate the withdraw.
3. The withdraw can never be performed by the logd daemon because all
   this takes place before the logd daemon is started.

This patch moves function init_threads from super.c to ops_fstype.c
and it changes gfs2_fill_super to start its threads before holding the
freeze lock, and if there's an error, stop its threads after releasing
it. This allows the logd to run unblocked by the freeze lock. Thus,
the logd daemon can perform its withdraw sequence properly.

Fixes: 96b1454f2e ("gfs2: move freeze glock outside the make_fs_rw and _ro functions")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2021-08-20 09:01:02 -05:00
Lee Jones
c551f66c5d gfs2: Fix a number of kernel-doc warnings
Building the kernel with W=1 results in a number of kernel-doc warnings
like incorrect function names and parameter descriptions.  Fix those,
mostly by adding missing parameter descriptions, removing left-over
descriptions, and demoting some less important kernel-doc comments into
regular comments.

Originally proposed by Lee Jones; improved and combined into a single
patch by Andreas.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-04-09 22:14:13 +02:00
Bob Peterson
ff132c5f93 gfs2: report "already frozen/thawed" errors
Before this patch, gfs2's freeze function failed to report an error
when the target file system was already frozen as it should (and as
generic vfs function freeze_super does. Similarly, gfs2's thaw function
failed to report an error when trying to thaw a file system that is not
frozen, as vfs function thaw_super does. The errors were checked, but
it always returned a 0 return code.

This patch adds the missing error return codes to gfs2 freeze and thaw.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-03-25 18:53:38 +01:00
Andrew Price
62dd0f98a0 gfs2: Flag a withdraw if init_threads() fails
Interrupting mount with ^C quickly enough can cause the kthread_run()
calls in gfs2's init_threads() to fail and the error path leads to a
deadlock on the s_umount rwsem. The abridged chain of events is:

  [mount path]
  get_tree_bdev()
    sget_fc()
      alloc_super()
        down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); [acquired]
    gfs2_fill_super()
      gfs2_make_fs_rw()
        init_threads()
          kthread_run()
            ( Interrupted )
      [Error path]
      gfs2_gl_hash_clear()
        flush_workqueue(glock_workqueue)
          wait_for_completion()

  [workqueue context]
  glock_work_func()
    run_queue()
      do_xmote()
        freeze_go_sync()
          freeze_super()
            down_write(&sb->s_umount) [deadlock]

In freeze_go_sync() there is a gfs2_withdrawn() check that we can use to
make sure freeze_super() is not called in the error path, so add a
gfs2_withdraw_delayed() call when init_threads() fails.

Ref: https://bugzilla.kernel.org/show_bug.cgi?id=212231

Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-03-15 15:32:42 +01:00
Yang Li
eb602521f4 gfs2: make function gfs2_make_fs_ro() to void type
It fixes the following warning detected by coccinelle:
./fs/gfs2/super.c:592:5-10: Unneeded variable: "error". Return "0" on
line 628

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-03-07 17:04:55 +01:00
Linus Torvalds
f6e1e1d1e1 Changes in gfs2:
* Log space and revoke accounting rework to fix some failed asserts.
 * Local resource group glock sharing for better local performance.
 * Add support for version 1802 filesystems: trusted xattr support and
   '-o rgrplvb' mounts by default.
 * Actually synchronize on the inode glock's FREEING bit during withdraw
   ("gfs2: fix glock confusion in function signal_our_withdraw").
 * Fix parallel recovery of multiple journals ("gfs2: keep bios separate
   for each journal").
 * Various other bug fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmA1TmwUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpDZhAArnFj5AhWMI2+DD5o05EILdgDSpwh
 JWYT1pfRqR1OZrs7ZZ7tGZB4H6oytYfJ+4mg9Kk7CE7oJKcBh695IPZoIWv8+BCC
 WIgQGJytCFp4tuDNw11HZ0ahgW4zXPyJTt6jidZ5jVkux31JrUS7fVqSsD2vIPqA
 iQMcJIH+NLTlYbNt4d5T/ngaoRcx7m18RWkcxf6Y+/DBnnwIe4ZDpZmkWVykuncv
 OFSvXK8vKyLWGnvH/MIsywfYeU5rj/0AIu66JhVILQ4v5kGYIigwY3quXP2SoITM
 Z0+N5Gj/N4OWSscRS86zyqhnRucrjDkNP2+oGSzJWgtSXE/KplyfInAmQWzhIPRM
 n7T0boTp+gOTzGq7ELCzj44KICLG76WgDwaR2bLHuQ2/ppVrHNltZqncP2iwynN6
 glfST/eHBUBu1qTYLaOAfkUBlhpKDXu0YPcXX7lH6M0JqyvkRUFfuBAU9dic9D9K
 zsxplHGJrZnE9QFWWbS3aOviPlSHaXfkZF0Xv7QCLyuPRhu+e/qfcAoeVhxSd4+e
 I0grs/TxM61jyju9SmqnM7P+8qYS55naYH1V+6iNCU5dax8MvdxNZuneBQIa07U+
 Y84JPQvTBZDUE0gZ8fUzZtnYS7RqyiG7BL+T4W5Ph7LgxXbgQD7CWerYpg7fBm/j
 HEpjKqrS96zfTyk=
 =45VG
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Log space and revoke accounting rework to fix some failed asserts.

 - Local resource group glock sharing for better local performance.

 - Add support for version 1802 filesystems: trusted xattr support and
   '-o rgrplvb' mounts by default.

 - Actually synchronize on the inode glock's FREEING bit during withdraw
   ("gfs2: fix glock confusion in function signal_our_withdraw").

 - Fix parallel recovery of multiple journals ("gfs2: keep bios separate
   for each journal").

 - Various other bug fixes.

* tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (49 commits)
  gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush
  gfs2: Per-revoke accounting in transactions
  gfs2: Rework the log space allocation logic
  gfs2: Minor calc_reserved cleanup
  gfs2: Use resource group glock sharing
  gfs2: Allow node-wide exclusive glock sharing
  gfs2: Add local resource group locking
  gfs2: Add per-reservation reserved block accounting
  gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested}
  gfs2: Check for active reservation in gfs2_release
  gfs2: Don't search for unreserved space twice
  gfs2: Only pass reservation down to gfs2_rbm_find
  gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt
  gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
  gfs2: Add trusted xattr support
  gfs2: Enable rgrplvb for sb_fs_format 1802
  gfs2: Don't skip dlm unlock if glock has an lvb
  gfs2: Lock imbalance on error path in gfs2_recover_one
  gfs2: Move function gfs2_ail_empty_tr
  gfs2: Get rid of current_tail()
  ...
2021-02-23 14:04:04 -08:00
Andreas Gruenbacher
803074ad77 Merge branches 'rgrp-glock-sharing' and 'gfs2-revoke' from https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git
Merge the resource group glock sharing feature and the revoke accounting rework.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-23 18:54:22 +01:00
Bob Peterson
4fc7ec31c3 gfs2: Use resource group glock sharing
This patch takes advantage of the new glock holder sharing feature for
resource groups.  We have already introduced local resource group
locking in a previous patch, so competing accesses of local processes
are already under control.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-17 19:30:28 +01:00
Andreas Gruenbacher
f3708fb59f gfs2: Get rid of sd_reserving_log
This counter and the associated wait queue are only used so that
gfs2_make_fs_ro can efficiently wait for all pending log space
allocations to fail after setting the filesystem to read-only.  This
comes at the cost of waking up that wait queue very frequently.

Instead, when gfs2_log_reserve fails because the filesystem has become
read-only, Wake up sd_log_waitq.  In gfs2_make_fs_ro, set the file
system read-only and then wait until all the log space has been
released.  Give up and report the problem after a while.  With that,
sd_reserving_log and sd_reserving_log_wait can be removed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-03 18:37:24 +01:00
Andreas Gruenbacher
736b2f778f gfs2: Un-obfuscate function jdesc_find_i
Clean up this function to show that it is trivial.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-01-19 21:16:43 +01:00
Eric Biggers
e2728c5621 fs: don't call ->dirty_inode for lazytime timestamp updates
There is no need to call ->dirty_inode for lazytime timestamp updates
(i.e. for __mark_inode_dirty(I_DIRTY_TIME)), since by the definition of
lazytime, filesystems must ignore these updates.  Filesystems only need
to care about the updated timestamps when they expire.

Therefore, only call ->dirty_inode when I_DIRTY_INODE is set.

Based on a patch from Christoph Hellwig:
https://lore.kernel.org/r/20200325122825.1086872-4-hch@lst.de

Link: https://lore.kernel.org/r/20210112190253.64307-6-ebiggers@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-01-13 17:26:33 +01:00
Bob Peterson
96b1454f2e gfs2: move freeze glock outside the make_fs_rw and _ro functions
Before this patch, sister functions gfs2_make_fs_rw and gfs2_make_fs_ro locked
(held) the freeze glock by calling gfs2_freeze_lock and gfs2_freeze_unlock.
The problem is, not all the callers of gfs2_make_fs_ro should be doing this.
The three callers of gfs2_make_fs_ro are: remount (gfs2_reconfigure),
signal_our_withdraw, and unmount (gfs2_put_super). But when unmounting the
file system we can get into the following circular lock dependency:

deactivate_super
   down_write(&s->s_umount); <-------------------------------------- s_umount
   deactivate_locked_super
      gfs2_kill_sb
         kill_block_super
            generic_shutdown_super
               gfs2_put_super
                  gfs2_make_fs_ro
                     gfs2_glock_nq_init sd_freeze_gl
                        freeze_go_sync
                           if (freeze glock in SH)
                              freeze_super (vfs)
                                 down_write(&sb->s_umount); <------- s_umount

This patch moves the hold of the freeze glock outside the two sister rw/ro
functions to their callers, but it doesn't request the glock from
gfs2_put_super, thus eliminating the circular dependency.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-23 00:54:21 +01:00
Bob Peterson
c77b52c0a1 gfs2: Add common helper for holding and releasing the freeze glock
Many places in the gfs2 code queued and dequeued the freeze glock.
Almost all of them acquire it in SHARED mode, and need to specify the
same LM_FLAG_NOEXP and GL_EXACT flags.

This patch adds common helper functions gfs2_freeze_lock and gfs2_freeze_unlock
to make the code more readable, and to prepare for the next patch.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-23 00:54:13 +01:00
Bob Peterson
dd64fe8167 gfs2: Remove sb_start_write from gfs2_statfs_sync
Before this patch, function gfs2_statfs_sync called sb_start_write and
sb_end_write. This is completely unnecessary because, aside from grabbing
glocks, gfs2_statfs_sync does all its updates to statfs with a transaction:
gfs2_trans_begin and _end. And transactions always do sb_start_intwrite in
gfs2_trans_begin and sb_end_intwrite in gfs2_trans_end.

This patch simply removes the call to sb_start_write.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-03 17:04:24 +01:00
Bob Peterson
a9dd945cce gfs2: Add missing truncate_inode_pages_final for sd_aspace
Gfs2 creates an address space for its rgrps called sd_aspace, but it never
called truncate_inode_pages_final on it. This confused vfs greatly which
tried to reference the address space after gfs2 had freed the superblock
that contained it.

This patch adds a call to truncate_inode_pages_final for sd_aspace, thus
avoiding the use-after-free.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-29 22:16:46 +01:00
Abhi Das
97fd734ba1 gfs2: lookup local statfs inodes prior to journal recovery
We need to lookup the master statfs inode and the local statfs
inodes earlier in the mount process (in init_journal) so journal
recovery can use them when it attempts to recover the statfs info.
We lookup all the local statfs inodes and store them in a linked
list to allow a node to recover statfs info for other nodes in the
cluster.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-23 15:47:14 +02:00
Abhi Das
730926982d gfs2: Add fields for statfs info in struct gfs2_log_header_host
And read these in __get_log_header() from the log header.
Also make gfs2_statfs_change_out() non-static so it can be used
outside of super.c

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-20 23:16:22 +02:00
Jamie Iles
c2a04b02c0 gfs2: use-after-free in sysfs deregistration
syzkaller found the following splat with CONFIG_DEBUG_KOBJECT_RELEASE=y:

  Read of size 1 at addr ffff000028e896b8 by task kworker/1:2/228

  CPU: 1 PID: 228 Comm: kworker/1:2 Tainted: G S                5.9.0-rc8+ #101
  Hardware name: linux,dummy-virt (DT)
  Workqueue: events kobject_delayed_cleanup
  Call trace:
   dump_backtrace+0x0/0x4d8
   show_stack+0x34/0x48
   dump_stack+0x174/0x1f8
   print_address_description.constprop.0+0x5c/0x550
   kasan_report+0x13c/0x1c0
   __asan_report_load1_noabort+0x34/0x60
   memcmp+0xd0/0xd8
   gfs2_uevent+0xc4/0x188
   kobject_uevent_env+0x54c/0x1240
   kobject_uevent+0x2c/0x40
   __kobject_del+0x190/0x1d8
   kobject_delayed_cleanup+0x2bc/0x3b8
   process_one_work+0x96c/0x18c0
   worker_thread+0x3f0/0xc30
   kthread+0x390/0x498
   ret_from_fork+0x10/0x18

  Allocated by task 1110:
   kasan_save_stack+0x28/0x58
   __kasan_kmalloc.isra.0+0xc8/0xe8
   kasan_kmalloc+0x10/0x20
   kmem_cache_alloc_trace+0x1d8/0x2f0
   alloc_super+0x64/0x8c0
   sget_fc+0x110/0x620
   get_tree_bdev+0x190/0x648
   gfs2_get_tree+0x50/0x228
   vfs_get_tree+0x84/0x2e8
   path_mount+0x1134/0x1da8
   do_mount+0x124/0x138
   __arm64_sys_mount+0x164/0x238
   el0_svc_common.constprop.0+0x15c/0x598
   do_el0_svc+0x60/0x150
   el0_svc+0x34/0xb0
   el0_sync_handler+0xc8/0x5b4
   el0_sync+0x15c/0x180

  Freed by task 228:
   kasan_save_stack+0x28/0x58
   kasan_set_track+0x28/0x40
   kasan_set_free_info+0x24/0x48
   __kasan_slab_free+0x118/0x190
   kasan_slab_free+0x14/0x20
   slab_free_freelist_hook+0x6c/0x210
   kfree+0x13c/0x460

Use the same pattern as f2fs + ext4 where the kobject destruction must
complete before allowing the FS itself to be freed.  This means that we
need an explicit free_sbd in the callers.

Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[Also go to fail_free when init_names fails.]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14 23:54:43 +02:00
Bob Peterson
0a0d9f55c2 gfs2: simplify the logic in gfs2_evict_inode
Now that we've factored out the deleted and undeleted dinode cases
in gfs2_evict_inode, we can greatly simplify the logic. Now the
function is easy to read and understand.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14 23:54:42 +02:00
Bob Peterson
d90be6ab9a gfs2: factor evict_linked_inode out of gfs2_evict_inode
Now that we've factored out the delete-dinode case to simplify
gfs2_evict_inode, we take it a step further and factor out the other
case: where we don't delete the inode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14 23:54:42 +02:00
Bob Peterson
53dbc27eb1 gfs2: further simplify gfs2_evict_inode with new func evict_should_delete
This patch further simplifies function gfs2_evict_inode() by adding a
new function evict_should_delete. The function may also lock the inode
glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14 23:54:42 +02:00
Bob Peterson
6e7e9a5055 gfs2: factor evict_unlinked_inode out of gfs2_evict_inode
Function gfs2_evict_inode is way too big, complex and unreadable. This
is a baby step toward breaking it apart to be more readable. It factors
out the portion that deletes the online bits for a dinode that is
unlinked and needs to be deleted. A future patch will factor out more.
(If I factor out too much, the patch itself becomes unreadable).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14 23:54:41 +02:00
Bob Peterson
23d828fc3f gfs2: rename variable error to ret in gfs2_evict_inode
Function gfs2_evict_inode is too big and unreadable. This patch is just
a baby step toward improving that. This first step just renames variable
error to ret. This will help make future patches more readable.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14 23:54:41 +02:00
Andreas Gruenbacher
5a61ae1402 gfs2: Make sure we don't miss any delayed withdraws
Commit ca399c96e9 changes gfs2_log_flush to not withdraw the
filesystem while holding the log flush lock, but it fails to check if
the filesystem needs to be withdrawn once the log flush lock has been
released.  Likewise, commit f05b86db31 depends on gfs2_log_flush to
trigger for delayed withdraws.  Add that and clean up the code flow
somewhat.

In gfs2_put_super, add a check for delayed withdraws that have been
missed to prevent these kinds of bugs in the future.

Fixes: ca399c96e9 ("gfs2: flesh out delayed withdraw for gfs2_log_flush")
Fixes: f05b86db31 ("gfs2: Prepare to withdraw as soon as an IO error occurs in log write")
Cc: stable@vger.kernel.org # v5.7+: 462582b99b: gfs2: add some much needed cleanup for log flushes that fail
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-10-14 23:54:41 +02:00
Bob Peterson
e28c02b94f gfs2: When gfs2_dirty_inode gets a glock error, dump the glock
Before this patch, if function gfs2_dirty_inode got an error when
trying to lock the inode glock, it complained, but it didn't say
what glock or inode had the problem.

In this case, it almost always means that dinode_in found an error
with the dinode in the file system. So it makes sense to dump the
glock, which tells us the location of the dinode in the file system.
That will allow us to analyze the corruption from the metadata.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-08-07 17:26:24 +02:00
Bob Peterson
c860f8ffbe gfs2: The freeze glock should never be frozen
Before this patch, some gfs2 code locked the freeze glock with LM_FLAG_NOEXP
(Do not freeze) flag, and some did not. We never want to freeze the freeze
glock, so this patch makes it consistently use LM_FLAG_NOEXP always.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-07-03 12:05:35 +02:00
Bob Peterson
623ba664b7 gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHE
Before this patch, the freeze code in gfs2 specified GL_NOCACHE in
several places. That's wrong because we always want to know the state
of whether the file system is frozen.

There was also a problem with freeze/thaw transitioning the glock from
frozen (EX) to thawed (SH) because gfs2 will normally grant glocks in EX
to processes that request it in SH mode, unless GL_EXACT is specified.
Therefore, the freeze/thaw code, which tried to reacquire the glock in
SH mode would get the glock in EX mode, and miss the transition from EX
to SH. That made it think the thaw had completed normally, but since the
glock was still cached in EX, other nodes could not freeze again.

This patch removes the GL_NOCACHE flag to allow the freeze glock to be
cached. It also adds the GL_EXACT flag so the glock is fully transitioned
from EX to SH, thereby allowing future freeze operations.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-07-03 12:05:35 +02:00
Andreas Gruenbacher
9e8990dea9 gfs2: Smarter iopen glock waiting
When trying to upgrade the iopen glock from a shared to an exclusive lock in
gfs2_evict_inode, abort the wait if there is contention on the corresponding
inode glock: in that case, the inode must still be in active use on another
node, and we're not guaranteed to get the iopen glock anytime soon.

To make this work even better, when we notice contention on the iopen glock and
we can't evict the corresponsing inode and release the iopen glock immediately,
poke the inode glock.  The other node(s) trying to acquire the lock can then
abort instead of timing out.

Thanks to Heinz Mauelshagen for pointing out a locking bug in a previous
version of this patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
9e73330f29 gfs2: Try harder to delete inodes locally
When an inode's link count drops to zero and the inode is cached on
other nodes, the current behavior of gfs2 is to immediately give up and
to rely on the other node(s) to delete the inode if there is iopen glock
contention.  This leads to resource group glock bouncing and the loss of
caching.  With the previous patches in place, we can fix that by not
giving up immediately.

When the inode is still open on other nodes, those nodes won't be able
to evict the inode and give up the iopen glock.  In that case, our lock
conversion request will time out.  The unlink system call will block for
the duration of the iopen lock conversion request.  We're also holding
the inode glock in EX mode for an extended duration, so other nodes
won't be able to make progress on the inode, either.

This is worse than what we had before, but we can prevent other nodes
from getting stuck by aborting our iopen locking request if there is
contention on the inode glock.  This will the the subject of a future
patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
8c7b9262a8 gfs2: Give up the iopen glock on contention
When there's contention on the iopen glock, it means that the link count
of the corresponding inode has dropped to zero on a remote node which is
now trying to delete the inode.  In that case, try to evict the inode so
that the iopen glock will be released, which will allow the remote node
to do its job.

When the inode is still open locally, the inode's reference count won't
drop to zero and so we'll keep holding the inode and its iopen glock.
The remote node will time out its request to grab the iopen glock, and
when the inode is finally closed locally, we'll try to delete it
ourself.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
a0e3cc65fa gfs2: Turn gl_delete into a delayed work
This requires flushing delayed work items in gfs2_make_fs_ro (which is called
before unmounting a filesystem).

When inodes are deleted and then recreated, pending gl_delete work items would
have no effect because the inode generations will have changed, so we can
cancel any pending gl_delete works before reusing iopen glocks.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:21 +02:00
Andreas Gruenbacher
f286d627ef gfs2: Keep track of deleted inode generations in LVBs
When deleting an inode, keep track of the generation of the deleted inode in
the inode glock Lock Value Block (LVB).  When trying to delete an inode
remotely, check the last-known inode generation against the deleted inode
generation to skip duplicate remote deletes.  This avoids taking the resource
group glock in order to verify the block type.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-06-05 20:19:20 +02:00
Bob Peterson
2297ab6144 gfs2: Fix problems regarding gfs2_qa_get and _put
This patch fixes a couple of places in which gfs2_qa_get and gfs2_qa_put are
not balanced: we now keep references around whenever a file is open for writing
(see gfs2_open_common and gfs2_release), so we need to put all references we
grab in function gfs2_create_inode.  This was broken in the successful case and
on one error path.

This also means that we don't have a reference to put in gfs2_evict_inode.

In addition, gfs2_qa_put was called for the wrong inode in gfs2_link.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-05-08 18:45:11 +02:00
Andreas Gruenbacher
1595548fe7 gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_put
Keeping reservations and quotas separate helps reviewing the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27 14:08:04 -05:00
Bob Peterson
2fba46a04c gfs2: Change inode qa_data to allow multiple users
Before this patch, multiple users called gfs2_qa_alloc which allocated
a qadata structure to the inode, if quotas are turned on. Later, in
file close or evict, the structure was deleted with gfs2_qa_delete.
But there can be several competing processes who need access to the
structure. There were races between file close (release) and the others.
Thus, a release could delete the structure out from under a process
that relied upon its existence. For example, chown.

This patch changes the management of the qadata structures to be
a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get
and if the structure is allocated, the count essentially starts out at
1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the
last guy to decrement the count to 0 frees the memory.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27 14:08:04 -05:00
Andreas Gruenbacher
969183bc68 gfs2: Switch to list_{first,last}_entry
Replace open-coded versions of list_first_entry and list_last_entry with those
functions.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27 14:08:04 -05:00
Andreas Gruenbacher
40e7e86ef1 gfs2: Clean up inode initialization and teardown
When allocating a new inode, mark the iopen glock holder as uninitialized to
make sure gfs2_evict_inode won't fail after an incomplete create or lookup.  In
gfs2_evict_inode, allow the inode glock to be NULL and remove the duplicate
iopen glock teardown code.  In gfs2_inode_lookup, don't tear down things that
gfs2_evict_inode will already tear down.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-27 14:08:04 -05:00
Bob Peterson
601ef0d52e gfs2: Force withdraw to replay journals and wait for it to finish
When a node withdraws from a file system, it often leaves its journal
in an incomplete state. This is especially true when the withdraw is
caused by io errors writing to the journal. Before this patch, a
withdraw would try to write a "shutdown" record to the journal, tell
dlm it's done with the file system, and none of the other nodes
know about the problem. Later, when the problem is fixed and the
withdrawn node is rebooted, it would then discover that its own
journal was incomplete, and replay it. However, replaying it at this
point is almost guaranteed to introduce corruption because the other
nodes are likely to have used affected resource groups that appeared
in the journal since the time of the withdraw. Replaying the journal
later will overwrite any changes made, and not through any fault of
dlm, which was instructed during the withdraw to release those
resources.

This patch makes file system withdraws seen by the entire cluster.
Withdrawing nodes dequeue their journal glock to allow recovery.

The remaining nodes check all the journals to see if they are
clean or in need of replay. They try to replay dirty journals, but
only the journals of withdrawn nodes will be "not busy" and
therefore available for replay.

Until the journal replay is complete, no i/o related glocks may be
given out, to ensure that the replay does not cause the
aforementioned corruption: We cannot allow any journal replay to
overwrite blocks associated with a glock once it is held.

The "live" glock which is now used to signal when a withdraw
occurs. When a withdraw occurs, the node signals its withdraw by
dequeueing the "live" glock and trying to enqueue it in EX mode,
thus forcing the other nodes to all see a demote request, by way
of a "1CB" (one callback) try lock. The "live" glock is not
granted in EX; the callback is only just used to indicate a
withdraw has occurred.

Note that all nodes in the cluster must wait for the recovering
node to finish replaying the withdrawing node's journal before
continuing. To this end, it checks that the journals are clean
multiple times in a retry loop.

Also note that the withdraw function may be called from a wide
variety of situations, and therefore, we need to take extra
precautions to make sure pointers are valid before using them in
many circumstances.

We also need to take care when glocks decide to withdraw, since
the withdraw code now uses glocks.

Also, before this patch, if a process encountered an error and
decided to withdraw, if another process was already withdrawing,
the second withdraw would be silently ignored, which set it free
to unlock its glocks. That's correct behavior if the original
withdrawer encounters further errors down the road. But if
secondary waiters don't wait for the journal replay, unlocking
glocks will allow other nodes to use them, despite the fact that
the journal containing those blocks is being replayed. The
replay needs to finish before our glocks are released to other
nodes. IOW, secondary withdraws need to wait for the first
withdraw to finish.

For example, if an rgrp glock is unlocked by a process that didn't
wait for the first withdraw, a journal replay could introduce file
system corruption by replaying a rgrp block that has already been
granted to a different cluster node.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-02-27 07:53:12 -06:00
Bob Peterson
52b1cdcb7a gfs2: Abort gfs2_freeze if io error is seen
Before this patch, an io error, such as -EIO writing to the journal
would cause function gfs2_freeze to go into an infinite loop,
continuously retrying the freeze operation. But nothing ever clears
the -EIO except unmount after withdraw, which is impossible if the
freeze operation never ends (fails). Instead you get:

[ 6499.767994] gfs2: fsid=dm-32.0: error freezing FS: -5
[ 6499.773058] gfs2: fsid=dm-32.0: retrying...
[ 6500.791957] gfs2: fsid=dm-32.0: error freezing FS: -5
[ 6500.797015] gfs2: fsid=dm-32.0: retrying...

This patch adds a check for -EIO in gfs2_freeze, and if seen, it
dequeues the freeze glock, aborts the loop and returns the error.
Also, there's no need to pass the freeze holder to function
gfs2_lock_fs_check_clean since it's only called in one place and
it's a well-known superblock pointer, so this simplifies that.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-11-15 17:57:30 +01:00
Bob Peterson
60528afa78 gfs2: Don't loop forever in gfs2_freeze if withdrawn
Before this patch, function gfs2_freeze would loop forever if the
filesystem it tries to freeze is withdrawn. That's because function
gfs2_lock_fs_check_clean tries to enqueue the glock of the journal and
the gfs2_glock returns -EIO because you can't enqueue a journaled glock
after a withdraw.

Move the check for file system withdraw inside the loop so that the loop
can end when withdraw occurs.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-11-14 19:47:05 +01:00
Bob Peterson
eb43e660c0 gfs2: Introduce function gfs2_withdrawn
Add function gfs2_withdrawn and replace all checks for the SDF_WITHDRAWN
bit to call it. This does not change the logic or function of gfs2, and
it facilitates later improvements to the withdraw sequence.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-11-14 19:46:18 +01:00
Linus Torvalds
0b36c9eed2 Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more mount API conversions from Al Viro:
 "Assorted conversions of options parsing to new API.

  gfs2 is probably the most serious one here; the rest is trivial stuff.

  Other things in what used to be #work.mount are going to wait for the
  next cycle (and preferably go via git trees of the filesystems
  involved)"

* 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  gfs2: Convert gfs2 to fs_context
  vfs: Convert spufs to use the new mount API
  vfs: Convert hypfs to use the new mount API
  hypfs: Fix error number left in struct pointer member
  vfs: Convert functionfs to use the new mount API
  vfs: Convert bpf to use the new mount API
2019-09-24 12:33:34 -07:00
Andrew Price
1f52aa08d1 gfs2: Convert gfs2 to fs_context
Convert gfs2 and gfs2meta to fs_context. Removes the duplicated vfs code
from gfs2_mount and instead uses the new vfs_get_block_super() before
switching the ->root to the appropriate dentry.

The mount option parsing has been converted to the new API and error
reporting for invalid options has been made more precise at the same
time.

All of the mount/remount code has been moved into ops_fstype.c

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: cluster-devel@redhat.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-18 22:47:05 -04:00
Andreas Gruenbacher
d40312598d gfs2: Minor gfs2_alloc_inode cleanup
In gfs2_alloc_inode, when kmem_cache_alloc cannot allocate a new object, return
NULL immediately.  The code currently relies on the fact that i_inode is the
first member in struct gfs2_inode and so ip and &ip->i_inode evaluate to the
same address, but that isn't immediately obvious.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2019-08-09 17:00:52 +01:00
Bob Peterson
f29e62eed2 gfs2: replace more printk with calls to fs_info and friends
This patch replaces a few leftover printk errors with calls to
fs_info and similar, so that the file system having the error is
properly logged.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:30:27 +02:00
Bob Peterson
55317f5b00 gfs2: simplify gfs2_freeze by removing case
Function gfs2_freeze had a case statement that simply checked the
error code, but the break statements just made the logic hard to
read. This patch simplifies the logic in favor of a simple if.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:26:58 +02:00
Bob Peterson
04aea0ca14 gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN
Before this patch, the superblock flag indicating when a file system
is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to
the more obvious SDF_WITHDRAWN.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:26:35 +02:00
Bob Peterson
5b3a9f348b gfs2: kthread and remount improvements
Before this patch, gfs2 saved the pointers to the two daemon threads
(logd and quotad) in the superblock, but they were never cleared,
even if the threads were stopped (e.g. on remount -o ro). That meant
that certain error conditions (like a withdrawn file system) could
race. For example, xfstests generic/361 caused an IO error during
remount -o ro, which caused the kthreads to be stopped, then the
error flagged. Later, when the test unmounted the file system, it
would try to stop the threads a second time with kthread_stop.

This patch does two things: First, every time it stops the threads
it zeroes out the thread pointer, and also checks whether it's NULL
before trying to stop it. Second, in function gfs2_remount_fs, it
was returning if an error was logged by either of the two functions
for gfs2_make_fs_ro and _rw, which caused it to bypass the online
uevent at the bottom of the function. This removes that bypass in
favor of just running the whole function, then returning the error.
That way, unmounts and remounts won't hang forever.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:03:43 +02:00
Linus Torvalds
9331b6740f SPDX update for 5.2-rc4
Another round of SPDX header file fixes for 5.2-rc4
 
 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
 added, based on the text in the files.  We are slowly chipping away at
 the 700+ different ways people tried to write the license text.  All of
 these were reviewed on the spdx mailing list by a number of different
 people.
 
 We now have over 60% of the kernel files covered with SPDX tags:
 	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
 	Files checked:            64533
 	Files with SPDX:          40392
 	Files with errors:            0
 
 I think the majority of the "easy" fixups are now done, it's now the
 start of the longer-tail of crazy variants to wade through.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPuGTg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykBvQCg2SG+HmDH+tlwKLT/q7jZcLMPQigAoMpt9Uuy
 sxVEiFZo8ZU9v1IoRb1I
 =qU++
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull yet more SPDX updates from Greg KH:
 "Another round of SPDX header file fixes for 5.2-rc4

  These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being
  added, based on the text in the files. We are slowly chipping away at
  the 700+ different ways people tried to write the license text. All of
  these were reviewed on the spdx mailing list by a number of different
  people.

  We now have over 60% of the kernel files covered with SPDX tags:
	$ ./scripts/spdxcheck.py -v 2>&1 | grep Files
	Files checked:            64533
	Files with SPDX:          40392
	Files with errors:            0

  I think the majority of the "easy" fixups are now done, it's now the
  start of the longer-tail of crazy variants to wade through"

* tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429
  ...
2019-06-08 12:52:42 -07:00
Bob Peterson
638803d456 Revert "gfs2: Replace gl_revokes with a GLF flag"
Commit 73118ca8ba introduced a glock reference counting bug in
gfs2_trans_remove_revoke.  Given that, replacing gl_revokes with a GLF flag is
no longer useful, so revert that commit.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-06 16:29:26 +02:00
Thomas Gleixner
7336d0e654 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398
Based on 1 normalized pattern(s):

  this copyrighted material is made available to anyone wishing to use
  modify copy or redistribute it subject to the terms and conditions
  of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:12 +02:00
Abhi Das
f4686c26ec gfs2: read journal in large chunks
Use bios to read in the journal into the address space of the journal inode
(jd_inode), sequentially and in large chunks.  This is faster for locating the
journal head that the previous binary search approach.  When performing
recovery, we keep the journal in the address space until recovery is done,
which further speeds up things.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:15 +02:00
Abhi Das
8f91821990 gfs2: fix race between gfs2_freeze_func and unmount
As part of the freeze operation, gfs2_freeze_func() is left blocking
on a request to hold the sd_freeze_gl in SH. This glock is held in EX
by the gfs2_freeze() code.

A subsequent call to gfs2_unfreeze() releases the EXclusively held
sd_freeze_gl, which allows gfs2_freeze_func() to acquire it in SH and
resume its operation.

gfs2_unfreeze(), however, doesn't wait for gfs2_freeze_func() to complete.
If a umount is issued right after unfreeze, it could result in an
inconsistent filesystem because some journal data (statfs update) isn't
written out.

Refer to commit 24972557b1 for a more detailed explanation of how
freeze/unfreeze work.

This patch causes gfs2_unfreeze() to wait for gfs2_freeze_func() to
complete before returning to the user.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:14 +02:00
Andreas Gruenbacher
ce895cf15a gfs2: Remove misleading comments in gfs2_evict_inode
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:14 +02:00
Bob Peterson
73118ca8ba gfs2: Replace gl_revokes with a GLF flag
The gl_revokes value determines how many outstanding revokes a glock has
on the superblock revokes list; this is used to avoid unnecessary log
flushes.  However, gl_revokes is only ever tested for being zero, and it's
only decremented in revoke_lo_after_commit, which removes all revokes
from the list, so we know that the gl_revoke values of all the glocks on
the list will reach zero.  Therefore, we can replace gl_revokes with a
bit flag. This saves an atomic counter in struct gfs2_glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:14 +02:00
Al Viro
784494e1d7 gfs2: switch to ->free_inode()
... and use GFS2_I() to get the containing gfs2_inode by inode;
yes, we can feed the address of the first member of structure
to kmem_cache_free(), but let's do it in an obviously safe way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:24 -04:00
Bob Peterson
23e93c9b2c Revert "gfs2: read journal in large chunks to locate the head"
This reverts commit 2a5f14f279.

This patch causes xfstests generic/311 to fail. Reverting this for
now until we have a proper fix.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-14 09:52:51 -08:00
Abhi Das
2a5f14f279 gfs2: read journal in large chunks to locate the head
Use bio(s) to read in the journal sequentially in large chunks and
locate the head of the journal.

This version addresses the issues Christoph pointed out w.r.t error handling
and using deprecated API.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
2018-12-11 17:50:36 +01:00
Bob Peterson
8e31582a9a gfs2: Fix minor typo: couln't versus couldn't.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-10-19 11:34:04 -05:00
Tim Smith
1eb8d73879 GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
Flushing the workqueue can cause operations to happen which might
call gfs2_log_reserve(), or get stuck waiting for locks taken by such
operations.  gfs2_log_reserve() can io_schedule(). If this happens, it
will never wake because the only thing which can wake it is gfs2_logd()
which was already stopped.

This causes umount of a gfs2 filesystem to wedge permanently if, for
example, the umount immediately follows a large delete operation.

When this occured, the following stack trace was obtained from the
umount command

[<ffffffff81087968>] flush_workqueue+0x1c8/0x520
[<ffffffffa0666e29>] gfs2_make_fs_ro+0x69/0x160 [gfs2]
[<ffffffffa0667279>] gfs2_put_super+0xa9/0x1c0 [gfs2]
[<ffffffff811b7edf>] generic_shutdown_super+0x6f/0x100
[<ffffffff811b7ff7>] kill_block_super+0x27/0x70
[<ffffffffa0656a71>] gfs2_kill_sb+0x71/0x80 [gfs2]
[<ffffffff811b792b>] deactivate_locked_super+0x3b/0x70
[<ffffffff811b79b9>] deactivate_super+0x59/0x60
[<ffffffff811d2998>] cleanup_mnt+0x58/0x80
[<ffffffff811d2a12>] __cleanup_mnt+0x12/0x20
[<ffffffff8108c87d>] task_work_run+0x7d/0xa0
[<ffffffff8106d7d9>] exit_to_usermode_loop+0x73/0x98
[<ffffffff81003961>] syscall_return_slowpath+0x41/0x50
[<ffffffff815a594c>] int_ret_from_sys_call+0x25/0x8f
[<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Tim Smith <tim.smith@citrix.com>
Signed-off-by: Mark Syms <mark.syms@citrix.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-10-09 07:05:36 -05:00
Andreas Gruenbacher
a3479c7fc0 Merge branch 'iomap-write' into linux-gfs2/for-next
Pull in the gfs2 iomap-write changes: Tweak the existing code to
properly support iomap write and eliminate an unnecessary special case
in gfs2_block_map.  Implement iomap write support for buffered and
direct I/O.  Simplify some of the existing code and eliminate code that
is no longer used:

  gfs2: Remove gfs2_write_{begin,end}
  gfs2: iomap direct I/O support
  gfs2: gfs2_extent_length cleanup
  gfs2: iomap buffered write support
  gfs2: Further iomap cleanups

This is based on the following changes on the xfs 'iomap-4.19-merge'
branch:

  iomap: add private pointer to struct iomap
  iomap: add a page_done callback
  iomap: generic inline data handling
  iomap: complete partial direct I/O writes synchronously
  iomap: mark newly allocated buffer heads as new
  fs: factor out a __generic_write_end helper

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24 20:02:40 +02:00
Andreas Gruenbacher
b7eba890a2 gfs2: Eliminate redundant ip->i_rgd
GFS2 remembers the last rgrp used for allocations in ip->i_rgd.
However, block allocations are made by way of a reservations structure,
ip->i_res, which keeps the last rgrp in ip->i_res.rs_rgd, and ip->i_res
is kept in sync with ip->i_res.rs_rgd, so it's redundant.  Get rid of
ip->i_rgd and just use ip->i_res.rs_rgd in its place.

Based on patches by Robert Peterson.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-07-05 17:47:16 +02:00
Kees Cook
6da2ec5605 treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

        kmalloc(a * b, gfp)

with:
        kmalloc_array(a * b, gfp)

as well as handling cases of:

        kmalloc(a * b * c, gfp)

with:

        kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kmalloc(sizeof(THING) * C2, ...)
|
  kmalloc(sizeof(TYPE) * C2, ...)
|
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Christoph Hellwig
0e11f6443f fs: move I_DIRTY_INODE to fs.h
And use it in a few more places rather than opencoding the values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28 01:39:02 -04:00
Abhi Das
957a7acd46 gfs2: Remove inode from ordered write list in gfs2_write_inode()
The vfs clears the I_DIRTY inode flag before calling gfs2_write_inode()
having queued any data that needed to be written to disk.
This is a good time to remove such inodes from our ordered write list
so they don't hang around for long periods of time.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-30 10:00:27 -07:00
Bob Peterson
805c090750 GFS2: Log the reason for log flushes in every log header
This patch just adds the capability for GFS2 to track which function
called gfs2_log_flush. This should make it easier to diagnose
problems based on the sequence of events found in the journals.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23 07:39:20 -07:00
Bob Peterson
c1696fb85d GFS2: Introduce new gfs2_log_header_v2
This patch adds a new structure called gfs2_log_header_v2 which is used
to store expanded fields into previously unused areas of the log headers
(i.e., this change is backwards compatible).  Some of these are used for
debug purposes so we can backtrack when problems occur.  Others are
reserved for future expansion.

This patch is based on a prototype from Steve Whitehouse.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23 07:38:53 -07:00
Linus Torvalds
1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
Bob Peterson
adbc3ddf28 GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALL
In function gfs2_write_inode, starting with patch a9185b41a4, we
only flush the log and call filemap_fdatawait if we're passed in a
wbc sync_mode of WB_SYNC_ALL. We also need to do these things if
we're evicting a jdata inode, because we might have jdata pages
still attached to bufdata descriptors that need to be revoked, but
by the time it gets to evict() it's too late to start a new
transaction. This patch changes it to treat jdata inodes as if
WB_SYNC_ALL had been specified.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31 14:26:35 +01:00
Linus Torvalds
0f0d12728e Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
 "Another chunk of fmount preparations from dhowells; only trivial
  conflicts for that part. It separates MS_... bits (very grotty
  mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
  only a small subset of MS_... stuff).

  This does *not* convert the filesystems to new constants; only the
  infrastructure is done here. The next step in that series is where the
  conflicts would be; that's the conversion of filesystems. It's purely
  mechanical and it's better done after the merge, so if you could run
  something like

	list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

	sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
	        -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
	        -e 's/\<MS_NODEV\>/SB_NODEV/g' \
	        -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
	        -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
	        -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
	        -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
	        -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
	        -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
	        -e 's/\<MS_SILENT\>/SB_SILENT/g' \
	        -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
	        -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
	        -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
	        -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
	        $list

  and commit it with something along the lines of 'convert filesystems
  away from use of MS_... constants' as commit message, it would save a
  quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Differentiate mount flags (MS_*) from internal superblock flags
  VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
  vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-14 18:54:01 -07:00
Bob Peterson
942b0cddfb GFS2: Withdraw for IO errors writing to the journal or statfs
Before this patch, if GFS2 encountered IO errors while writing to
the journal, it would not report the problem, so they would go
unnoticed, sometimes for many hours. Sometimes this would only be
noticed later, when recovery tried to do journal replay and failed
due to invalid metadata at the blocks that resulted in IO errors.

This patch makes GFS2's log daemon check for IO errors. If it
encounters one, it withdraws from the file system and reports
why in dmesg. A similar action is taken when IO errors occur when
writing to the system statfs file.

These errors are also reported back to any callers of fsync, since
that requires the journal to be flushed. Therefore, any IO errors
that would previously go unnoticed are now noticed and the file
system is withdrawn as early as possible, thus preventing further
file system damage.

Also note that this reintroduces superblock variable sd_log_error,
which Christoph removed with commit f729b66fca.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25 10:59:09 -05:00
Andreas Gruenbacher
6a1c8f6dcf gfs2: Defer deleting inodes under memory pressure
When under memory pressure and an inode's link count has dropped to
zero, defer deleting the inode to the delete workqueue.  This avoids
calling into DLM under memory pressure, which can deadlock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:49:13 -05:00
Andreas Gruenbacher
71c1b21368 gfs2: gfs2_evict_inode: Put glocks asynchronously
gfs2_evict_inode is called to free inodes under memory pressure.  The
function calls into DLM when an inode's last cluster-wide reference goes
away (remote unlink) and to release the glock and associated DLM lock
before finally destroying the inode.  However, if DLM is blocked on
memory to become available, calling into DLM again will deadlock.

Avoid that by decoupling releasing glocks from destroying inodes in that
case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
in work queue context, when the associated inodes have likely already
been destroyed.

With this change, inodes can end up being unlinked, remote-unlink can be
triggered, and then the inode can be reallocated before all
remote-unlink callbacks are processed.  To detect that, revalidate the
link count in gfs2_evict_inode to make sure we're not deleting an
allocated, referenced inode.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:45:21 -05:00
Andreas Gruenbacher
61b91cfdc6 gfs2: Fix trivial typos
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Bob Peterson
b2fb7dab7f GFS2: Delete debugfs files only after we evict the glocks
This patch moves the call to gfs2_delete_debugfs_file so that it
comes after the glock hash table has been cleared. This way we
can query the debugfs files if umount hangs.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Bob Peterson
240c6235df GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode
This patch adds some calls to clear gl_object in function
gfs2_delete_inode. Since we are deleting the inode, and the glock
typically outlives the inode in core, we must clear gl_object
so subsequent use of the glock (e.g. for a new inode in its place)
will not have the old pointer sitting there. In error cases we
need to tidy up after ourselves. In non-error cases, we need to
clear gl_object before we set the block free in the bitmap so
residules aren't left for potential inode creators.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-08-09 09:36:38 -05:00
Bob Peterson
df3d87bde1 GFS2: Introduce helper for clearing gl_object
This patch introduces a new helper function in glock.h that
clears gl_object, with an added integrity check. An additional
integrity check has been added to glock_set_object, plus comments.
This is step 1 in a series to ensure gl_object integrity.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-07-21 08:20:05 -05:00
David Howells
bc98a42c1f VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
Firstly by applying the following with coccinelle's spatch:

	@@ expression SB; @@
	-SB->s_flags & MS_RDONLY
	+sb_rdonly(SB)

to effect the conversion to sb_rdonly(sb), then by applying:

	@@ expression A, SB; @@
	(
	-(!sb_rdonly(SB)) && A
	+!sb_rdonly(SB) && A
	|
	-A != (sb_rdonly(SB))
	+A != sb_rdonly(SB)
	|
	-A == (sb_rdonly(SB))
	+A == sb_rdonly(SB)
	|
	-!(sb_rdonly(SB))
	+!sb_rdonly(SB)
	|
	-A && (sb_rdonly(SB))
	+A && sb_rdonly(SB)
	|
	-A || (sb_rdonly(SB))
	+A || sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) != A
	+sb_rdonly(SB) != A
	|
	-(sb_rdonly(SB)) == A
	+sb_rdonly(SB) == A
	|
	-(sb_rdonly(SB)) && A
	+sb_rdonly(SB) && A
	|
	-(sb_rdonly(SB)) || A
	+sb_rdonly(SB) || A
	)

	@@ expression A, B, SB; @@
	(
	-(sb_rdonly(SB)) ? 1 : 0
	+sb_rdonly(SB)
	|
	-(sb_rdonly(SB)) ? A : B
	+sb_rdonly(SB) ? A : B
	)

to remove left over excess bracketage and finally by applying:

	@@ expression A, SB; @@
	(
	-(A & MS_RDONLY) != sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
	|
	-(A & MS_RDONLY) == sb_rdonly(SB)
	+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
	)

to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-07-17 08:45:34 +01:00
Andreas Gruenbacher
e0b62e21b7 gfs2: gfs2_create_inode: Keep glock across iput
On failure, keep the inode glock across the final iput of the new inode
so that gfs2_evict_inode doesn't have to re-acquire the glock.  That
way, gfs2_evict_inode won't need to revalidate the block type.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:21:07 -05:00
Andreas Gruenbacher
6f6597baae gfs2: Protect gl->gl_object by spin lock
Put all remaining accesses to gl->gl_object under the
gl->gl_lockref.lock spinlock to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:20:52 -05:00
Andreas Gruenbacher
4fd1a57952 gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
So far, gfs2_evict_inode clears gl->gl_object and then flushes the glock
work queue to make sure that inode glops which dereference gl->gl_object
have finished running before the inode is destroyed.  However, flushing
the work queue may do more work than needed, and in particular, it may
call into DLM, which we want to avoid here.  Use a bit lock
(GIF_GLOP_PENDING) to synchronize between the inode glops and
gfs2_evict_inode instead to get rid of the flushing.

In addition, flush the work queues of existing glocks before reusing
them for new inodes to get those glocks into a known state: the glock
state engine currently doesn't handle glock re-appropriation correctly.
(We may be able to fix the glock state engine instead later.)

Based on a patch by Steven Whitehouse <swhiteho@redhat.com>.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:20:24 -05:00
Andreas Gruenbacher
d4da31986c Revert "GFS2: Wait for iopen glock dequeues"
Revert commit 86d067a797: it turns out
that waiting for iopen glock dequeues here isn't needed anymore because
the bugs that commit was meant to fix have been fixed otherwise.

In addition, we want to avoid waiting on glocks in gfs2_evict_inode in
shrinker context because the shrinker may be invoked on behalf of DLM,
in which case calling into DLM again would deadlock.  This commit makes
the described scenario less likely without completely avoiding it; it's
still a step in the right direction, though.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-03 09:14:47 -04:00
Bob Peterson
0d1c7ae9d8 GFS2: Prevent BUG from occurring when normal Withdraws occur
When the GFS2 file system withdraws due to metadata corruption, it
often has outstanding transactions in the journal and delayed work
queued for its glocks. This patch adds some new checks for a
withdrawn file system before proceeding with operations that would
obviously cause a BUG() to be triggered. That allows GFS2 to be
safely unmounted rather than cause the system to go down.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-03-16 08:18:35 -04:00
Ingo Molnar
174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Fabian Frederick
47a9a52794 GFS2: use BIT() macro
Replace 1 << value shift by more explicit BIT() macro

Also fixes two bare unsigned definitions:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+		unsigned hsize = BIT(ip->i_depth);

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-08-02 12:05:27 -05:00
Andreas Gruenbacher
6df9f9a253 gfs2: Lock holder cleanup
Make the code more readable by cleaning up the different ways of
initializing lock holders and checking for initialized lock holders:
mark lock holders as uninitialized by setting the holder's glock to NULL
(gfs2_holder_mark_uninitialized) instead of zeroing out the entire
object or using a separate flag.  Recognize initialized holders by their
non-NULL glock (gfs2_holder_initialized).  Don't zero out holder objects
which are immeditiately initialized via gfs2_holder_init or
gfs2_glock_nq_init.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:47:09 -05:00
Al Viro
fc64005c93 don't bother with ->d_inode->i_sb - it's always equal to ->d_sb
... and neither can ever be NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 17:11:51 -04:00
Bob Peterson
7508abc4bd GFS2: Check if iopen is held when deleting inode
This patch fixes an error condition in which an inode is partially
created in gfs2_create_inode() but then some error is discovered,
which causes it to fail and call iput() before the iopen glock is
created or held. In that case, gfs2_delete_inode would try to
unlock an iopen glock that doesn't yet exist. Therefore, we test
its holder (which must exist) for the HIF_HOLDER bit before trying
to dq it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-01-14 08:47:42 -05:00
Bob Peterson
ee530beafe GFS2: Truncate address space mapping when deleting an inode
In function gfs2_delete_inode() we write and flush the mapping for
a glock, among other things. We truncate the mapping for the inode,
but we never truncate the mapping for the glock. This patch makes it
also truncate the metamapping. This avoid cases where the glock is
reused by another process who is trying to recreate an inode in its
place using the same block.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-12-18 10:52:21 -06:00