2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

155 Commits

Author SHA1 Message Date
Dave Chinner
fe08cc5044 xfs: open code sb verifier feature checks
The superblock verifiers are one of the last places that use the sb
version functions to do feature checks. This are all quite simple
uses, and there aren't many of them so open code them all.

Also, move the good version number check into xfs_sb.c instead of it
being an inline function in xfs_format.h

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:13 -07:00
Dave Chinner
38c26bfd90 xfs: replace xfs_sb_version checks with feature flag checks
Convert the xfs_sb_version_hasfoo() to checks against
mp->m_features. Checks of the superblock itself during disk
operations (e.g. in the read/write verifiers and the to/from disk
formatters) are not converted - they operate purely on the
superblock state. Everything else should use the mount features.

Large parts of this conversion were done with sed with commands like
this:

for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
	sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
done

With manual cleanups for things like "xfs_has_extflgbit" and other
little inconsistencies in naming.

The result is ia lot less typing to check features and an XFS binary
size reduced by a bit over 3kB:

$ size -t fs/xfs/built-in.a
	text	   data	    bss	    dec	    hex	filenam
before	1130866  311352     484 1442702  16038e (TOTALS)
after	1127727  311352     484 1439563  15f74b (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:12 -07:00
Dave Chinner
a1d86e8dec xfs: reflect sb features in xfs_mount
Currently on-disk feature checks require decoding the superblock
fileds and so can be non-trivial. We have almost 400 hundred
individual feature checks in the XFS code, so this is a significant
amount of code. To reduce runtime check overhead, pre-process all
the version flags into a features field in the xfs_mount at mount
time so we can convert all the feature checks to a simple flag
check.

There is also a need to convert the dynamic feature flags to update
the m_features field. This is required for attr, attr2 and quota
features. New xfs_mount based wrappers are added for this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:12 -07:00
Dave Chinner
e23b55d537 xfs: rework attr2 feature and mount options
The attr2 feature is somewhat unique in that it has both a superblock
feature bit to enable it and mount options to enable and disable it.

Back when it was first introduced in 2005, attr2 was disabled unless
either the attr2 superblock feature bit was set, or the attr2 mount
option was set. If the superblock feature bit was not set but the
mount option was set, then when the first attr2 format inode fork
was created, it would set the superblock feature bit. This is as it
should be - the superblock feature bit indicated the presence of the
attr2 on disk format.

The noattr2 mount option, however, did not affect the superblock
feature bit. If noattr2 was specified, the on-disk superblock
feature bit was ignored and the code always just created attr1
format inode forks.  If neither of the attr2 or noattr2 mounts
option were specified, then the behaviour was determined by the
superblock feature bit.

This was all pretty sane.

Fast foward 3 years, and we are dealing with fallout from the
botched sb_features2 addition and having to deal with feature
mismatches between the sb_features2 and sb_bad_features2 fields. The
attr2 feature bit was one of these flags. The reconciliation was
done well after mount option parsing and, unfortunately, the feature
reconciliation had a bug where it ignored the noattr2 mount option.

For reasons lost to the mists of time, it was decided that resolving
this issue in commit 7c12f29650 ("[XFS] Fix up noattr2 so that it
will properly update the versionnum and features2 fields.") required
noattr2 to clear the superblock attr2 feature bit.  This greatly
complicated the attr2 behaviour and broke rules about feature bits
needing to be set when those specific features are present in the
filesystem.

By complicated, I mean that it introduced problems due to feature
bit interactions with log recovery. All of the superblock feature
bit checks are done prior to log recovery, but if we crash after
removing a feature bit, then on the next mount we see the feature
bit in the unrecovered superblock, only to have it go away after the
log has been replayed.  This means our mount time feature processing
could be all wrong.

Hence you can mount with noattr2, crash shortly afterwards, and
mount again without attr2 or noattr2 and still have attr2 enabled
because the second mount sees attr2 still enabled in the superblock
before recovery runs and removes the feature bit. It's just a mess.

Further, this is all legacy code as the v5 format requires attr2 to
be enabled at all times and it cannot be disabled.  i.e. the noattr2
mount option returns an error when used on v5 format filesystems.

To straighten this all out, this patch reverts the attr2/noattr2
mount option behaviour back to the original behaviour. There is no
reason for disabling attr2 these days, so we will only do this when
the noattr2 mount option is set. This will not remove the superblock
feature bit. The superblock bit will provide the default behaviour
and only track whether attr2 is present on disk or not. The attr2
mount option will enable the creation of attr2 format inode forks,
and if the superblock feature bit is not set it will be added when
the first attr2 inode fork is created.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19 10:07:11 -07:00
Darrick J. Wong
908ce71e54 xfs: allow setting and clearing of log incompat feature flags
Log incompat feature flags in the superblock exist for one purpose: to
protect the contents of a dirty log from replay on a kernel that isn't
prepared to handle those dirty contents.  This means that they can be
cleared if (a) we know the log is clean and (b) we know that there
aren't any other threads in the system that might be setting or relying
upon a log incompat flag.

Therefore, clear the log incompat flags when we've finished recovering
the log, when we're unmounting cleanly, remounting read-only, or
freezing; and provide a function so that subsequent patches can start
using this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-08-09 15:57:59 -07:00
Darrick J. Wong
b7df7630cc xfs: fix silly whitespace problems with kernel libxfs
Fix a few whitespace errors such as spaces at the end of the line, etc.
This gets us back to something more closely resembling parity.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-09 11:13:17 -07:00
Christoph Hellwig
e98d5e882b xfs: move the di_crtime field to struct xfs_inode
Move the crtime field from struct xfs_icdinode into stuct xfs_inode and
remove the now entirely unused struct xfs_icdinode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-04-07 14:37:05 -07:00
Kaixu Xia
afbd914776 xfs: remove the unused XFS_B_FSB_OFFSET macro
There are no callers of the XFS_B_FSB_OFFSET macro, so remove it.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
96f65bad7c xfs: enable the needsrepair feature
Make it so that libxfs recognizes the needsrepair feature.  Note that
the kernel will still refuse to mount these.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-12-09 09:49:38 -08:00
Darrick J. Wong
80c720b8eb xfs: define a new "needrepair" feature
Define an incompat feature flag to indicate that the filesystem needs to
be repaired.  While libxfs will recognize this feature, the kernel will
refuse to mount if the feature flag is set, and only xfs_repair will be
able to clear the flag.  The goal here is to force the admin to run
xfs_repair to completion after upgrading the filesystem, or if we
otherwise detect anomalies.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2020-12-09 09:48:13 -08:00
Darrick J. Wong
29887a2271 xfs: enable big timestamps
Enable the big timestamp feature.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15 20:52:41 -07:00
Darrick J. Wong
4ea1ff3b49 xfs: widen ondisk quota expiration timestamps to handle y2038+
Enable the bigtime feature for quota timers.  We decrease the accuracy
of the timers to ~4s in exchange for being able to set timers up to the
bigtime maximum.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15 20:52:41 -07:00
Darrick J. Wong
f93e5436f0 xfs: widen ondisk inode timestamps to deal with y2038+
Redesign the ondisk inode timestamps to be a simple unsigned 64-bit
counter of nanoseconds since 14 Dec 1901 (i.e. the minimum time in the
32-bit unix time epoch).  This enables us to handle dates up to 2486,
which solves the y2038 problem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15 20:52:41 -07:00
Darrick J. Wong
5a0bb066f6 xfs: redefine xfs_timestamp_t
Redefine xfs_timestamp_t as a __be64 typedef in preparation for the
bigtime functionality.  Preserve the legacy structure format so that we
can let the compiler take care of masking and shifting.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15 20:52:41 -07:00
Darrick J. Wong
ccc8e771aa xfs: refactor default quota grace period setting code
Refactor the code that sets the default quota grace period into a helper
function so that we can override the ondisk behavior later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15 20:52:40 -07:00
Darrick J. Wong
11d8a91902 xfs: refactor quota expiration timer modification
Define explicit limits on the range of quota grace period expiration
timeouts and refactor the code that modifies the timeouts into helpers
that clamp the values appropriately.  Note that we'll refactor the
default grace period timer separately.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15 20:52:40 -07:00
Darrick J. Wong
876fdc7c4f xfs: explicitly define inode timestamp range
Formally define the inode timestamp ranges that existing filesystems
support, and switch the vfs timetamp ranges to use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-09-15 20:52:40 -07:00
Darrick J. Wong
b896a39faa xfs: enable new inode btree counters feature
Enable the new inode btree counters feature.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-09-15 20:52:40 -07:00
Darrick J. Wong
2a39946c98 xfs: store inode btree block counts in AGI header
Add a btree block usage counters for both inode btrees to the AGI header
so that we don't have to walk the entire finobt at mount time to create
the per-AG reservations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-09-15 20:52:39 -07:00
Darrick J. Wong
d8c1af0d6a xfs: rename the ondisk dquot d_flags to d_type
The ondisk dquot stores the quota record type in the flags field.
Rename this field to d_type to make the _type relationship between the
ondisk and incore dquot more obvious.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-07-28 20:24:14 -07:00
Darrick J. Wong
a990f7a84e xfs: improve ondisk dquot flags checking
Create an XFS_DQTYPE_ANY mask for ondisk dquots flags, and use that to
ensure that we never accept any garbage flags when we're loading dquots.
While we're at it, restructure the quota type flag checking to use the
proper masking.

Note that I plan to add y2038 support soon, which will require a new
xfs_dqtype_t flag for extended timestamp support, hence all the work to
make the type masking work correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-07-28 20:24:14 -07:00
Darrick J. Wong
1a7ed27165 xfs: create xfs_dqtype_t to represent quota types
Create a new type (xfs_dqtype_t) to represent the type of an incore
dquot (user, group, project, or none).  Rename the incore dquot's
dq_flags field to q_type.

This allows us to replace all the "uint type" arguments to the quota
functions with "xfs_dqtype_t type", to make it obvious when we're
passing a quota type argument into a function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-07-28 20:24:14 -07:00
Darrick J. Wong
8cd4901da5 xfs: rename XFS_DQ_{USER,GROUP,PROJ} to XFS_DQTYPE_*
We're going to split up the incore dquot state flags from the ondisk
dquot flags (eventually renaming this "type") so start by renaming the
three flags and the bitmask that are going to participate in this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-07-28 20:24:14 -07:00
Darrick J. Wong
51dbb1be52 xfs: remove qcore from incore dquots
Now that we've stopped using qcore entirely, drop it from the incore
dquot.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-07-28 20:24:14 -07:00
Darrick J. Wong
cb64e12993 xfs: make XFS_DQUOT_CLUSTER_SIZE_FSB part of the ondisk format
Move the dquot cluster size #define to xfs_format.h.  It is an important
part of the ondisk format because the ondisk dquot record size is not an
even power of two, which means that the buffer size we use is
significant here because the kernel leaves slack space at the end of the
buffer to avoid having to deal with a dquot record crossing a block
boundary.

This is also an excuse to fix one of the longstanding discrepancies
between kernel and userspace libxfs headers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2020-07-28 20:24:14 -07:00
Christoph Hellwig
09c38edd54 xfs: remove the XFS_DFORK_Q macro
Just checking di_forkoff directly is a little easier to follow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Nishad Kamdar
508578f2f5 xfs: Use the correct style for SPDX License Identifier
This patch corrects the SPDX License Identifier style in header files
related to XFS File System support. For C header files
Documentation/process/license-rules.rst mandates C-like comments.
(opposed to C source files where C++ style should be used).

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-13 15:32:45 -07:00
Gustavo A. R. Silva
ee4064e56c xfs: Replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-13 15:32:45 -07:00
Christoph Hellwig
e9e2eae89d xfs: only check the superblock version for dinode size calculation
The size of the dinode structure is only dependent on the file system
version, so instead of checking the individual inode version just use
the newly added xfs_sb_version_has_large_dinode helper, and simplify
various calling conventions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19 08:48:47 -07:00
Christoph Hellwig
b81b79f4ed xfs: add a new xfs_sb_version_has_v3inode helper
Add a new wrapper to check if a file system supports the v3 inode format
with a larger dinode core.  Previously we used xfs_sb_version_hascrc for
that, which is technically correct but a little confusing to read.

Also move xfs_dinode_good_version next to xfs_sb_version_has_v3inode
so that we have one place that documents the superblock version to
inode version relationship.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19 08:47:34 -07:00
Christoph Hellwig
3e6e8afd3a xfs: remove XFS_BUF_TO_SBP
Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:39 -07:00
Christoph Hellwig
9798f615ad xfs: remove XFS_BUF_TO_AGF
Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:39 -07:00
Christoph Hellwig
370c782b98 xfs: remove XFS_BUF_TO_AGI
Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:38 -07:00
Christoph Hellwig
4b97510859 xfs: remove the xfs_agfl_t typedef
There is just a single user left, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:38 -07:00
Christoph Hellwig
183606d824 xfs: remove the agfl_bno member from struct xfs_agfl
struct xfs_agfl is a header in front of the AGFL entries that exists
for CRC enabled file systems.  For not CRC enabled file systems the AGFL
is simply a list of agbno.  Make the CRC case similar to that by just
using the list behind the new header.  This indirectly solves a problem
with modern gcc versions that warn about taking addresses of packed
structures (and we have to pack the AGFL given that gcc rounds up
structure sizes).  Also replace the helper macro to get from a buffer
with an inline function in xfs_alloc.h to make the code easier to
read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:38 -07:00
Darrick J. Wong
a508486552 xfs: introduce XFS_MAX_FILEOFF
Introduce a new #define for the maximum supported file block offset.
We'll use this in the next patch to make it more obvious that we're
doing some operation for all possible inode fork mappings after a given
offset.  We can't use ULLONG_MAX here because bunmapi uses that to
detect when it's done.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-01-14 08:02:51 -08:00
Eric Sandeen
35dab307c8 xfs: remove unused typedef definitions
Remove some typdefs for type_t's that are no longer referred to
by their typedef'd types.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 18:22:40 -08:00
Pavel Reichl
aefe69a45d xfs: remove the xfs_disk_dquot_t and xfs_dquot_t
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix some of the comments]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 11:13:45 -08:00
Darrick J. Wong
5467b34bd1 xfs: move xfs_ino_geometry to xfs_shared.h
The inode geometry structure isn't related to ondisk format; it's
support for the mount structure.  Move it to xfs_shared.h.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-06-28 19:25:35 -07:00
Darrick J. Wong
490d451fa5 xfs: fix inode_cluster_size rounding mayhem
inode_cluster_size is supposed to represent the size (in bytes) of an
inode cluster buffer.  We avoid having to handle multiple clusters per
filesystem block on filesystems with large blocks by openly rounding
this value up to 1 FSB when necessary.  However, we never reset
inode_cluster_size to reflect this new rounded value, which adds to the
potential for mistakes in calculating geometries.

Fix this by setting inode_cluster_size to reflect the rounded-up size if
needed, and special-case the few places in the sparse inodes code where
we actually need the smaller value to validate on-disk metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-06-12 08:37:40 -07:00
Darrick J. Wong
ef32595999 xfs: separate inode geometry
Separate the inode geometry information into a distinct structure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-06-12 08:37:40 -07:00
Darrick J. Wong
0357d21a6c xfs: move XFS_INODE_FORMAT_STR mappings to libxfs
Move XFS_INODE_FORMAT_STR to libxfs so that we don't forget to keep it
updated, and add necessary TRACE_DEFINE_ENUM.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-12-19 14:02:01 -08:00
Darrick J. Wong
43004b2a8d xfs: add a block to inode count converter
Add new helpers to convert units of fs blocks into inodes, and AG blocks
into AG inodes, respectively.  Convert all the open-coded conversions
and XFS_OFFBNO_TO_AGINO(, , 0) calls to use them, as appropriate.  The
OFFBNO_TO_AGINO macro is retained for xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-12-12 08:47:16 -08:00
Christoph Hellwig
daa79baefc xfs: remove suport for filesystems without unwritten extent flag
The option to enable unwritten extents was made default in 2003,
removed from mkfs in 2007, and cannot be disabled in v5.  We also
rely on it for a lot of common functionality, so filesystems without
it will run a completely untested and buggy code path.  Enabling the
support also is a simple bit flip using xfs_db, so legacy file
systems can still be brought forward.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:18:58 +11:00
Eric Sandeen
f369a13cea xfs: don't treat unknown di_flags2 as corruption in scrub
xchk_inode_flags2() currently treats any di_flags2 values that the
running kernel doesn't recognize as corruption, and calls
xchk_ino_set_corrupt() if they are set.  However, it's entirely possible
that these flags were set in some newer kernel and are quite valid,
but ignored in this kernel.

(Validators don't care one bit about unknown di_flags2.)

Call xchk_ino_set_warning instead, because this may or may not actually
indicate a problem.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-09-29 13:49:00 +10:00
Darrick J. Wong
f62cb48e43 xfs: don't allow insert-range to shift extents past the maximum offset
Zorro Lang reports that generic/485 blows an assert on a filesystem with
512 byte blocks.  The test tries to fallocate a post-eof extent at the
maximum file size and calls insert range to shift the extents right by
two blocks.  On a 512b block filesystem this causes startoff to overflow
the 54-bit startoff field, leading to the assert.

Therefore, always check the rightmost extent to see if it would overflow
prior to invoking the insert range machinery.

Reported-by: zlang@redhat.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200137
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-06-24 11:56:36 -07:00
Dave Chinner
23fcb3340d xfs: More robust inode extent count validation
When the inode is in extent format, it can't have more extents that
fit in the inode fork. We don't currenty check this, and so this
corruption goes unnoticed by the inode verifiers. This can lead to
crashes operating on invalid in-memory structures.

Attempts to access such a inode will now error out in the verifier
rather than allowing modification operations to proceed.

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix a typedef, add some braces and breaks to shut up compiler warnings]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-21 23:25:57 -07:00
Dave Chinner
0b61f8a407 xfs: convert to SPDX license tags
Remove the verbose license text from XFS files and replace them
with SPDX tags. This does not change the license of any of the code,
merely refers to the common, up-to-date license files in LICENSES/

This change was mostly scripted. fs/xfs/Makefile and
fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
and modified by the following command:

for f in `git grep -l "GNU General" fs/xfs/` ; do
	echo $f
	cat $f | awk -f hdr.awk > $f.new
	mv -f $f.new $f
done

And the hdr.awk script that did the modification (including
detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
is as follows:

$ cat hdr.awk
BEGIN {
	hdr = 1.0
	tag = "GPL-2.0"
	str = ""
}

/^ \* This program is free software/ {
	hdr = 2.0;
	next
}

/any later version./ {
	tag = "GPL-2.0+"
	next
}

/^ \*\// {
	if (hdr > 0.0) {
		print "// SPDX-License-Identifier: " tag
		print str
		print $0
		str=""
		hdr = 0.0
		next
	}
	print $0
	next
}

/^ \* / {
	if (hdr > 1.0)
		next
	if (hdr > 0.0) {
		if (str != "")
			str = str "\n"
		str = str $0
		next
	}
	print $0
	next
}

/^ \*/ {
	if (hdr > 0.0)
		next
	print $0
	next
}

// {
	if (hdr > 0.0) {
		if (str != "")
			str = str "\n"
		str = str $0
		next
	}
	print $0
}

END { }
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-06 14:17:53 -07:00
Eric Sandeen
f7664b3197 xfs: implement online get/set fs label
The GET ioctl is trivial, just return the current label.

The SET ioctl is more involved:
It transactionally modifies the superblock to write a new filesystem
label to the primary super.

A new variant of xfs_sync_sb then writes the superblock buffer
immediately to disk so that the change is visible from userspace.

It then invalidates any page cache that userspace might have previously
read on the block device so that i.e. blkid can see the change
immediately, and updates all secondary superblocks as userspace relable
does.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[darrick: use dchinner's new xfs_update_secondary_sbs function]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-16 08:50:16 -07:00
Dave Chinner
a78ee256c3 xfs: convert XFS_AGFL_SIZE to a helper function
The AGFL size calculation is about to get more complex, so lets turn
the macro into a function first and remove the macro.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick: forward port to newer kernel, simplify the helper]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-03-11 20:27:56 -07:00
Darrick J. Wong
65a7935ddc xfs: remove u_int* type usage
Use the uint* types instead of the u_int* types.  This will (hopefully)
pair with an xfsprogs cleanup.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2017-11-09 15:50:29 -08:00
Christoph Hellwig
866d7826c9 xfs: move xfs_bmbt_irec and xfs_exntst_t to xfs_types.h
Neither defines an on-disk format, so move them out of xfs_format.h.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-06 11:53:41 -08:00
Christoph Hellwig
6bdcf26ade xfs: use a b+tree for the in-core extent list
Replace the current linear list and the indirection array for the in-core
extent list with a b+tree to avoid the need for larger memory allocations
for the indirection array when lots of extents are present.  The current
extent list implementations leads to heavy pressure on the memory
allocator when modifying files with a high extent count, and can lead
to high latencies because of that.

The replacement is a b+tree with a few quirks.  The leaf nodes directly
store the extent record in two u64 values.  The encoding is a little bit
different from the existing in-core extent records so that the start
offset and length which are required for lookups can be retreived with
simple mask operations.  The inner nodes store a 64-bit key containing
the start offset in the first half of the node, and the pointers to the
next lower level in the second half.  In either case we walk the node
from the beginninig to the end and do a linear search, as that is more
efficient for the low number of cache lines touched during a search
(2 for the inner nodes, 4 for the leaf nodes) than a binary search.
We store termination markers (zero length for the leaf nodes, an
otherwise impossible high bit for the inner nodes) to terminate the key
list / records instead of storing a count to use the available cache
lines as efficiently as possible.

One quirk of the algorithm is that while we normally split a node half and
half like usual btree implementations we just spill over entries added at
the very end of the list to a new node on its own.  This means we get a
100% fill grade for the common cases of bulk insertion when reading an
inode into memory, and when only sequentially appending to a file.  The
downside is a slightly higher chance of splits on the first random
insertions.

Both insert and removal manually recurse into the lower levels, but
the bulk deletion of the whole tree is still implemented as a recursive
function call, although one limited by the overall depth and with very
little stack usage in every iteration.

For the first few extents we dynamically grow the list from a single
extent to the next powers of two until we have a first full leaf block
and that building the actual tree.

The code started out based on the generic lib/btree.c code from Joern
Engel based on earlier work from Peter Zijlstra, but has since been
rewritten beyond recognition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-06 11:53:41 -08:00
Dave Chinner
5d0eda0307 xfs: convert remaining xfs_sb_version_... checks to bool
Some were missed in the pass that converted the function return
values from int to bool. Update the remaining ones for consistency.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-01 15:03:16 -07:00
Christoph Hellwig
42b67dc6ff xfs: remove the never fully implemented UUID fork format
Remove the dead code dealing with the UUID fork format that was never
implemented in Linux (and neither in IRIX as far as I know).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-10-26 15:38:27 -07:00
Darrick J. Wong
29b0767b8b xfs: scrub realtime bitmap/summary
Perform simple tests of the realtime bitmap and summary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2017-10-26 15:38:26 -07:00
Darrick J. Wong
3daa664191 xfs: scrub inode btrees
Check the records of the inode btrees to make sure that the values
make sense given the inode records themselves.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2017-10-26 15:38:25 -07:00
Darrick J. Wong
6eb0b8df9f xfs: rename MAXPATHLEN to XFS_SYMLINK_MAXLEN
XFS has a maximum symlink target length of 1024 bytes; this is a
holdover from the Irix days.  Unfortunately, the constant establishing
this is 'MAXPATHLEN' and is /not/ the same as the Linux MAXPATHLEN,
which is 4096.

The kernel enforces its 1024 byte MAXPATHLEN on symlink targets, but
xfsprogs picks up the (Linux) system 4096 byte MAXPATHLEN, which means
that xfs_repair doesn't complain about oversized symlinks.

Since this is an on-disk format constraint, put the define in the XFS
namespace and move everything over to use the new name.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2017-07-07 08:37:26 -07:00
Darrick J. Wong
c8ce540db5 xfs: remove double-underscore integer types
This is a purely mechanical patch that removes the private
__{u,}int{8,16,32,64}_t typedefs in favor of using the system
{u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
the transformation and fix the resulting whitespace and indentation
errors:

s/typedef\t__uint8_t/typedef __uint8_t\t/g
s/typedef\t__uint/typedef __uint/g
s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
s/__uint8_t\t/__uint8_t\t\t/g
s/__uint/uint/g
s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
s/__int/int/g
/^typedef.*int[0-9]*_t;$/d

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-06-19 14:11:33 -07:00
Christoph Hellwig
0c1d9e4a61 xfs: simplify validation of the unwritten extent bit
XFS only supports the unwritten extent bit in the data fork, and only if
the file system has a version 5 superblock or the unwritten extent
feature bit.

We currently have two routines that validate the invariant:
xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met,
and xfs_validate_extent that triggers and assert in debug build.

Both of them iterate over all extents of an inode fork when called,
which isn't very efficient.

This patch instead adds a new helper that verifies the invariant one
extent at a time, and calls it from the places where we iterate over
all extents to converted them from or two the in-memory format.  The
callers then return -EFSCORRUPTED when reading invalid extents from
disk, or trigger an assert when writing them to disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Christoph Hellwig
37f7f9bbf3 xfs: remove unused values from xfs_exntst_t
We only ever use the normal and unwritten states.  And the actual
ondisk format (this enum isn't despite being in xfs_format.h) only
has space for the unwritten bit anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Christoph Hellwig
895e9bfc9e xfs: remove the unused XFS_MAXLINK_1 define
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Roger Willcocks
8cdcc8102c libxfs: v3 inodes are only valid on crc-enabled filesystems
xfs_repair was not detecting that version 3 inodes are invalid for
for non-CRC filesystems. The result is specific inode corruptions go
undetected and hence aren't repaired if only the version number is
out of range.

The core of the problem is that the XFS_DINODE_GOOD_VERSION() macro
doesn't know that valid inode versions are dependent on a superblock
version number. Fix this in libxfs, and propagate the new function
out into the rest of xfsprogs to fix the issue.

[Darrick: port to kernel from xfsprogs]

Reported-by: Leslie Rhorer <lrhorer@mygrande.net>
Signed-off-by: Roger Willcocks <roger@filmlight.ltd.uk>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20 15:48:38 +11:00
Darrick J. Wong
e54b5bf9d7 xfs: recognize the reflink feature bit
Add the reflink feature flag to the set of recognized feature flags.
This enables users to write to reflink filesystems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05 16:26:31 -07:00
Darrick J. Wong
f7ca352272 xfs: create a separate cow extent size hint for the allocator
Create a per-inode extent size allocator hint for copy-on-write.  This
hint is separate from the existing extent size hint so that CoW can
take advantage of the fragmentation-reducing properties of extent size
hints without disabling delalloc for regular writes.

The extent size hint that's fed to the allocator during a copy on
write operation is the greater of the cowextsize and regular extsize
hint.

During reflink, if we're sharing the entire source file to the entire
destination file and the destination file doesn't already have a
cowextsize hint, propagate the source file's cowextsize hint to the
destination file.

Furthermore, zero the bulkstat buffer prior to setting the fields
so that we don't copy kernel memory contents into userspace.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05 16:26:26 -07:00
Darrick J. Wong
174edb0e46 xfs: store in-progress CoW allocations in the refcount btree
Due to the way the CoW algorithm in XFS works, there's an interval
during which blocks allocated to handle a CoW can be lost -- if the FS
goes down after the blocks are allocated but before the block
remapping takes place.  This is exacerbated by the cowextsz hint --
allocated reservations can sit around for a while, waiting to get
used.

Since the refcount btree doesn't normally store records with refcount
of 1, we can use it to record these in-progress extents.  In-progress
blocks cannot be shared because they're not user-visible, so there
shouldn't be any conflicts with other programs.  This is a better
solution than holding EFIs during writeback because (a) EFIs can't be
relogged currently, (b) even if they could, EFIs are bound by
available log space, which puts an unnecessary upper bound on how much
CoW we can have in flight, and (c) we already have a mechanism to
track blocks.

At mount time, read the refcount records and free anything we find
with a refcount of 1 because those were in-progress when the FS went
down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05 16:26:05 -07:00
Darrick J. Wong
bdf28630b7 xfs: add refcount btree operations
Implement the generic btree operations required to manipulate refcount
btree blocks.  The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.

Since the refcount root and level fields are separate from the
existing roots and levels array, they need a separate logging flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: fix logging of AGF refcount btree fields]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-10-03 09:11:19 -07:00
Darrick J. Wong
1946b91cee xfs: define the on-disk refcount btree format
Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 09:11:18 -07:00
Darrick J. Wong
af30dfa144 xfs: refcount btree add more reserved blocks
Since XFS reserves a small amount of space in each AG as the minimum
free space needed for an operation, save some more space in case we
touch the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 09:11:17 -07:00
Darrick J. Wong
46eeb521b9 xfs: introduce refcount btree definitions
Add new per-AG refcount btree definitions to the per-AG structures.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-03 09:11:16 -07:00
Darrick J. Wong
da1f039d69 xfs: don't log the entire end of the AGF
When we're logging the last non-spare field in the AGF, we don't
need to log the spare fields, so plumb in a new AGF logging flag
to help us avoid that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-26 15:59:31 +10:00
Darrick J. Wong
f32866fdc9 xfs: store rmapbt block count in the AGF
Track the number of blocks used for the rmapbt in the AGF.  When we
get to the AG reservation code we need this counter to quickly
make our reservation during mount.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17 08:31:49 +10:00
Darrick J. Wong
1c0607ace9 xfs: enable the rmap btree functionality
Originally-From: Dave Chinner <dchinner@redhat.com>

Add the feature flag to the supported matrix so that the kernel can
mount and use rmap btree enabled filesystems

Signed-off-by: Dave Chinner <dchinner@redhat.com>
[darrick.wong@oracle.com: move the experimental tag]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 12:20:57 +10:00
Darrick J. Wong
035e00acb5 xfs: define the on-disk rmap btree format
Originally-From: Dave Chinner <dchinner@redhat.com>

Now we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.

[darrick: record owner and offset info in rmap btree]
[darrick: fork, bmbt and unwritten state in rmap btree]
[darrick: flags are a separate field in xfs_rmap_irec]
[darrick: calculate maxlevels separately]
[darrick: move the 'unwritten' bit into unused parts of rm_offset]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:36:07 +10:00
Darrick J. Wong
340785cca1 xfs: add owner field to extent allocation and freeing
For the rmap btree to work, we have to feed the extent owner
information to the the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.

We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.

There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...

While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.

Extend the owner field to include both the owner type and some sort
of index within the owner.  The index field will be used to support
reverse mappings when reflink is enabled.

When we're freeing extents from an EFI, we don't have the owner
information available (rmap updates have their own redo items).
xfs_free_extent therefore doesn't need to do an rmap update. Make
sure that the log replay code signals this correctly.

This is based upon a patch originally from Dave Chinner. It has been
extended to add more owner information with the intent of helping
recovery operations when things go wrong (e.g. offset of user data
block in a file).

[dchinner: de-shout the xfs_rmap_*_owner helpers]
[darrick: minor style fixes suggested by Christoph Hellwig]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:33:42 +10:00
Darrick J. Wong
8018026ef2 xfs: rmap btree add more reserved blocks
Originally-From: Dave Chinner <dchinner@redhat.com>

XFS reserves a small amount of space in each AG for the minimum
number of free blocks needed for operation. Adding the rmap btree
increases the number of reserved blocks, but it also increases the
complexity of the calculation as the free inode btree is optional
(like the rmbt).

Rather than calculate the prealloc blocks every time we need to
check it, add a function to calculate it at mount time and store it
in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
just to use the xfs-mount variable directly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:31:47 +10:00
Darrick J. Wong
b87049444a xfs: introduce rmap btree definitions
Originally-From: Dave Chinner <dchinner@redhat.com>

Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit in the empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-03 11:30:32 +10:00
Hou Tao
ad70328a50 xfs: remove the magic numbers in xfs_btree_block-related len macros
replace the magic numbers by offsetof(...) and sizeof(...), and add two
extra checks on xfs_check_ondisk_structs()

[dchinner: renamed header structures to be more descriptive]

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-07-20 10:43:11 +10:00
Dave Chinner
4c931f770d Merge branch 'xfs-setxattr-promotion' into for-next 2016-01-19 08:16:08 +11:00
Dave Chinner
58f88ca2df xfs: introduce per-inode DAX enablement
Rather than just being able to turn DAX on and off via a mount
option, some applications may only want to enable DAX for certain
performance critical files in a filesystem.

This patch introduces a new inode flag to enable DAX in the v3 inode
di_flags2 field. It adds support for setting and clearing flags in
the di_flags2 field via the XFS_IOC_FSSETXATTR ioctl, and sets the
S_DAX inode flag appropriately when it is seen.

When this flag is set on a directory, it acts as an "inherit flag".
That is, inodes created in the directory will automatically inherit
the on-disk inode DAX flag, enabling administrators to set up
directory heirarchies that automatically use DAX. Setting this flag
on an empty root directory will make the entire filesystem use DAX
by default.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
2016-01-04 16:44:15 +11:00
Dave Chinner
e7b8948101 xfs: use FS_XFLAG definitions directly
Now that the ioctls have been hoisted up to the VFS level, use
the VFs definitions directly and remove the XFS specific definitions
completely. Userspace is going to have to handle the change of this
interface separately, so removing the definitions from xfs_fs.h is
not an issue here at all.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
2016-01-04 16:44:15 +11:00
Darrick J. Wong
96f859d52b libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
inside it, gcc will quietly round the structure size up to the nearest
64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
macro returning incorrect results for v5 filesystems on 64-bit
machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
will see garbage in AGFL item 119 and complain.

Therefore, tell gcc not to pad the structure so that the AGFL size
calculation is correct.

cc: <stable@vger.kernel.org> # 3.10 - 4.4
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-01-04 16:13:21 +11:00
Dave Chinner
2da5c4b05a Merge branch 'xfs-misc-fixes-for-4.4-2' into for-next 2015-11-03 13:27:58 +11:00
Andreas Gruenbacher
86a21c7974 xfs: Validate the length of on-disk ACLs
In xfs_acl_from_disk, instead of trusting that xfs_acl.acl_cnt is correct,
make sure that the length of the attributes is correct as well.  Also, turn
the aclp parameter into a const pointer.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-11-03 12:41:59 +11:00
Jan Tulak
51fcbfe709 xfs: avoid dependency on Linux XATTR_SIZE_MAX
Currently, we depends on Linux XATTR value for on disk
definition. Which causes trouble on other platforms and
maybe also if this value was to change.

Fix it by creating a custom definition independent from
those in Linux (although with the same values), so it is OK
with the be16 fields used for holding these attributes.

This patch reflects a change in xfsprogs.

Signed-off-by: Jan Tulak <jtulak@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-10-12 16:03:59 +11:00
Eric Sandeen
ce748eaa65 xfs: create new metadata UUID field and incompat flag
This adds a new superblock field, sb_meta_uuid.  If set, along with
a new incompat flag, the code will use that field on a V5 filesystem
to compare to metadata UUIDs, which allows us to change the user-
visible UUID at will.  Userspace handles the setting and clearing
of the incompat flag as appropriate, as the UUID gets changed; i.e.
setting the user-visible UUID back to the original UUID (as stored in
the new field) will remove the incompatible feature flag.

If the incompat flag is not set, this copies the user-visible UUID into
into the meta_uuid slot in memory when the superblock is read from disk;
the meta_uuid field is not written back to disk in this case.

The remainder of this patch simply switches verifiers, initializers,
etc to use the new sb_meta_uuid field.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-07-29 11:53:31 +10:00
Dave Chinner
3d238b7e0e Merge branch 'xfs-freelist-cleanup' into for-next 2015-06-23 08:48:43 +10:00
Dave Chinner
496817b4be xfs: clean up XFS_MIN_FREELIST macros
We no longer calculate the minimum freelist size from the on-disk
AGF, so we don't need the macros used for this. That means the
nested macros can be cleaned up, and turn this into an actual
function so the logic is clear and concise. This will make it much
easier to add support for the rmap btree when the time comes.

This also gets rid of the XFS_AG_MAXLEVELS macro used by these
freelist macros as it is simply a wrapper around a single variable.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-22 10:13:30 +10:00
Dave Chinner
b9a350a118 Merge branch 'xfs-sparse-inode' into for-next 2015-06-01 10:51:38 +10:00
Eric Sandeen
39e56d9219 xfs: don't cast string literals
The commit:

a9273ca5 xfs: convert attr to use unsigned names

added these (unsigned char *) casts, but then the _SIZE macros
return "7" - size of a pointer minus one - not the length of
the string.  This is harmless in the kernel, because the _SIZE
macros are not used, but as we sync up with userspace, this will
matter.

I don't think the cast is necessary; i.e. assigning the string
literal to an unsigned char *, or passing it to a function
expecting an unsigned char *, should be ok, right?

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-06-01 07:15:38 +10:00
Brian Foster
22ce1e1472 xfs: enable sparse inode chunks for v5 superblocks
Enable mounting of filesystems with sparse inode support enabled. Add
the incompat. feature bit to the *_ALL mask.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 09:26:33 +10:00
Brian Foster
5419040fc0 xfs: introduce inode record hole mask for sparse inode chunks
The inode btrees track 64 inodes per record regardless of inode size.
Thus, inode chunks on disk vary in size depending on the size of the
inodes. This creates a contiguous allocation requirement for new inode
chunks that can be difficult to satisfy on an aged and fragmented (free
space) filesystems.

The inode record freecount currently uses 4 bytes on disk to track the
free inode count. With a maximum freecount value of 64, only one byte is
required. Convert the freecount field to a single byte and use two of
the remaining 3 higher order bytes left for the hole mask field. Use the
final leftover byte for the total count field.

The hole mask field tracks holes in the chunks of physical space that
the inode record refers to. This facilitates the sparse allocation of
inode chunks when contiguous chunks are not available and allows the
inode btrees to identify what portions of the chunk contain valid
inodes. The total count field contains the total number of valid inodes
referred to by the record. This can also be deduced from the hole mask.
The count field provides clarity and redundancy for internal record
verification.

Note that neither of the new fields can be written to disk on fs'
without sparse inode support. Doing so writes to the high-order bytes of
freecount and causes corruption from the perspective of older kernels.
The on-disk inobt record data structure is updated with a union to
distinguish between the original, "full" format and the new, "sparse"
format. The conversion routines to get, insert and update records are
updated to translate to and from the on-disk record accordingly such
that freecount remains a 4-byte value on non-supported fs, yet the new
fields of the in-core record are always valid with respect to the
record. This means that higher level code can refer to the current
in-core record format unconditionally and lower level code ensures that
records are translated to/from disk according to the capabilities of the
fs.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 09:03:04 +10:00
Brian Foster
e5376fc15a xfs: sparse inode chunks feature helpers and mount requirements
The sparse inode chunks feature uses the helper function to enable the
allocation of sparse inode chunks. The incompatible feature bit is set
on disk at mkfs time to prevent mount from unsupported kernels.

Also, enforce the inode alignment requirements required for sparse inode
chunks at mount time. When enabled, full inode chunks (and all inode
record) alignment is increased from cluster size to inode chunk size.
Sparse inode alignment must match the cluster size of the fs. Both
superblock alignment fields are set as such by mkfs when sparse inode
support is enabled.

Finally, warn that sparse inode chunks is an experimental feature until
further notice.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29 08:57:27 +10:00
Brian Foster
fb4f2b4e5a xfs: add sparse inode chunk alignment superblock field
Add sb_spino_align to the superblock to specify sparse inode chunk
alignment. This also currently represents the minimum allowable sparse
chunk allocation size.

Signed-off-by: Brian Foster <bfoster@redhat.com>
2015-05-29 08:54:03 +10:00
Dave Chinner
964aa8d9e4 xfs: remove xfs_mod_incore_sb API
Now that there are no users of the bitfield based incore superblock
modification API, just remove the whole damn lot of it, including
all the bitfield definitions. This finally removes a lot of cruft
that has been around for a long time.

Credit goes to Christoph Hellwig for providing a great patch
connecting all the dots to enale us to do this. This patch is
derived from that work.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-02-23 21:24:37 +11:00
Dave Chinner
465e2def7c Merge branch 'xfs-sb-logging-rework' into for-next
Conflicts:
	fs/xfs/xfs_mount.c
2015-01-22 09:20:53 +11:00
Dave Chinner
074e427ba7 xfs: sanitise sb_bad_features2 handling
We currently have to ensure that every time we update sb_features2
that we update sb_bad_features2. Now that we log and format the
superblock in it's entirety we actually don't have to care because
we can simply update the sb_bad_features2 when we format it into the
buffer. This removes the need for anything but the mount and
superblock formatting code to care about sb_bad_features2, and
hence removes the possibility that we forget to update bad_features2
when necessary in the future.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-01-22 09:10:33 +11:00
Jan Kara
1a43ec03dd xfs: Keep sb_bad_features2 consistent with sb_features2
Currently when we modify sb_features2, we store the same value also in
sb_bad_features2. However in most places we forget to mark field
sb_bad_features2 for logging and thus it can happen that a change to it
is lost. This results in an inconsistent sb_features2 and
sb_bad_features2 fields e.g. after xfstests test xfs/187.

Fix the problem by changing XFS_SB_FEATURES2 to actually mean both
sb_features2 and sb_bad_features2 fields since this is always what we
want to log. This isn't ideal because the fact that XFS_SB_FEATURES2
means two fields could cause some problem in future however the code is
hopefully less error prone that it is now.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-12-24 09:48:35 +11:00
Christoph Hellwig
508b6b3b73 xfs: merge xfs_inum.h into xfs_format.h
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-11-28 14:27:10 +11:00
Christoph Hellwig
bb58e6188a xfs: move most of xfs_sb.h to xfs_format.h
More on-disk format consolidation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-11-28 14:27:09 +11:00