mirror of
				git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
				synced 2025-09-04 20:19:47 +08:00 
			
		
		
		
	 70cb0d02b5
			
		
	
	
		70cb0d02b5
		
	
	
	
	
		
			
			about the state of the extent status cache. Dropped workaround for pre-1970 dates which were encoded incorrectly in pre-4.4 kernels. Since both the kernel correctly generates, and e2fsck detects and fixes this issue for the past four years, it'e time to drop the workaround. (Also, it's not like files with dates in the distant past were all that common in the first place.) A lot of miscellaneous bug fixes and cleanups, including some ext4 Documentation fixes. Also included are two minor bug fixes in fs/unicode. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl2D5ZIACgkQ8vlZVpUN gaO8NQf+ONLK5nu8KUk14uh8MOXMisiT+g1iqhynZcqtuZzTr4nKqUbHLmPDHrCC RiD/gkLhp6u+UlzYRJq6nudunid1be2/1bjoUm6lddE4XLtbeGHhZsGn1+9K/wy+ l8UFMXd8fCOlXNzajS85Hb0KSuzlrGYEjSrNecSa3KLxrv1kM1+FyKFcqQ7Ejs5/ VZYNtWo69R4wSEIawGkEZuNu/wFeLOzqJgxFJLo6zFxTAp449bbEduz12ssmkUhl QbXH9cXLR4pAZykzMRqHC8UFFTKmpLnc5EiT1Ajxzu4EAzB1SzqRJvbz/3CF3d/Z gBKDrDlasv75VJqVtqw4mCxmEoEYjw== =Iwrf -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Added new ext4 debugging ioctls to allow userspace to get information about the state of the extent status cache. Dropped workaround for pre-1970 dates which were encoded incorrectly in pre-4.4 kernels. Since both the kernel correctly generates, and e2fsck detects and fixes this issue for the past four years, it'e time to drop the workaround. (Also, it's not like files with dates in the distant past were all that common in the first place.) A lot of miscellaneous bug fixes and cleanups, including some ext4 Documentation fixes. Also included are two minor bug fixes in fs/unicode" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits) unicode: make array 'token' static const, makes object smaller unicode: Move static keyword to the front of declarations ext4: add missing bigalloc documentation. ext4: fix kernel oops caused by spurious casefold flag ext4: fix integer overflow when calculating commit interval ext4: use percpu_counters for extent_status cache hits/misses ext4: fix potential use after free after remounting with noblock_validity jbd2: add missing tracepoint for reserved handle ext4: fix punch hole for inline_data file systems ext4: rework reserved cluster accounting when invalidating pages ext4: documentation fixes ext4: treat buffers with write errors as containing valid data ext4: fix warning inside ext4_convert_unwritten_extents_endio ext4: set error return correctly when ext4_htree_store_dirent fails ext4: drop legacy pre-1970 encoding workaround ext4: add new ioctl EXT4_IOC_GET_ES_CACHE ext4: add a new ioctl EXT4_IOC_GETSTATE ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE jbd2: flush_descriptor(): Do not decrease buffer head's ref count ext4: remove unnecessary error check ...
		
			
				
	
	
		
			579 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			579 lines
		
	
	
		
			17 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| .. SPDX-License-Identifier: GPL-2.0
 | |
| 
 | |
| Index Nodes
 | |
| -----------
 | |
| 
 | |
| In a regular UNIX filesystem, the inode stores all the metadata
 | |
| pertaining to the file (time stamps, block maps, extended attributes,
 | |
| etc), not the directory entry. To find the information associated with a
 | |
| file, one must traverse the directory files to find the directory entry
 | |
| associated with a file, then load the inode to find the metadata for
 | |
| that file. ext4 appears to cheat (for performance reasons) a little bit
 | |
| by storing a copy of the file type (normally stored in the inode) in the
 | |
| directory entry. (Compare all this to FAT, which stores all the file
 | |
| information directly in the directory entry, but does not support hard
 | |
| links and is in general more seek-happy than ext4 due to its simpler
 | |
| block allocator and extensive use of linked lists.)
 | |
| 
 | |
| The inode table is a linear array of ``struct ext4_inode``. The table is
 | |
| sized to have enough blocks to store at least
 | |
| ``sb.s_inode_size * sb.s_inodes_per_group`` bytes. The number of the
 | |
| block group containing an inode can be calculated as
 | |
| ``(inode_number - 1) / sb.s_inodes_per_group``, and the offset into the
 | |
| group's table is ``(inode_number - 1) % sb.s_inodes_per_group``. There
 | |
| is no inode 0.
 | |
| 
 | |
| The inode checksum is calculated against the FS UUID, the inode number,
 | |
| and the inode structure itself.
 | |
| 
 | |
| The inode table entry is laid out in ``struct ext4_inode``.
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 8 8 24 40
 | |
|    :header-rows: 1
 | |
|    :class: longtable
 | |
| 
 | |
|    * - Offset
 | |
|      - Size
 | |
|      - Name
 | |
|      - Description
 | |
|    * - 0x0
 | |
|      - \_\_le16
 | |
|      - i\_mode
 | |
|      - File mode. See the table i_mode_ below.
 | |
|    * - 0x2
 | |
|      - \_\_le16
 | |
|      - i\_uid
 | |
|      - Lower 16-bits of Owner UID.
 | |
|    * - 0x4
 | |
|      - \_\_le32
 | |
|      - i\_size\_lo
 | |
|      - Lower 32-bits of size in bytes.
 | |
|    * - 0x8
 | |
|      - \_\_le32
 | |
|      - i\_atime
 | |
|      - Last access time, in seconds since the epoch. However, if the EA\_INODE
 | |
|        inode flag is set, this inode stores an extended attribute value and
 | |
|        this field contains the checksum of the value.
 | |
|    * - 0xC
 | |
|      - \_\_le32
 | |
|      - i\_ctime
 | |
|      - Last inode change time, in seconds since the epoch. However, if the
 | |
|        EA\_INODE inode flag is set, this inode stores an extended attribute
 | |
|        value and this field contains the lower 32 bits of the attribute value's
 | |
|        reference count.
 | |
|    * - 0x10
 | |
|      - \_\_le32
 | |
|      - i\_mtime
 | |
|      - Last data modification time, in seconds since the epoch. However, if the
 | |
|        EA\_INODE inode flag is set, this inode stores an extended attribute
 | |
|        value and this field contains the number of the inode that owns the
 | |
|        extended attribute.
 | |
|    * - 0x14
 | |
|      - \_\_le32
 | |
|      - i\_dtime
 | |
|      - Deletion Time, in seconds since the epoch.
 | |
|    * - 0x18
 | |
|      - \_\_le16
 | |
|      - i\_gid
 | |
|      - Lower 16-bits of GID.
 | |
|    * - 0x1A
 | |
|      - \_\_le16
 | |
|      - i\_links\_count
 | |
|      - Hard link count. Normally, ext4 does not permit an inode to have more
 | |
|        than 65,000 hard links. This applies to files as well as directories,
 | |
|        which means that there cannot be more than 64,998 subdirectories in a
 | |
|        directory (each subdirectory's '..' entry counts as a hard link, as does
 | |
|        the '.' entry in the directory itself). With the DIR\_NLINK feature
 | |
|        enabled, ext4 supports more than 64,998 subdirectories by setting this
 | |
|        field to 1 to indicate that the number of hard links is not known.
 | |
|    * - 0x1C
 | |
|      - \_\_le32
 | |
|      - i\_blocks\_lo
 | |
|      - Lower 32-bits of “block” count. If the huge\_file feature flag is not
 | |
|        set on the filesystem, the file consumes ``i_blocks_lo`` 512-byte blocks
 | |
|        on disk. If huge\_file is set and EXT4\_HUGE\_FILE\_FL is NOT set in
 | |
|        ``inode.i_flags``, then the file consumes ``i_blocks_lo + (i_blocks_hi
 | |
|        << 32)`` 512-byte blocks on disk. If huge\_file is set and
 | |
|        EXT4\_HUGE\_FILE\_FL IS set in ``inode.i_flags``, then this file
 | |
|        consumes (``i_blocks_lo + i_blocks_hi`` << 32) filesystem blocks on
 | |
|        disk.
 | |
|    * - 0x20
 | |
|      - \_\_le32
 | |
|      - i\_flags
 | |
|      - Inode flags. See the table i_flags_ below.
 | |
|    * - 0x24
 | |
|      - 4 bytes
 | |
|      - i\_osd1
 | |
|      - See the table i_osd1_ for more details.
 | |
|    * - 0x28
 | |
|      - 60 bytes
 | |
|      - i\_block[EXT4\_N\_BLOCKS=15]
 | |
|      - Block map or extent tree. See the section “The Contents of inode.i\_block”.
 | |
|    * - 0x64
 | |
|      - \_\_le32
 | |
|      - i\_generation
 | |
|      - File version (for NFS).
 | |
|    * - 0x68
 | |
|      - \_\_le32
 | |
|      - i\_file\_acl\_lo
 | |
|      - Lower 32-bits of extended attribute block. ACLs are of course one of
 | |
|        many possible extended attributes; I think the name of this field is a
 | |
|        result of the first use of extended attributes being for ACLs.
 | |
|    * - 0x6C
 | |
|      - \_\_le32
 | |
|      - i\_size\_high / i\_dir\_acl
 | |
|      - Upper 32-bits of file/directory size. In ext2/3 this field was named
 | |
|        i\_dir\_acl, though it was usually set to zero and never used.
 | |
|    * - 0x70
 | |
|      - \_\_le32
 | |
|      - i\_obso\_faddr
 | |
|      - (Obsolete) fragment address.
 | |
|    * - 0x74
 | |
|      - 12 bytes
 | |
|      - i\_osd2
 | |
|      - See the table i_osd2_ for more details.
 | |
|    * - 0x80
 | |
|      - \_\_le16
 | |
|      - i\_extra\_isize
 | |
|      - Size of this inode - 128. Alternately, the size of the extended inode
 | |
|        fields beyond the original ext2 inode, including this field.
 | |
|    * - 0x82
 | |
|      - \_\_le16
 | |
|      - i\_checksum\_hi
 | |
|      - Upper 16-bits of the inode checksum.
 | |
|    * - 0x84
 | |
|      - \_\_le32
 | |
|      - i\_ctime\_extra
 | |
|      - Extra change time bits. This provides sub-second precision. See Inode
 | |
|        Timestamps section.
 | |
|    * - 0x88
 | |
|      - \_\_le32
 | |
|      - i\_mtime\_extra
 | |
|      - Extra modification time bits. This provides sub-second precision.
 | |
|    * - 0x8C
 | |
|      - \_\_le32
 | |
|      - i\_atime\_extra
 | |
|      - Extra access time bits. This provides sub-second precision.
 | |
|    * - 0x90
 | |
|      - \_\_le32
 | |
|      - i\_crtime
 | |
|      - File creation time, in seconds since the epoch.
 | |
|    * - 0x94
 | |
|      - \_\_le32
 | |
|      - i\_crtime\_extra
 | |
|      - Extra file creation time bits. This provides sub-second precision.
 | |
|    * - 0x98
 | |
|      - \_\_le32
 | |
|      - i\_version\_hi
 | |
|      - Upper 32-bits for version number.
 | |
|    * - 0x9C
 | |
|      - \_\_le32
 | |
|      - i\_projid
 | |
|      - Project ID.
 | |
| 
 | |
| .. _i_mode:
 | |
| 
 | |
| The ``i_mode`` value is a combination of the following flags:
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 16 64
 | |
|    :header-rows: 1
 | |
| 
 | |
|    * - Value
 | |
|      - Description
 | |
|    * - 0x1
 | |
|      - S\_IXOTH (Others may execute)
 | |
|    * - 0x2
 | |
|      - S\_IWOTH (Others may write)
 | |
|    * - 0x4
 | |
|      - S\_IROTH (Others may read)
 | |
|    * - 0x8
 | |
|      - S\_IXGRP (Group members may execute)
 | |
|    * - 0x10
 | |
|      - S\_IWGRP (Group members may write)
 | |
|    * - 0x20
 | |
|      - S\_IRGRP (Group members may read)
 | |
|    * - 0x40
 | |
|      - S\_IXUSR (Owner may execute)
 | |
|    * - 0x80
 | |
|      - S\_IWUSR (Owner may write)
 | |
|    * - 0x100
 | |
|      - S\_IRUSR (Owner may read)
 | |
|    * - 0x200
 | |
|      - S\_ISVTX (Sticky bit)
 | |
|    * - 0x400
 | |
|      - S\_ISGID (Set GID)
 | |
|    * - 0x800
 | |
|      - S\_ISUID (Set UID)
 | |
|    * -
 | |
|      - These are mutually-exclusive file types:
 | |
|    * - 0x1000
 | |
|      - S\_IFIFO (FIFO)
 | |
|    * - 0x2000
 | |
|      - S\_IFCHR (Character device)
 | |
|    * - 0x4000
 | |
|      - S\_IFDIR (Directory)
 | |
|    * - 0x6000
 | |
|      - S\_IFBLK (Block device)
 | |
|    * - 0x8000
 | |
|      - S\_IFREG (Regular file)
 | |
|    * - 0xA000
 | |
|      - S\_IFLNK (Symbolic link)
 | |
|    * - 0xC000
 | |
|      - S\_IFSOCK (Socket)
 | |
| 
 | |
| .. _i_flags:
 | |
| 
 | |
| The ``i_flags`` field is a combination of these values:
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 16 64
 | |
|    :header-rows: 1
 | |
| 
 | |
|    * - Value
 | |
|      - Description
 | |
|    * - 0x1
 | |
|      - This file requires secure deletion (EXT4\_SECRM\_FL). (not implemented)
 | |
|    * - 0x2
 | |
|      - This file should be preserved, should undeletion be desired
 | |
|        (EXT4\_UNRM\_FL). (not implemented)
 | |
|    * - 0x4
 | |
|      - File is compressed (EXT4\_COMPR\_FL). (not really implemented)
 | |
|    * - 0x8
 | |
|      - All writes to the file must be synchronous (EXT4\_SYNC\_FL).
 | |
|    * - 0x10
 | |
|      - File is immutable (EXT4\_IMMUTABLE\_FL).
 | |
|    * - 0x20
 | |
|      - File can only be appended (EXT4\_APPEND\_FL).
 | |
|    * - 0x40
 | |
|      - The dump(1) utility should not dump this file (EXT4\_NODUMP\_FL).
 | |
|    * - 0x80
 | |
|      - Do not update access time (EXT4\_NOATIME\_FL).
 | |
|    * - 0x100
 | |
|      - Dirty compressed file (EXT4\_DIRTY\_FL). (not used)
 | |
|    * - 0x200
 | |
|      - File has one or more compressed clusters (EXT4\_COMPRBLK\_FL). (not used)
 | |
|    * - 0x400
 | |
|      - Do not compress file (EXT4\_NOCOMPR\_FL). (not used)
 | |
|    * - 0x800
 | |
|      - Encrypted inode (EXT4\_ENCRYPT\_FL). This bit value previously was
 | |
|        EXT4\_ECOMPR\_FL (compression error), which was never used.
 | |
|    * - 0x1000
 | |
|      - Directory has hashed indexes (EXT4\_INDEX\_FL).
 | |
|    * - 0x2000
 | |
|      - AFS magic directory (EXT4\_IMAGIC\_FL).
 | |
|    * - 0x4000
 | |
|      - File data must always be written through the journal
 | |
|        (EXT4\_JOURNAL\_DATA\_FL).
 | |
|    * - 0x8000
 | |
|      - File tail should not be merged (EXT4\_NOTAIL\_FL). (not used by ext4)
 | |
|    * - 0x10000
 | |
|      - All directory entry data should be written synchronously (see
 | |
|        ``dirsync``) (EXT4\_DIRSYNC\_FL).
 | |
|    * - 0x20000
 | |
|      - Top of directory hierarchy (EXT4\_TOPDIR\_FL).
 | |
|    * - 0x40000
 | |
|      - This is a huge file (EXT4\_HUGE\_FILE\_FL).
 | |
|    * - 0x80000
 | |
|      - Inode uses extents (EXT4\_EXTENTS\_FL).
 | |
|    * - 0x100000
 | |
|      - Verity protected file (EXT4\_VERITY\_FL).
 | |
|    * - 0x200000
 | |
|      - Inode stores a large extended attribute value in its data blocks
 | |
|        (EXT4\_EA\_INODE\_FL).
 | |
|    * - 0x400000
 | |
|      - This file has blocks allocated past EOF (EXT4\_EOFBLOCKS\_FL).
 | |
|        (deprecated)
 | |
|    * - 0x01000000
 | |
|      - Inode is a snapshot (``EXT4_SNAPFILE_FL``). (not in mainline)
 | |
|    * - 0x04000000
 | |
|      - Snapshot is being deleted (``EXT4_SNAPFILE_DELETED_FL``). (not in
 | |
|        mainline)
 | |
|    * - 0x08000000
 | |
|      - Snapshot shrink has completed (``EXT4_SNAPFILE_SHRUNK_FL``). (not in
 | |
|        mainline)
 | |
|    * - 0x10000000
 | |
|      - Inode has inline data (EXT4\_INLINE\_DATA\_FL).
 | |
|    * - 0x20000000
 | |
|      - Create children with the same project ID (EXT4\_PROJINHERIT\_FL).
 | |
|    * - 0x80000000
 | |
|      - Reserved for ext4 library (EXT4\_RESERVED\_FL).
 | |
|    * -
 | |
|      - Aggregate flags:
 | |
|    * - 0x705BDFFF
 | |
|      - User-visible flags.
 | |
|    * - 0x604BC0FF
 | |
|      - User-modifiable flags. Note that while EXT4\_JOURNAL\_DATA\_FL and
 | |
|        EXT4\_EXTENTS\_FL can be set with setattr, they are not in the kernel's
 | |
|        EXT4\_FL\_USER\_MODIFIABLE mask, since it needs to handle the setting of
 | |
|        these flags in a special manner and they are masked out of the set of
 | |
|        flags that are saved directly to i\_flags.
 | |
| 
 | |
| .. _i_osd1:
 | |
| 
 | |
| The ``osd1`` field has multiple meanings depending on the creator:
 | |
| 
 | |
| Linux:
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 8 8 24 40
 | |
|    :header-rows: 1
 | |
| 
 | |
|    * - Offset
 | |
|      - Size
 | |
|      - Name
 | |
|      - Description
 | |
|    * - 0x0
 | |
|      - \_\_le32
 | |
|      - l\_i\_version
 | |
|      - Inode version. However, if the EA\_INODE inode flag is set, this inode
 | |
|        stores an extended attribute value and this field contains the upper 32
 | |
|        bits of the attribute value's reference count.
 | |
| 
 | |
| Hurd:
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 8 8 24 40
 | |
|    :header-rows: 1
 | |
| 
 | |
|    * - Offset
 | |
|      - Size
 | |
|      - Name
 | |
|      - Description
 | |
|    * - 0x0
 | |
|      - \_\_le32
 | |
|      - h\_i\_translator
 | |
|      - ??
 | |
| 
 | |
| Masix:
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 8 8 24 40
 | |
|    :header-rows: 1
 | |
| 
 | |
|    * - Offset
 | |
|      - Size
 | |
|      - Name
 | |
|      - Description
 | |
|    * - 0x0
 | |
|      - \_\_le32
 | |
|      - m\_i\_reserved
 | |
|      - ??
 | |
| 
 | |
| .. _i_osd2:
 | |
| 
 | |
| The ``osd2`` field has multiple meanings depending on the filesystem creator:
 | |
| 
 | |
| Linux:
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 8 8 24 40
 | |
|    :header-rows: 1
 | |
| 
 | |
|    * - Offset
 | |
|      - Size
 | |
|      - Name
 | |
|      - Description
 | |
|    * - 0x0
 | |
|      - \_\_le16
 | |
|      - l\_i\_blocks\_high
 | |
|      - Upper 16-bits of the block count. Please see the note attached to
 | |
|        i\_blocks\_lo.
 | |
|    * - 0x2
 | |
|      - \_\_le16
 | |
|      - l\_i\_file\_acl\_high
 | |
|      - Upper 16-bits of the extended attribute block (historically, the file
 | |
|        ACL location). See the Extended Attributes section below.
 | |
|    * - 0x4
 | |
|      - \_\_le16
 | |
|      - l\_i\_uid\_high
 | |
|      - Upper 16-bits of the Owner UID.
 | |
|    * - 0x6
 | |
|      - \_\_le16
 | |
|      - l\_i\_gid\_high
 | |
|      - Upper 16-bits of the GID.
 | |
|    * - 0x8
 | |
|      - \_\_le16
 | |
|      - l\_i\_checksum\_lo
 | |
|      - Lower 16-bits of the inode checksum.
 | |
|    * - 0xA
 | |
|      - \_\_le16
 | |
|      - l\_i\_reserved
 | |
|      - Unused.
 | |
| 
 | |
| Hurd:
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 8 8 24 40
 | |
|    :header-rows: 1
 | |
| 
 | |
|    * - Offset
 | |
|      - Size
 | |
|      - Name
 | |
|      - Description
 | |
|    * - 0x0
 | |
|      - \_\_le16
 | |
|      - h\_i\_reserved1
 | |
|      - ??
 | |
|    * - 0x2
 | |
|      - \_\_u16
 | |
|      - h\_i\_mode\_high
 | |
|      - Upper 16-bits of the file mode.
 | |
|    * - 0x4
 | |
|      - \_\_le16
 | |
|      - h\_i\_uid\_high
 | |
|      - Upper 16-bits of the Owner UID.
 | |
|    * - 0x6
 | |
|      - \_\_le16
 | |
|      - h\_i\_gid\_high
 | |
|      - Upper 16-bits of the GID.
 | |
|    * - 0x8
 | |
|      - \_\_u32
 | |
|      - h\_i\_author
 | |
|      - Author code?
 | |
| 
 | |
| Masix:
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 8 8 24 40
 | |
|    :header-rows: 1
 | |
| 
 | |
|    * - Offset
 | |
|      - Size
 | |
|      - Name
 | |
|      - Description
 | |
|    * - 0x0
 | |
|      - \_\_le16
 | |
|      - h\_i\_reserved1
 | |
|      - ??
 | |
|    * - 0x2
 | |
|      - \_\_u16
 | |
|      - m\_i\_file\_acl\_high
 | |
|      - Upper 16-bits of the extended attribute block (historically, the file
 | |
|        ACL location).
 | |
|    * - 0x4
 | |
|      - \_\_u32
 | |
|      - m\_i\_reserved2[2]
 | |
|      - ??
 | |
| 
 | |
| Inode Size
 | |
| ~~~~~~~~~~
 | |
| 
 | |
| In ext2 and ext3, the inode structure size was fixed at 128 bytes
 | |
| (``EXT2_GOOD_OLD_INODE_SIZE``) and each inode had a disk record size of
 | |
| 128 bytes. Starting with ext4, it is possible to allocate a larger
 | |
| on-disk inode at format time for all inodes in the filesystem to provide
 | |
| space beyond the end of the original ext2 inode. The on-disk inode
 | |
| record size is recorded in the superblock as ``s_inode_size``. The
 | |
| number of bytes actually used by struct ext4\_inode beyond the original
 | |
| 128-byte ext2 inode is recorded in the ``i_extra_isize`` field for each
 | |
| inode, which allows struct ext4\_inode to grow for a new kernel without
 | |
| having to upgrade all of the on-disk inodes. Access to fields beyond
 | |
| EXT2\_GOOD\_OLD\_INODE\_SIZE should be verified to be within
 | |
| ``i_extra_isize``. By default, ext4 inode records are 256 bytes, and (as
 | |
| of August 2019) the inode structure is 160 bytes
 | |
| (``i_extra_isize = 32``). The extra space between the end of the inode
 | |
| structure and the end of the inode record can be used to store extended
 | |
| attributes. Each inode record can be as large as the filesystem block
 | |
| size, though this is not terribly efficient.
 | |
| 
 | |
| Finding an Inode
 | |
| ~~~~~~~~~~~~~~~~
 | |
| 
 | |
| Each block group contains ``sb->s_inodes_per_group`` inodes. Because
 | |
| inode 0 is defined not to exist, this formula can be used to find the
 | |
| block group that an inode lives in:
 | |
| ``bg = (inode_num - 1) / sb->s_inodes_per_group``. The particular inode
 | |
| can be found within the block group's inode table at
 | |
| ``index = (inode_num - 1) % sb->s_inodes_per_group``. To get the byte
 | |
| address within the inode table, use
 | |
| ``offset = index * sb->s_inode_size``.
 | |
| 
 | |
| Inode Timestamps
 | |
| ~~~~~~~~~~~~~~~~
 | |
| 
 | |
| Four timestamps are recorded in the lower 128 bytes of the inode
 | |
| structure -- inode change time (ctime), access time (atime), data
 | |
| modification time (mtime), and deletion time (dtime). The four fields
 | |
| are 32-bit signed integers that represent seconds since the Unix epoch
 | |
| (1970-01-01 00:00:00 GMT), which means that the fields will overflow in
 | |
| January 2038. For inodes that are not linked from any directory but are
 | |
| still open (orphan inodes), the dtime field is overloaded for use with
 | |
| the orphan list. The superblock field ``s_last_orphan`` points to the
 | |
| first inode in the orphan list; dtime is then the number of the next
 | |
| orphaned inode, or zero if there are no more orphans.
 | |
| 
 | |
| If the inode structure size ``sb->s_inode_size`` is larger than 128
 | |
| bytes and the ``i_inode_extra`` field is large enough to encompass the
 | |
| respective ``i_[cma]time_extra`` field, the ctime, atime, and mtime
 | |
| inode fields are widened to 64 bits. Within this “extra” 32-bit field,
 | |
| the lower two bits are used to extend the 32-bit seconds field to be 34
 | |
| bit wide; the upper 30 bits are used to provide nanosecond timestamp
 | |
| accuracy. Therefore, timestamps should not overflow until May 2446.
 | |
| dtime was not widened. There is also a fifth timestamp to record inode
 | |
| creation time (crtime); this field is 64-bits wide and decoded in the
 | |
| same manner as 64-bit [cma]time. Neither crtime nor dtime are accessible
 | |
| through the regular stat() interface, though debugfs will report them.
 | |
| 
 | |
| We use the 32-bit signed time value plus (2^32 \* (extra epoch bits)).
 | |
| In other words:
 | |
| 
 | |
| .. list-table::
 | |
|    :widths: 20 20 20 20 20
 | |
|    :header-rows: 1
 | |
| 
 | |
|    * - Extra epoch bits
 | |
|      - MSB of 32-bit time
 | |
|      - Adjustment for signed 32-bit to 64-bit tv\_sec
 | |
|      - Decoded 64-bit tv\_sec
 | |
|      - valid time range
 | |
|    * - 0 0
 | |
|      - 1
 | |
|      - 0
 | |
|      - ``-0x80000000 - -0x00000001``
 | |
|      - 1901-12-13 to 1969-12-31
 | |
|    * - 0 0
 | |
|      - 0
 | |
|      - 0
 | |
|      - ``0x000000000 - 0x07fffffff``
 | |
|      - 1970-01-01 to 2038-01-19
 | |
|    * - 0 1
 | |
|      - 1
 | |
|      - 0x100000000
 | |
|      - ``0x080000000 - 0x0ffffffff``
 | |
|      - 2038-01-19 to 2106-02-07
 | |
|    * - 0 1
 | |
|      - 0
 | |
|      - 0x100000000
 | |
|      - ``0x100000000 - 0x17fffffff``
 | |
|      - 2106-02-07 to 2174-02-25
 | |
|    * - 1 0
 | |
|      - 1
 | |
|      - 0x200000000
 | |
|      - ``0x180000000 - 0x1ffffffff``
 | |
|      - 2174-02-25 to 2242-03-16
 | |
|    * - 1 0
 | |
|      - 0
 | |
|      - 0x200000000
 | |
|      - ``0x200000000 - 0x27fffffff``
 | |
|      - 2242-03-16 to 2310-04-04
 | |
|    * - 1 1
 | |
|      - 1
 | |
|      - 0x300000000
 | |
|      - ``0x280000000 - 0x2ffffffff``
 | |
|      - 2310-04-04 to 2378-04-22
 | |
|    * - 1 1
 | |
|      - 0
 | |
|      - 0x300000000
 | |
|      - ``0x300000000 - 0x37fffffff``
 | |
|      - 2378-04-22 to 2446-05-10
 | |
| 
 | |
| This is a somewhat odd encoding since there are effectively seven times
 | |
| as many positive values as negative values. There have also been
 | |
| long-standing bugs decoding and encoding dates beyond 2038, which don't
 | |
| seem to be fixed as of kernel 3.12 and e2fsprogs 1.42.8. 64-bit kernels
 | |
| incorrectly use the extra epoch bits 1,1 for dates between 1901 and
 | |
| 1970. At some point the kernel will be fixed and e2fsck will fix this
 | |
| situation, assuming that it is run before 2310.
 |