2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

1068 Commits

Author SHA1 Message Date
Ondrej Mosnacek
ccf1dab96b selinux: fix handling of empty opts in selinux_fs_context_submount()
selinux_set_mnt_opts() relies on the fact that the mount options pointer
is always NULL when all options are unset (specifically in its
!selinux_initialized() branch. However, the new
selinux_fs_context_submount() hook breaks this rule by allocating a new
structure even if no options are set. That causes any submount created
before a SELinux policy is loaded to be rejected in
selinux_set_mnt_opts().

Fix this by making selinux_fs_context_submount() leave fc->security
set to NULL when there are no options to be copied from the reference
superblock.

Cc: <stable@vger.kernel.org>
Reported-by: Adam Williamson <awilliam@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2236345
Fixes: d80a8f1b58 ("vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-09-12 17:31:08 -04:00
Linus Torvalds
1086eeac9c lsm/stable-6.6 PR 20230829
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmTuKLcUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXM/Eg//cwaOu/ASS08Cz/tfXeKpzg9UpzbW
 uHqGtgdE9ZEvS71z+3dorOJVPEwPr+/yviq3FXYjYHFqvVhLZCvYM9rw+eNo/k4T
 I95UTchGUsMWwkw61YBDLythfXm2UL5nabjckO81i9UPtxUYOwF6xQMQXYyMcLL8
 6fm1vnCvK5FBEXi2HSUWy3Eb3wdviGdHrL6h19Aeew+q8u33asWSxn9vmBSSFEzZ
 492//Pgy0t3FA6paWXQRvoR+GvLgBXNOvHB68cAx9vS8Lq6mAwJJSCRrQtKGh2Gd
 YInr49f+TXOosD5Tm6ueWO4sr8RzQZ7nPyM+BLue4Yn2ZzdYgjwfHdkHWS1KeH5X
 qVqa9s6/QONvkSCzqHs/ne2qio1Q0/0uGgwOkx6N7oVWQWjE7iTYlADwM0CDJnd2
 UD7AHTOgpc88x1T1eW599MZttSCznBTSFXv4waaS5/5NT9n8Db1TpTtCTedOc1x2
 n+c+F5BHLy69vhSGCanvum/8i2gNoKVyYaHyaMsQxr5LRcLnvN6oOjWIv7jMKxe7
 GavUAxU7M5rxPUH44vrrrI+XztKJOdpCz4S0xp+7pSSSGAK5KkmVVLXjzrlGO1WS
 55ixxQWYTGK0KlWHp4Ofi6brE9a4ATKcd1XscPN+AtBYX2ufNHLskCZulu/lyrMx
 lAy9RRDe1hHWTvg=
 =dnm4
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull LSM updates from Paul Moore:

 - Add proper multi-LSM support for xattrs in the
   security_inode_init_security() hook

   Historically the LSM layer has only allowed a single LSM to add an
   xattr to an inode, with IMA/EVM measuring that and adding its own as
   well. As we work towards promoting IMA/EVM to a "proper LSM" instead
   of the special case that it is now, we need to better support the
   case of multiple LSMs each adding xattrs to an inode and after
   several attempts we now appear to have something that is working
   well. It is worth noting that in the process of making this change we
   uncovered a problem with Smack's SMACK64TRANSMUTE xattr which is also
   fixed in this pull request.

 - Additional LSM hook constification

   Two patches to constify parameters to security_capget() and
   security_binder_transfer_file(). While I generally don't make a
   special note of who submitted these patches, these were the work of
   an Outreachy intern, Khadija Kamran, and that makes me happy;
   hopefully it does the same for all of you reading this.

 - LSM hook comment header fixes

   One patch to add a missing hook comment header, one to fix a minor
   typo.

 - Remove an old, unused credential function declaration

   It wasn't clear to me who should pick this up, but it was trivial,
   obviously correct, and arguably the LSM layer has a vested interest
   in credentials so I merged it. Sadly I'm now noticing that despite my
   subject line cleanup I didn't cleanup the "unsued" misspelling, sigh

* tag 'lsm-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lsm: constify the 'file' parameter in security_binder_transfer_file()
  lsm: constify the 'target' parameter in security_capget()
  lsm: add comment block for security_sk_classify_flow LSM hook
  security: Fix ret values doc for security_inode_init_security()
  cred: remove unsued extern declaration change_create_files_as()
  evm: Support multiple LSMs providing an xattr
  evm: Align evm_inode_init_security() definition with LSM infrastructure
  smack: Set the SMACK64TRANSMUTE xattr in smack_inode_init_security()
  security: Allow all LSMs to provide xattrs for inode_init_security hook
  lsm: fix typo in security_file_lock() comment header
2023-08-30 09:07:09 -07:00
Linus Torvalds
1dbae18987 selinux/stable-6.6 PR 20230829
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmTuKKAUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNTVBAAjFt/+t74FkH7OeIuwa4QFodNHWLS
 4AgtudUrH0oE/JnDDZsMNm3gjtFhM0cYcCCvItjNmea9sCMfR/huAaQXuYldLI/w
 IF2/n31Q9SlaHF8xgdpPg55tudx7khGoecC+aHA/qsRoikjCvUhOiWJZT4xn92R7
 3DtPMyoY7/1Bw8TWhq9Kj4ks2evx+o9Q798Nxu6XKUWZ6X+CasQfH0bAWWcmh8sL
 Z/mh5pyICmg9/sHvTdi3/VhhhSiWY0gyaX3tBo7Y3z6J4xjBzLzpJTC2TMaqpqLx
 nUO2R02Bk6PgMjQcVYl77WlUOfwPz3L2NkWcCmTBNzyBzirl9pUjVZ7ay9fek4Zf
 GSnoZ3IPNLxsK/oU4Zh+LKzkBpq7astovebFd4SYG+KHIhXkp8eblzxM5/M+LfG/
 SXnGrLaCwxWp4jWlYubufPtKhF8jFlpzRvTELKrhU9pZCgzpvwvIJZhUGVP6cDWh
 7j9RRJy1wBgqoDASM5+tCrplkwFa3r4AV5CVOFYSXgyvLCHk28IxcMEVqCvoO3yQ
 FmvAi0MV85XOa/NpD3hYpnvfAcLbv6ftZC6JekqXMjwVE9vfdrywS4ZW4Z3BcEo+
 M+eDsCh8ek9LmO+6IW0jffNuHijLKDr3hywnndNWascrmkmgDmu60kW/HIQORkPa
 tZkswaKUa6RszR0=
 =XB/b
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
 "Thirty three SELinux patches, which is a pretty big number for us, but
  there isn't really anything scary in here; in fact we actually manage
  to remove 10 lines of code with this :)

   - Promote the SELinux DEBUG_HASHES macro to CONFIG_SECURITY_SELINUX_DEBUG

     The DEBUG_HASHES macro was a buried SELinux specific preprocessor
     debug macro that was a problem waiting to happen. Promoting the
     debug macro to a proper Kconfig setting should help both improve
     the visibility of the feature as well enable improved test
     coverage. We've moved some additional debug functions under the
     CONFIG_SECURITY_SELINUX_DEBUG flag and we may see more work in the
     future.

   - Emit a pr_notice() message if virtual memory is executable by default

     As this impacts the SELinux access control policy enforcement, if
     the system's configuration is such that virtual memory is
     executable by default we print a single line notice to the console.

   - Drop avtab_search() in favor of avtab_search_node()

     Both functions are nearly identical so we removed avtab_search()
     and converted the callers to avtab_search_node().

   - Add some SELinux network auditing helpers

     The helpers not only reduce a small amount of code duplication, but
     they provide an opportunity to improve UDP flood performance
     slightly by delaying initialization of the audit data in some
     cases.

   - Convert GFP_ATOMIC allocators to GFP_KERNEL when reading SELinux policy

     There were two SELinux policy load helper functions that were
     allocating memory using GFP_ATOMIC, they have been converted to
     GFP_KERNEL.

   - Quiet a KMSAN warning in selinux_inet_conn_request()

     A one-line error path (re)set patch that resolves a KMSAN warning.
     It is important to note that this doesn't represent a real bug in
     the current code, but it quiets KMSAN and arguably hardens the code
     against future changes.

   - Cleanup the policy capability accessor functions

     This is a follow-up to the patch which reverted SELinux to using a
     global selinux_state pointer. This patch cleans up some artifacts
     of that change and turns each accessor into a one-line READ_ONCE()
     call into the policy capabilities array.

   - A number of patches from Christian Göttsche

     Christian submitted almost two-thirds of the patches in this pull
     request as he worked to harden the SELinux code against type
     differences, variable overflows, etc.

   - Support for separating early userspace from the kernel in policy,
     with a later revert

     We did have a patch that added a new userspace initial SID which
     would allow SELinux to distinguish between early user processes
     created before the initial policy load and the kernel itself.

     Unfortunately additional post-merge testing revealed a problematic
     interaction with an old SELinux userspace on an old version of
     Ubuntu so we've reverted the patch until we can resolve the
     compatibility issue.

   - Remove some outdated comments dealing with LSM hook registration

     When we removed the runtime disable functionality we forgot to
     remove some old comments discussing the importance of LSM hook
     registration ordering.

   - Minor administrative changes

     Stephen Smalley updated his email address and "debranded" SELinux
     from "NSA SELinux" to simply "SELinux". We've come a long way from
     the original NSA submission and I would consider SELinux a true
     community project at this point so removing the NSA branding just
     makes sense"

* tag 'selinux-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (33 commits)
  selinux: prevent KMSAN warning in selinux_inet_conn_request()
  selinux: use unsigned iterator in nlmsgtab code
  selinux: avoid implicit conversions in policydb code
  selinux: avoid implicit conversions in selinuxfs code
  selinux: make left shifts well defined
  selinux: update type for number of class permissions in services code
  selinux: avoid implicit conversions in avtab code
  selinux: revert SECINITSID_INIT support
  selinux: use GFP_KERNEL while reading binary policy
  selinux: update comment on selinux_hooks[]
  selinux: avoid implicit conversions in services code
  selinux: avoid implicit conversions in mls code
  selinux: use identical iterator type in hashtab_duplicate()
  selinux: move debug functions into debug configuration
  selinux: log about VM being executable by default
  selinux: fix a 0/NULL mistmatch in ad_net_init_from_iif()
  selinux: introduce SECURITY_SELINUX_DEBUG configuration
  selinux: introduce and use lsm_ad_net_init*() helpers
  selinux: update my email address
  selinux: add missing newlines in pr_err() statements
  ...
2023-08-30 08:51:16 -07:00
Linus Torvalds
b96a3e9142 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP.  It
   also speeds up GUP handling of transparent hugepages.
 
 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").
 
 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").
 
 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").
 
 - xu xin has doe some work on KSM's handling of zero pages.  These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support tracking
   KSM-placed zero-pages").
 
 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
 
 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").
 
 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD").
 
 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").
 
 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").
 
 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").
 
 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").
 
 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").
 
 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap").  And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").
 
 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
 
 - Baoquan He has converted some architectures to use the GENERIC_IOREMAP
   ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take
   GENERIC_IOREMAP way").
 
 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").
 
 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep").  Liam also developed some efficiency improvements
   ("Reduce preallocations for maple tree").
 
 - Cleanup and optimization to the secondary IOMMU TLB invalidation, from
   Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").
 
 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").
 
 - Kemeng Shi provides some maintenance work on the compaction code ("Two
   minor cleanups for compaction").
 
 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most
   file-backed faults under the VMA lock").
 
 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").
 
 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").
 
 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").
 
 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
 
 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").
 
 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").
 
 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
 
 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").
 
 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").
 
 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap
   on memory feature on ppc64").
 
 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock migratetype").
 
 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").
 
 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").
 
 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").
 
 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").
 
 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").
 
 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").
 
 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").
 
 - pagetable handling cleanups from Matthew Wilcox ("New page table range
   API").
 
 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").
 
 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").
 
 - Matthew Wilcox has also done some maintenance work on the MM subsystem
   documentation ("Improve mm documentation").
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZO1JUQAKCRDdBJ7gKXxA
 jrMwAP47r/fS8vAVT3zp/7fXmxaJYTK27CTAM881Gw1SDhFM/wEAv8o84mDenCg6
 Nfio7afS1ncD+hPYT8947UnLxTgn+ww=
 =Afws
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Some swap cleanups from Ma Wupeng ("fix WARN_ON in
   add_to_avail_list")

 - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
   reduces the special-case code for handling hugetlb pages in GUP. It
   also speeds up GUP handling of transparent hugepages.

 - Peng Zhang provides some maple tree speedups ("Optimize the fast path
   of mas_store()").

 - Sergey Senozhatsky has improved te performance of zsmalloc during
   compaction (zsmalloc: small compaction improvements").

 - Domenico Cerasuolo has developed additional selftest code for zswap
   ("selftests: cgroup: add zswap test program").

 - xu xin has doe some work on KSM's handling of zero pages. These
   changes are mainly to enable the user to better understand the
   effectiveness of KSM's treatment of zero pages ("ksm: support
   tracking KSM-placed zero-pages").

 - Jeff Xu has fixes the behaviour of memfd's
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
   MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").

 - David Howells has fixed an fscache optimization ("mm, netfs, fscache:
   Stop read optimisation when folio removed from pagecache").

 - Axel Rasmussen has given userfaultfd the ability to simulate memory
   poisoning ("add UFFDIO_POISON to simulate memory poisoning with
   UFFD").

 - Miaohe Lin has contributed some routine maintenance work on the
   memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
   check").

 - Peng Zhang has contributed some maintenance work on the maple tree
   code ("Improve the validation for maple tree and some cleanup").

 - Hugh Dickins has optimized the collapsing of shmem or file pages into
   THPs ("mm: free retracted page table by RCU").

 - Jiaqi Yan has a patch series which permits us to use the healthy
   subpages within a hardware poisoned huge page for general purposes
   ("Improve hugetlbfs read on HWPOISON hugepages").

 - Kemeng Shi has done some maintenance work on the pagetable-check code
   ("Remove unused parameters in page_table_check").

 - More folioification work from Matthew Wilcox ("More filesystem folio
   conversions for 6.6"), ("Followup folio conversions for zswap"). And
   from ZhangPeng ("Convert several functions in page_io.c to use a
   folio").

 - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").

 - Baoquan He has converted some architectures to use the
   GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
   architectures to take GENERIC_IOREMAP way").

 - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
   batched/deferred tlb shootdown during page reclamation/migration").

 - Better maple tree lockdep checking from Liam Howlett ("More strict
   maple tree lockdep"). Liam also developed some efficiency
   improvements ("Reduce preallocations for maple tree").

 - Cleanup and optimization to the secondary IOMMU TLB invalidation,
   from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
   upgrade").

 - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
   for arm64").

 - Kemeng Shi provides some maintenance work on the compaction code
   ("Two minor cleanups for compaction").

 - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
   most file-backed faults under the VMA lock").

 - Aneesh Kumar contributes code to use the vmemmap optimization for DAX
   on ppc64, under some circumstances ("Add support for DAX vmemmap
   optimization for ppc64").

 - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
   data in page_ext"), ("minor cleanups to page_ext header").

 - Some zswap cleanups from Johannes Weiner ("mm: zswap: three
   cleanups").

 - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").

 - VMA handling cleanups from Kefeng Wang ("mm: convert to
   vma_is_initial_heap/stack()").

 - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
   implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
   address ranges and DAMON monitoring targets").

 - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").

 - Liam Howlett has improved the maple tree node replacement code
   ("maple_tree: Change replacement strategy").

 - ZhangPeng has a general code cleanup - use the K() macro more widely
   ("cleanup with helper macro K()").

 - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
   memmap on memory feature on ppc64").

 - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
   in page_alloc"), ("Two minor cleanups for get pageblock
   migratetype").

 - Vishal Moola introduces a memory descriptor for page table tracking,
   "struct ptdesc" ("Split ptdesc from struct page").

 - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
   for vm.memfd_noexec").

 - MM include file rationalization from Hugh Dickins ("arch: include
   asm/cacheflush.h in asm/hugetlb.h").

 - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
   output").

 - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
   object_cache instead of kmemleak_initialized").

 - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
   and _folio_order").

 - A VMA locking scalability improvement from Suren Baghdasaryan
   ("Per-VMA lock support for swap and userfaults").

 - pagetable handling cleanups from Matthew Wilcox ("New page table
   range API").

 - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
   using page->private on tail pages for THP_SWAP + cleanups").

 - Cleanups and speedups to the hugetlb fault handling from Matthew
   Wilcox ("Change calling convention for ->huge_fault").

 - Matthew Wilcox has also done some maintenance work on the MM
   subsystem documentation ("Improve mm documentation").

* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
  maple_tree: shrink struct maple_tree
  maple_tree: clean up mas_wr_append()
  secretmem: convert page_is_secretmem() to folio_is_secretmem()
  nios2: fix flush_dcache_page() for usage from irq context
  hugetlb: add documentation for vma_kernel_pagesize()
  mm: add orphaned kernel-doc to the rst files.
  mm: fix clean_record_shared_mapping_range kernel-doc
  mm: fix get_mctgt_type() kernel-doc
  mm: fix kernel-doc warning from tlb_flush_rmaps()
  mm: remove enum page_entry_size
  mm: allow ->huge_fault() to be called without the mmap_lock held
  mm: move PMD_ORDER to pgtable.h
  mm: remove checks for pte_index
  memcg: remove duplication detection for mem_cgroup_uncharge_swap
  mm/huge_memory: work on folio->swap instead of page->private when splitting folio
  mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
  mm/swap: use dedicated entry for swap in folio
  mm/swap: stop using page->private on tail pages for THP_SWAP
  selftests/mm: fix WARNING comparing pointer to 0
  selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
  ...
2023-08-29 14:25:26 -07:00
Linus Torvalds
bd6c11bc43 Networking changes for 6.6.
Core
 ----
 
  - Increase size limits for to-be-sent skb frag allocations. This
    allows tun, tap devices and packet sockets to better cope with large
    writes operations.
 
  - Store netdevs in an xarray, to simplify iterating over netdevs.
 
  - Refactor nexthop selection for multipath routes.
 
  - Improve sched class lifetime handling.
 
  - Add backup nexthop ID support for bridge.
 
  - Implement drop reasons support in openvswitch.
 
  - Several data races annotations and fixes.
 
  - Constify the sk parameter of routing functions.
 
  - Prepend kernel version to netconsole message.
 
 Protocols
 ---------
 
  - Implement support for TCP probing the peer being under memory
    pressure.
 
  - Remove hard coded limitation on IPv6 specific info placement
    inside the socket struct.
 
  - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated
    per socket scaling factor.
 
  - Scaling-up the IPv6 expired route GC via a separated list of
    expiring routes.
 
  - In-kernel support for the TLS alert protocol.
 
  - Better support for UDP reuseport with connected sockets.
 
  - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
    header size.
 
  - Get rid of additional ancillary per MPTCP connection struct socket.
 
  - Implement support for BPF-based MPTCP packet schedulers.
 
  - Format MPTCP subtests selftests results in TAP.
 
  - Several new SMC 2.1 features including unique experimental options,
    max connections per lgr negotiation, max links per lgr negotiation.
 
 BPF
 ---
 
  - Multi-buffer support in AF_XDP.
 
  - Add multi uprobe BPF links for attaching multiple uprobes
    and usdt probes, which is significantly faster and saves extra fds.
 
  - Implement an fd-based tc BPF attach API (TCX) and BPF link support on
    top of it.
 
  - Add SO_REUSEPORT support for TC bpf_sk_assign.
 
  - Support new instructions from cpu v4 to simplify the generated code and
    feature completeness, for x86, arm64, riscv64.
 
  - Support defragmenting IPv(4|6) packets in BPF.
 
  - Teach verifier actual bounds of bpf_get_smp_processor_id()
    and fix perf+libbpf issue related to custom section handling.
 
  - Introduce bpf map element count and enable it for all program types.
 
  - Add a BPF hook in sys_socket() to change the protocol ID
    from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy.
 
  - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress.
 
  - Add uprobe support for the bpf_get_func_ip helper.
 
  - Check skb ownership against full socket.
 
  - Support for up to 12 arguments in BPF trampoline.
 
  - Extend link_info for kprobe_multi and perf_event links.
 
 Netfilter
 ---------
 
  - Speed-up process exit by aborting ruleset validation if a
    fatal signal is pending.
 
  - Allow NLA_POLICY_MASK to be used with BE16/BE32 types.
 
 Driver API
 ----------
 
  - Page pool optimizations, to improve data locality and cache usage.
 
  - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need
    for raw ioctl() handling in drivers.
 
  - Simplify genetlink dump operations (doit/dumpit) providing them
    the common information already populated in struct genl_info.
 
  - Extend and use the yaml devlink specs to [re]generate the split ops.
 
  - Introduce devlink selective dumps, to allow SF filtering SF based on
    handle and other attributes.
 
  - Add yaml netlink spec for netlink-raw families, allow route, link and
    address related queries via the ynl tool.
 
  - Remove phylink legacy mode support.
 
  - Support offload LED blinking to phy.
 
  - Add devlink port function attributes for IPsec.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Broadcom ASP 2.0 (72165) ethernet controller
    - MediaTek MT7988 SoC
    - Texas Instruments AM654 SoC
    - Texas Instruments IEP driver
    - Atheros qca8081 phy
    - Marvell 88Q2110 phy
    - NXP TJA1120 phy
 
  - WiFi:
    - MediaTek mt7981 support
 
  - Can:
    - Kvaser SmartFusion2 PCI Express devices
    - Allwinner T113 controllers
    - Texas Instruments tcan4552/4553 chips
 
  - Bluetooth:
    - Intel Gale Peak
    - Qualcomm WCN3988 and WCN7850
    - NXP AW693 and IW624
    - Mediatek MT2925
 
 Drivers
 -------
 
  - Ethernet NICs:
    - nVidia/Mellanox:
      - mlx5:
        - support UDP encapsulation in packet offload mode
        - IPsec packet offload support in eswitch mode
        - improve aRFS observability by adding new set of counters
        - extends MACsec offload support to cover RoCE traffic
        - dynamic completion EQs
      - mlx4:
        - convert to use auxiliary bus instead of custom interface logic
    - Intel
      - ice:
        - implement switchdev bridge offload, even for LAG interfaces
        - implement SRIOV support for LAG interfaces
      - igc:
        - add support for multiple in-flight TX timestamps
    - Broadcom:
      - bnxt:
        - use the unified RX page pool buffers for XDP and non-XDP
        - use the NAPI skb allocation cache
    - OcteonTX2:
      - support Round Robin scheduling HTB offload
      - TC flower offload support for SPI field
    - Freescale:
      -  add XDP_TX feature support
    - AMD:
      - ionic: add support for PCI FLR event
      - sfc:
        - basic conntrack offload
        - introduce eth, ipv4 and ipv6 pedit offloads
    - ST Microelectronics:
      - stmmac: maximze PTP timestamping resolution
 
  - Virtual NICs:
    - Microsoft vNIC:
      - batch ringing RX queue doorbell on receiving packets
      - add page pool for RX buffers
    - Virtio vNIC:
      - add per queue interrupt coalescing support
    - Google vNIC:
      - add queue-page-list mode support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add port range matching tc-flower offload
      - permit enslavement to netdevices with uppers
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - convert to phylink_pcs
    - Renesas:
      - r8A779fx: add speed change support
      - rzn1: enables vlan support
 
  - Ethernet PHYs:
    - convert mv88e6xxx to phylink_pcs
 
  - WiFi:
    - Qualcomm Wi-Fi 7 (ath12k):
      - extremely High Throughput (EHT) PHY support
    - RealTek (rtl8xxxu):
      - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
        RTL8192EU and RTL8723BU
    - RealTek (rtw89):
      - Introduce Time Averaged SAR (TAS) support
 
  - Connector:
    - support for event filtering
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmTt1ZoSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkgFUP/REFaYWdWUvAzmWeezyx9dqgZMfSOjWq
 9QvySiA94OAOcjIYkb7wfzQ5BBAZqaBQ/f8XqWwS1EDDDEBs8sP1cxmABKwW7Hsr
 qFRu2sOqLzKBk223d0jIgEocfQaFpGbF71gXoTlDivBjBi5UxWm9bF0XnbYWcKgO
 /QEvzNosi9uNdi85Fzmv62J6YzAdidEpwGsM7X2CfejwNRmStxAEg/NwvRR0Hyiq
 OJCo97omEgTRaUle8nc64PDx33u4h5kQ1BkaeHEv0rbE3hftFC2YPKn/InmqSFGz
 6ew2xnrGPR37LCuAiCcIIv6yR7K0eu0iYJ7jXwZxBDqxGavEPuwWGBoCP6qFiitH
 ZLWhIrAUrdmSbySkTOCONhJ475qFAuQoYHYpZnX/bJZUHlSsb/9lwDJYJQGpVfd1
 /daqJVSb7lhaifmNO1iNd/ibCIXq9zapwtkRwA897M8GkZBTsnVvazFld1Em+Se3
 Bx6DSDUVBqVQ9fpZG2IAGD6odDwOzC1lF2IoceFvK9Ff6oE0psI+A0qNLMkHxZbW
 Qlo7LsNe53hpoCC+yHTfXX7e/X8eNt0EnCGOQJDusZ0Nr3K7H4LKFA0i8UBUK05n
 4lKnnaSQW7GQgdofLWt103OMDR9GoDxpFsm7b1X9+AEk6Fz6tq50wWYeMZETUKYP
 DCW8VGFOZjZM
 =9CsR
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "Core:

   - Increase size limits for to-be-sent skb frag allocations. This
     allows tun, tap devices and packet sockets to better cope with
     large writes operations

   - Store netdevs in an xarray, to simplify iterating over netdevs

   - Refactor nexthop selection for multipath routes

   - Improve sched class lifetime handling

   - Add backup nexthop ID support for bridge

   - Implement drop reasons support in openvswitch

   - Several data races annotations and fixes

   - Constify the sk parameter of routing functions

   - Prepend kernel version to netconsole message

  Protocols:

   - Implement support for TCP probing the peer being under memory
     pressure

   - Remove hard coded limitation on IPv6 specific info placement inside
     the socket struct

   - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
     socket scaling factor

   - Scaling-up the IPv6 expired route GC via a separated list of
     expiring routes

   - In-kernel support for the TLS alert protocol

   - Better support for UDP reuseport with connected sockets

   - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
     header size

   - Get rid of additional ancillary per MPTCP connection struct socket

   - Implement support for BPF-based MPTCP packet schedulers

   - Format MPTCP subtests selftests results in TAP

   - Several new SMC 2.1 features including unique experimental options,
     max connections per lgr negotiation, max links per lgr negotiation

  BPF:

   - Multi-buffer support in AF_XDP

   - Add multi uprobe BPF links for attaching multiple uprobes and usdt
     probes, which is significantly faster and saves extra fds

   - Implement an fd-based tc BPF attach API (TCX) and BPF link support
     on top of it

   - Add SO_REUSEPORT support for TC bpf_sk_assign

   - Support new instructions from cpu v4 to simplify the generated code
     and feature completeness, for x86, arm64, riscv64

   - Support defragmenting IPv(4|6) packets in BPF

   - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
     perf+libbpf issue related to custom section handling

   - Introduce bpf map element count and enable it for all program types

   - Add a BPF hook in sys_socket() to change the protocol ID from
     IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy

   - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress

   - Add uprobe support for the bpf_get_func_ip helper

   - Check skb ownership against full socket

   - Support for up to 12 arguments in BPF trampoline

   - Extend link_info for kprobe_multi and perf_event links

  Netfilter:

   - Speed-up process exit by aborting ruleset validation if a fatal
     signal is pending

   - Allow NLA_POLICY_MASK to be used with BE16/BE32 types

  Driver API:

   - Page pool optimizations, to improve data locality and cache usage

   - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
     need for raw ioctl() handling in drivers

   - Simplify genetlink dump operations (doit/dumpit) providing them the
     common information already populated in struct genl_info

   - Extend and use the yaml devlink specs to [re]generate the split ops

   - Introduce devlink selective dumps, to allow SF filtering SF based
     on handle and other attributes

   - Add yaml netlink spec for netlink-raw families, allow route, link
     and address related queries via the ynl tool

   - Remove phylink legacy mode support

   - Support offload LED blinking to phy

   - Add devlink port function attributes for IPsec

  New hardware / drivers:

   - Ethernet:
      - Broadcom ASP 2.0 (72165) ethernet controller
      - MediaTek MT7988 SoC
      - Texas Instruments AM654 SoC
      - Texas Instruments IEP driver
      - Atheros qca8081 phy
      - Marvell 88Q2110 phy
      - NXP TJA1120 phy

   - WiFi:
      - MediaTek mt7981 support

   - Can:
      - Kvaser SmartFusion2 PCI Express devices
      - Allwinner T113 controllers
      - Texas Instruments tcan4552/4553 chips

   - Bluetooth:
      - Intel Gale Peak
      - Qualcomm WCN3988 and WCN7850
      - NXP AW693 and IW624
      - Mediatek MT2925

  Drivers:

   - Ethernet NICs:
      - nVidia/Mellanox:
         - mlx5:
            - support UDP encapsulation in packet offload mode
            - IPsec packet offload support in eswitch mode
            - improve aRFS observability by adding new set of counters
            - extends MACsec offload support to cover RoCE traffic
            - dynamic completion EQs
         - mlx4:
            - convert to use auxiliary bus instead of custom interface
              logic
      - Intel
         - ice:
            - implement switchdev bridge offload, even for LAG
              interfaces
            - implement SRIOV support for LAG interfaces
         - igc:
            - add support for multiple in-flight TX timestamps
      - Broadcom:
         - bnxt:
            - use the unified RX page pool buffers for XDP and non-XDP
            - use the NAPI skb allocation cache
      - OcteonTX2:
         - support Round Robin scheduling HTB offload
         - TC flower offload support for SPI field
      - Freescale:
         - add XDP_TX feature support
      - AMD:
         - ionic: add support for PCI FLR event
         - sfc:
            - basic conntrack offload
            - introduce eth, ipv4 and ipv6 pedit offloads
      - ST Microelectronics:
         - stmmac: maximze PTP timestamping resolution

   - Virtual NICs:
      - Microsoft vNIC:
         - batch ringing RX queue doorbell on receiving packets
         - add page pool for RX buffers
      - Virtio vNIC:
         - add per queue interrupt coalescing support
      - Google vNIC:
         - add queue-page-list mode support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add port range matching tc-flower offload
         - permit enslavement to netdevices with uppers

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - convert to phylink_pcs
      - Renesas:
         - r8A779fx: add speed change support
         - rzn1: enables vlan support

   - Ethernet PHYs:
      - convert mv88e6xxx to phylink_pcs

   - WiFi:
      - Qualcomm Wi-Fi 7 (ath12k):
         - extremely High Throughput (EHT) PHY support
      - RealTek (rtl8xxxu):
         - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
           RTL8192EU and RTL8723BU
      - RealTek (rtw89):
         - Introduce Time Averaged SAR (TAS) support

   - Connector:
      - support for event filtering"

* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
  net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
  net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
  net: stmmac: clarify difference between "interface" and "phy_interface"
  r8152: add vendor/device ID pair for D-Link DUB-E250
  devlink: move devlink_notify_register/unregister() to dev.c
  devlink: move small_ops definition into netlink.c
  devlink: move tracepoint definitions into core.c
  devlink: push linecard related code into separate file
  devlink: push rate related code into separate file
  devlink: push trap related code into separate file
  devlink: use tracepoint_enabled() helper
  devlink: push region related code into separate file
  devlink: push param related code into separate file
  devlink: push resource related code into separate file
  devlink: push dpipe related code into separate file
  devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
  devlink: push shared buffer related code into separate file
  devlink: push port related code into separate file
  devlink: push object register/unregister notifications into separate helpers
  inet: fix IP_TRANSPARENT error handling
  ...
2023-08-29 11:33:01 -07:00
Kefeng Wang
68df1baf15 selinux: use vma_is_initial_stack() and vma_is_initial_heap()
Use the helpers to simplify code.

Link: https://lkml.kernel.org/r/20230728050043.59880-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Christian Göttsche <cgzones@googlemail.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21 13:37:31 -07:00
Khadija Kamran
8e4672d6f9 lsm: constify the 'file' parameter in security_binder_transfer_file()
SELinux registers the implementation for the "binder_transfer_file"
hook. Looking at the function implementation we observe that the
parameter "file" is not changing.

Mark the "file" parameter of LSM hook security_binder_transfer_file() as
"const" since it will not be changing in the LSM hook.

Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
[PM: subject line whitespace fix]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-15 16:04:34 -04:00
David Howells
d80a8f1b58 vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
When NFS superblocks are created by automounting, their LSM parameters
aren't set in the fs_context struct prior to sget_fc() being called,
leading to failure to match existing superblocks.

This bug leads to messages like the following appearing in dmesg when
fscache is enabled:

    NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)

Fix this by adding a new LSM hook to load fc->security for submount
creation.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
Fixes: 9bc61ab18b ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
Fixes: 779df6a548 ("NFS: Ensure security label is set for root inode")
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230808-master-v9-1-e0ecde888221@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-15 08:32:30 +02:00
Paul Moore
817199e006 selinux: revert SECINITSID_INIT support
This commit reverts 5b0eea835d ("selinux: introduce an initial SID
for early boot processes") as it was found to cause problems on
distros with old SELinux userspace tools/libraries, specifically
Ubuntu 16.04.

Hopefully we will be able to re-add this functionality at a later
date, but let's revert this for now to help ensure a stable and
backwards compatible SELinux tree.

Link: https://lore.kernel.org/selinux/87edkseqf8.fsf@mail.lhotse
Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-09 10:51:13 -04:00
Khadija Kamran
6672efbb68 lsm: constify the 'target' parameter in security_capget()
Three LSMs register the implementations for the "capget" hook: AppArmor,
SELinux, and the normal capability code. Looking at the function
implementations we may observe that the first parameter "target" is not
changing.

Mark the first argument "target" of LSM hook security_capget() as
"const" since it will not be changing in the LSM hook.

cap_capget() LSM hook declaration exceeds the 80 characters per line
limit. Split the function declaration to multiple lines to decrease the
line length.

Signed-off-by: Khadija Kamran <kamrankhadijadj@gmail.com>
Acked-by: John Johansen <john.johansen@canonical.com>
[PM: align the cap_capget() declaration, spelling fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-08 16:48:47 -04:00
Xiu Jianfeng
64f18f8a8c selinux: update comment on selinux_hooks[]
After commit f22f9aaf6c ("selinux: remove the runtime disable
functionality"), the comment on selinux_hooks[] is out-of-date,
remove the last paragraph about runtime disable functionality.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-08 13:28:42 -04:00
Christian Göttsche
19c5b015d1 selinux: log about VM being executable by default
In case virtual memory is being marked as executable by default, SELinux
checks regarding explicit potential dangerous use are disabled.

Inform the user about it.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-28 14:04:14 -04:00
Paul Moore
3876043ad9 selinux: fix a 0/NULL mistmatch in ad_net_init_from_iif()
Use a NULL instead of a zero to resolve a int/pointer mismatch.

Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307210332.4AqFZfzI-lkp@intel.com/
Fixes: dd51fcd42f ("selinux: introduce and use lsm_ad_net_init*() helpers")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-20 16:29:47 -04:00
Paolo Abeni
dd51fcd42f selinux: introduce and use lsm_ad_net_init*() helpers
Perf traces of network-related workload shows a measurable overhead
inside the network-related selinux hooks while zeroing the
lsm_network_audit struct.

In most cases we can delay the initialization of such structure to the
usage point, avoiding such overhead in a few cases.

Additionally, the audit code accesses the IP address information only
for AF_INET* families, and selinux_parse_skb() will fill-out the
relevant fields in such cases. When the family field is zeroed or the
initialization is followed by the mentioned parsing, the zeroing can be
limited to the sk, family and netif fields.

By factoring out the audit-data initialization to new helpers, this
patch removes some duplicate code and gives small but measurable
performance gain under UDP flood.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19 16:10:05 -04:00
Stephen Smalley
0fe53224bf selinux: update my email address
Update my email address; MAINTAINERS was updated some time ago.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19 11:27:02 -04:00
Christian Göttsche
e5faa839c3 selinux: add missing newlines in pr_err() statements
The kernel print statements do not append an implicit newline to format
strings.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19 11:12:48 -04:00
Stephen Smalley
90aa4f5e92 selinux: de-brand SELinux
Change "NSA SELinux" to just "SELinux" in Kconfig help text and
comments. While NSA was the original primary developer and continues to
help maintain SELinux, SELinux has long since transitioned to a wide
community of developers and maintainers. SELinux has been part of the
mainline Linux kernel for nearly 20 years now [1] and has received
contributions from many individuals and organizations.

[1] https://lore.kernel.org/lkml/Pine.LNX.4.44.0308082228470.1852-100000@home.osdl.org/

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18 18:42:57 -04:00
Christian Göttsche
a13479bb3c selinux: avoid implicit conversions in the LSM hooks
Use the identical types in assignments of local variables for the
destination.

Merge tail calls into return statements.

Avoid using leading underscores for function local variable.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18 18:29:48 -04:00
Guillaume Nault
5b52ad34f9 security: Constify sk in the sk_getsecid hook.
The sk_getsecid hook shouldn't need to modify its socket argument.
Make it const so that callers of security_sk_classify_flow() can use a
const struct sock *.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-14 08:27:33 +01:00
Ondrej Mosnacek
5b0eea835d selinux: introduce an initial SID for early boot processes
Currently, SELinux doesn't allow distinguishing between kernel threads
and userspace processes that are started before the policy is first
loaded - both get the label corresponding to the kernel SID. The only
way a process that persists from early boot can get a meaningful label
is by doing a voluntary dyntransition or re-executing itself.

Reusing the kernel label for userspace processes is problematic for
several reasons:
1. The kernel is considered to be a privileged domain and generally
   needs to have a wide range of permissions allowed to work correctly,
   which prevents the policy writer from effectively hardening against
   early boot processes that might remain running unintentionally after
   the policy is loaded (they represent a potential extra attack surface
   that should be mitigated).
2. Despite the kernel being treated as a privileged domain, the policy
   writer may want to impose certain special limitations on kernel
   threads that may conflict with the requirements of intentional early
   boot processes. For example, it is a good hardening practice to limit
   what executables the kernel can execute as usermode helpers and to
   confine the resulting usermode helper processes. However, a
   (legitimate) process surviving from early boot may need to execute a
   different set of executables.
3. As currently implemented, overlayfs remembers the security context of
   the process that created an overlayfs mount and uses it to bound
   subsequent operations on files using this context. If an overlayfs
   mount is created before the SELinux policy is loaded, these "mounter"
   checks are made against the kernel context, which may clash with
   restrictions on the kernel domain (see 2.).

To resolve this, introduce a new initial SID (reusing the slot of the
former "init" initial SID) that will be assigned to any userspace
process started before the policy is first loaded. This is easy to do,
as we can simply label any process that goes through the
bprm_creds_for_exec LSM hook with the new init-SID instead of
propagating the kernel SID from the parent.

To provide backwards compatibility for existing policies that are
unaware of this new semantic of the "init" initial SID, introduce a new
policy capability "userspace_initial_context" and set the "init" SID to
the same context as the "kernel" SID unless this capability is set by
the policy.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-10 14:23:56 -04:00
Roberto Sassu
6bcdfd2cac security: Allow all LSMs to provide xattrs for inode_init_security hook
Currently, the LSM infrastructure supports only one LSM providing an xattr
and EVM calculating the HMAC on that xattr, plus other inode metadata.

Allow all LSMs to provide one or multiple xattrs, by extending the security
blob reservation mechanism. Introduce the new lbs_xattr_count field of the
lsm_blob_sizes structure, so that each LSM can specify how many xattrs it
needs, and the LSM infrastructure knows how many xattr slots it should
allocate.

Modify the inode_init_security hook definition, by passing the full
xattr array allocated in security_inode_init_security(), and the current
number of xattr slots in that array filled by LSMs. The first parameter
would allow EVM to access and calculate the HMAC on xattrs supplied by
other LSMs, the second to not leave gaps in the xattr array, when an LSM
requested but did not provide xattrs (e.g. if it is not initialized).

Introduce lsm_get_xattr_slot(), which LSMs can call as many times as the
number specified in the lbs_xattr_count field of the lsm_blob_sizes
structure. During each call, lsm_get_xattr_slot() increments the number of
filled xattrs, so that at the next invocation it returns the next xattr
slot to fill.

Cleanup security_inode_init_security(). Unify the !initxattrs and
initxattrs case by simply not allocating the new_xattrs array in the
former. Update the documentation to reflect the changes, and fix the
description of the xattr name, as it is not allocated anymore.

Adapt both SELinux and Smack to use the new definition of the
inode_init_security hook, and to call lsm_get_xattr_slot() to obtain and
fill the reserved slots in the xattr array.

Move the xattr->name assignment after the xattr->value one, so that it is
done only in case of successful memory allocation.

Finally, change the default return value of the inode_init_security hook
from zero to -EOPNOTSUPP, so that BPF LSM correctly follows the hook
conventions.

Reported-by: Nicolas Bouchinet <nicolas.bouchinet@clip-os.org>
Link: https://lore.kernel.org/linux-integrity/Y1FTSIo+1x+4X0LS@archlinux/
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: minor comment and variable tweaks, approved by RS]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-10 13:59:37 -04:00
Ondrej Mosnacek
cec5fe7007 selinux: make labeled NFS work when mounted before policy load
Currently, when an NFS filesystem that supports passing LSM/SELinux
labels is mounted during early boot (before the SELinux policy is
loaded), it ends up mounted without the labeling support (i.e. with
Fedora policy all files get the generic NFS label
system_u:object_r:nfs_t:s0).

This is because the information that the NFS mount supports passing
labels (communicated to the LSM layer via the kern_flags argument of
security_set_mnt_opts()) gets lost and when the policy is loaded the
mount is initialized as if the passing is not supported.

Fix this by noting the "native labeling" in newsbsec->flags (using a new
SE_SBNATIVE flag) on the pre-policy-loaded call of
selinux_set_mnt_opts() and then making sure it is respected on the
second call from delayed_superblock_init().

Additionally, make inode_doinit_with_dentry() initialize the inode's
label from its extended attributes whenever it doesn't find it already
intitialized by the filesystem. This is needed to properly initialize
pre-existing inodes when delayed_superblock_init() is called. It should
not trigger in any other cases (and if it does, it's still better to
initialize the correct label instead of leaving the inode unlabeled).

Fixes: eb9ae68650 ("SELinux: Add new labeling type native labels")
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
[PM: fixed 'Fixes' tag format]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-30 17:44:34 -04:00
Paolo Abeni
85c3222ddd selinux: Implement mptcp_add_subflow hook
Newly added subflows should inherit the LSM label from the associated
MPTCP socket regardless of the current context.

This patch implements the above copying sid and class from the MPTCP
socket context, deleting the existing subflow label, if any, and then
re-creating the correct one.

The new helper reuses the selinux_netlbl_sk_security_free() function,
and the latter can end-up being called multiple times with the same
argument; we additionally need to make it idempotent.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-18 13:11:10 -04:00
Christian Göttsche
4158cb6000 selinux: declare read-only data arrays const
The array of mount tokens in only used in match_opt_prefix() and never
modified.

The array of symtab names is never modified and only used in the
DEBUG_HASHES configuration as output.

The array of files for the SElinux filesystem sub-directory `ss` is
similar to the other `struct tree_descr` usages only read from to
construct the containing entries.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08 16:52:05 -04:00
Christian Göttsche
3d9047a064 selinux: adjust typos in comments
Found by codespell(1)

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08 16:44:01 -04:00
Paul Moore
f22f9aaf6c selinux: remove the runtime disable functionality
After working with the larger SELinux-based distros for several
years, we're finally at a place where we can disable the SELinux
runtime disable functionality.  The existing kernel deprecation
notice explains the functionality and why we want to remove it:

  The selinuxfs "disable" node allows SELinux to be disabled at
  runtime prior to a policy being loaded into the kernel.  If
  disabled via this mechanism, SELinux will remain disabled until
  the system is rebooted.

  The preferred method of disabling SELinux is via the "selinux=0"
  boot parameter, but the selinuxfs "disable" node was created to
  make it easier for systems with primitive bootloaders that did not
  allow for easy modification of the kernel command line.
  Unfortunately, allowing for SELinux to be disabled at runtime makes
  it difficult to secure the kernel's LSM hooks using the
  "__ro_after_init" feature.

It is that last sentence, mentioning the '__ro_after_init' hardening,
which is the real motivation for this change, and if you look at the
diffstat you'll see that the impact of this patch reaches across all
the different LSMs, helping prevent tampering at the LSM hook level.

From a SELinux perspective, it is important to note that if you
continue to disable SELinux via "/etc/selinux/config" it may appear
that SELinux is disabled, but it is simply in an uninitialized state.
If you load a policy with `load_policy -i`, you will see SELinux
come alive just as if you had loaded the policy during early-boot.

It is also worth noting that the "/sys/fs/selinux/disable" file is
always writable now, regardless of the Kconfig settings, but writing
to the file has no effect on the system, other than to display an
error on the console if a non-zero/true value is written.

Finally, in the several years where we have been working on
deprecating this functionality, there has only been one instance of
someone mentioning any user visible breakage.  In this particular
case it was an individual's kernel test system, and the workaround
documented in the deprecation notice ("selinux=0" on the kernel
command line) resolved the issue without problem.

Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-03-20 12:34:23 -04:00
Paul Moore
a7e4676e8e selinux: remove the 'checkreqprot' functionality
We originally promised that the SELinux 'checkreqprot' functionality
would be removed no sooner than June 2021, and now that it is March
2023 it seems like it is a good time to do the final removal.  The
deprecation notice in the kernel provides plenty of detail on why
'checkreqprot' is not desirable, with the key point repeated below:

  This was a compatibility mechanism for legacy userspace and
  for the READ_IMPLIES_EXEC personality flag.  However, if set to
  1, it weakens security by allowing mappings to be made executable
  without authorization by policy.  The default value of checkreqprot
  at boot was changed starting in Linux v4.4 to 0 (i.e. check the
  actual protection), and Android and Linux distributions have been
  explicitly writing a "0" to /sys/fs/selinux/checkreqprot during
  initialization for some time.

Along with the official deprecation notice, we have been discussing
this on-list and directly with several of the larger SELinux-based
distros and everyone is happy to see this feature finally removed.
In an attempt to catch all of the smaller, and DIY, Linux systems
we have been writing a deprecation notice URL into the kernel log,
along with a growing ssleep() penalty, when admins enabled
checkreqprot at runtime or via the kernel command line.  We have
yet to have anyone come to us and raise an objection to the
deprecation or planned removal.

It is worth noting that while this patch removes the checkreqprot
functionality, it leaves the user visible interfaces (kernel command
line and selinuxfs file) intact, just inert.  This should help
prevent breakages with existing userspace tools that correctly, but
unnecessarily, disable checkreqprot at boot or runtime.  Admins
that attempt to enable checkreqprot will be met with a removal
message in the kernel log.

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-03-20 12:33:50 -04:00
Stephen Smalley
e67b79850f selinux: stop passing selinux_state pointers and their offspring
Linus observed that the pervasive passing of selinux_state pointers
introduced by me in commit aa8e712cee ("selinux: wrap global selinux
state") adds overhead and complexity without providing any
benefit. The original idea was to pave the way for SELinux namespaces
but those have not yet been implemented and there isn't currently
a concrete plan to do so. Remove the passing of the selinux_state
pointers, reverting to direct use of the single global selinux_state,
and likewise remove passing of child pointers like the selinux_avc.
The selinux_policy pointer remains as it is needed for atomic switching
of policies.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202303101057.mZ3Gv5fK-lkp@intel.com/
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-03-14 15:22:45 -04:00
Christian Brauner
01beba7957
fs: port inode_owner_or_capable() to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:29 +01:00
Christian Brauner
700b794052
fs: port acl to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Christian Brauner
39f60c1cce
fs: port xattr to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Christian Brauner
4609e1f18e
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Linus Torvalds
c76ff350bd lsm/stable-6.2 PR 20221212
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmOXmxkUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMPXg//cxfYC8lRtVpuGNCZWDietSiHzpzu
 +qFntaTplvybJMQX0HfgNee5cTBZM+W5mp1BHRcZInvV5LRhyrVtgsxDBifutE4x
 LyUJAw5SkiPdRC+XLDIRLKiZCobFBLVs2zO+qibIqsyR60pFjU6WXBLbJfidXBFR
 yWudDbLU0YhQJCHdNHNqnHCgqrEculxn6q3QPvm/DX0xzBwkFHSSYBkGNvHW2ZTA
 lKNreEOwEk5DTLIKjP4bJ72ixp0xbshw5CXuxtwB/12/4h8QbWbJVQLlIeZrTLmp
 zQXQLJ3pCqKJ2OUCgMDK+wmkvLezd80BV3Due7KX0pT0YRDygoh5QEpZ5/8k8eG7
 prxToh2gJWk2htfJF6kgMpAh9Jqewcke4BysbYVM/427OPZYwQqLDZDGOzbtT6pl
 FYF+adN9wwkAErnHnPlzYipUEpBWurbjtsV8KFWNERoZ4YmzfSPEisRqGIHDGRws
 bTyq/7qs5FXkb1zULELj8V+S2ULsmxPqsxJ63p9di54Uo9lHK0I+0IUtajGDdfze
 psAasa9DD/oH2PAbSmpQ5Xo9XyfHRXsVuz1twEmEA14ML0m4wHbNWVHaK0aaXVdG
 kJKSDSjMsiV+GiwNo7ISJ4pVdUpnMI/iZSghFfV28cJslNhJDeaREHaE/Wtn1/xF
 /bCVmEfS16UoJsQ=
 =klFk
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull lsm updates from Paul Moore:

 - Improve the error handling in the device cgroup such that memory
   allocation failures when updating the access policy do not
   potentially alter the policy.

 - Some minor fixes to reiserfs to ensure that it properly releases
   LSM-related xattr values.

 - Update the security_socket_getpeersec_stream() LSM hook to take
   sockptr_t values.

   Previously the net/BPF folks updated the getsockopt code in the
   network stack to leverage the sockptr_t type to make it easier to
   pass both kernel and __user pointers, but unfortunately when they did
   so they didn't convert the LSM hook.

   While there was/is no immediate risk by not converting the LSM hook,
   it seems like this is a mistake waiting to happen so this patch
   proactively does the LSM hook conversion.

 - Convert vfs_getxattr_alloc() to return an int instead of a ssize_t
   and cleanup the callers. Internally the function was never going to
   return anything larger than an int and the callers were doing some
   very odd things casting the return value; this patch fixes all that
   and helps bring a bit of sanity to vfs_getxattr_alloc() and its
   callers.

 - More verbose, and helpful, LSM debug output when the system is booted
   with "lsm.debug" on the command line. There are examples in the
   commit description, but the quick summary is that this patch provides
   better information about which LSMs are enabled and the ordering in
   which they are processed.

 - General comment and kernel-doc fixes and cleanups.

* tag 'lsm-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lsm: Fix description of fs_context_parse_param
  lsm: Add/fix return values in lsm_hooks.h and fix formatting
  lsm: Clarify documentation of vm_enough_memory hook
  reiserfs: Add missing calls to reiserfs_security_free()
  lsm,fs: fix vfs_getxattr_alloc() return type and caller error paths
  device_cgroup: Roll back to original exceptions after copy failure
  LSM: Better reporting of actual LSMs at boot
  lsm: make security_socket_getpeersec_stream() sockptr_t safe
  audit: Fix some kernel-doc warnings
  lsm: remove obsoleted comments for security hooks
  fs: edit a comment made in bad taste
2022-12-13 09:47:48 -08:00
Paul Moore
b10b9c342f lsm: make security_socket_getpeersec_stream() sockptr_t safe
Commit 4ff09db1b7 ("bpf: net: Change sk_getsockopt() to take the
sockptr_t argument") made it possible to call sk_getsockopt()
with both user and kernel address space buffers through the use of
the sockptr_t type.  Unfortunately at the time of conversion the
security_socket_getpeersec_stream() LSM hook was written to only
accept userspace buffers, and in a desire to avoid having to change
the LSM hook the commit author simply passed the sockptr_t's
userspace buffer pointer.  Since the only sk_getsockopt() callers
at the time of conversion which used kernel sockptr_t buffers did
not allow SO_PEERSEC, and hence the
security_socket_getpeersec_stream() hook, this was acceptable but
also very fragile as future changes presented the possibility of
silently passing kernel space pointers to the LSM hook.

There are several ways to protect against this, including careful
code review of future commits, but since relying on code review to
catch bugs is a recipe for disaster and the upstream eBPF maintainer
is "strongly against defensive programming", this patch updates the
LSM hook, and all of the implementations to support sockptr_t and
safely handle both user and kernel space buffers.

Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-04 23:25:30 -04:00
Christian Brauner
1bdeb21862
selinux: implement get, set and remove acl hook
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].

So far posix acls were passed as a void blob to the security and
integrity modules. Some of them like evm then proceed to interpret the
void pointer and convert it into the kernel internal struct posix acl
representation to perform their integrity checking magic. This is
obviously pretty problematic as that requires knowledge that only the
vfs is guaranteed to have and has lead to various bugs. Add a proper
security hook for setting posix acls and pass down the posix acls in
their appropriate vfs format instead of hacking it through a void
pointer stored in the uapi format.

I spent considerate time in the security module infrastructure and
audited all codepaths. SELinux has no restrictions based on the posix
acl values passed through it. The capability hook doesn't need to be
called either because it only has restrictions on security.* xattrs. So
these are all fairly simply hooks for SELinux.

Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-20 10:13:28 +02:00
Linus Torvalds
4c0ed7d8d6 whack-a-mole: constifying struct path *
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYzxmRQAKCRBZ7Krx/gZQ
 6+/kAQD2xyf+i4zOYVBr1NB3qBbhVS1zrni1NbC/kT3dJPgTvwEA7z7eqwnrN4zg
 scKFP8a3yPoaQBfs4do5PolhuSr2ngA=
 =NBI+
 -----END PGP SIGNATURE-----

Merge tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs constification updates from Al Viro:
 "whack-a-mole: constifying struct path *"

* tag 'pull-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ecryptfs: constify path
  spufs: constify path
  nd_jump_link(): constify path
  audit_init_parent(): constify path
  __io_setxattr(): constify path
  do_proc_readlink(): constify path
  overlayfs: constify path
  fs/notify: constify path
  may_linkat(): constify path
  do_sys_name_to_handle(): constify path
  ->getprocattr(): attribute name is const char *, TYVM...
2022-10-06 17:31:02 -07:00
Linus Torvalds
26b84401da lsm/stable-6.1 PR 20221003
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmM68YIUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOTbA//TR8i+Wy8iswUCmtfmYg91h1uebpl
 /kjNsSmfgivAUTGamr3eN2WRlGhZfkFDPIHa25uybSA6Q+75p4lst83Rt3HDbjkv
 Ga7grCXnHwSDwJoHOSeFh0pojV2u7Zvfmiib2U5hPZEmd3kBw3NCgAJVcSGN80B2
 dct36fzZNXjvpWDbygmFtRRkmEseslSkft8bUVvNZBP+B0zvv3vcNY1QFuKuK+W2
 8wWpvO/cCSmke5i2c2ktHSk2f8/Y6n26Ik/OTHcTVfoKZLRaFbXEzLyxzLrNWd6m
 hujXgcxszTtHdmoXx+J6uBauju7TR8pi1x8mO2LSGrlpRc1cX0A5ED8WcH71+HVE
 8L1fIOmZShccPZn8xRok7oYycAUm/gIfpmSLzmZA76JsZYAe+mp9Ze9FA6fZtSwp
 7Q/rfw/Rlz25WcFBe4xypP078HkOmqutkCk2zy5liR+cWGrgy/WKX15vyC0TaPrX
 tbsRKuCLkipgfXrTk0dX3kmhz+3bJYjqeZEt7sfPSZYpaOGkNXVmAW0wnCOTuLMU
 +8pIVktvQxMmACEj2gBMz11iooR4DpWLxOcQQR/impgCpNdZ60nA0a6KPJoIXC+5
 NfTa422FZkc99QRVblUZyWSgJBW78Z3ZAQcQlo1AGLlFydbfrSFTRLbmNJZo/Nkl
 KwpGvWs5nB0rVw0=
 =VZl5
 -----END PGP SIGNATURE-----

Merge tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

Pull LSM updates from Paul Moore:
 "Seven patches for the LSM layer and we've got a mix of trivial and
  significant patches. Highlights below, starting with the smaller bits
  first so they don't get lost in the discussion of the larger items:

   - Remove some redundant NULL pointer checks in the common LSM audit
     code.

   - Ratelimit the lockdown LSM's access denial messages.

     With this change there is a chance that the last visible lockdown
     message on the console is outdated/old, but it does help preserve
     the initial series of lockdown denials that started the denial
     message flood and my gut feeling is that these might be the more
     valuable messages.

   - Open userfaultfds as readonly instead of read/write.

     While this code obviously lives outside the LSM, it does have a
     noticeable impact on the LSMs with Ondrej explaining the situation
     in the commit description. It is worth noting that this patch
     languished on the VFS list for over a year without any comments
     (objections or otherwise) so I took the liberty of pulling it into
     the LSM tree after giving fair notice. It has been in linux-next
     since the end of August without any noticeable problems.

   - Add a LSM hook for user namespace creation, with implementations
     for both the BPF LSM and SELinux.

     Even though the changes are fairly small, this is the bulk of the
     diffstat as we are also including BPF LSM selftests for the new
     hook.

     It's also the most contentious of the changes in this pull request
     with Eric Biederman NACK'ing the LSM hook multiple times during its
     development and discussion upstream. While I've never taken NACK's
     lightly, I'm sending these patches to you because it is my belief
     that they are of good quality, satisfy a long-standing need of
     users and distros, and are in keeping with the existing nature of
     the LSM layer and the Linux Kernel as a whole.

     The patches in implement a LSM hook for user namespace creation
     that allows for a granular approach, configurable at runtime, which
     enables both monitoring and control of user namespaces. The general
     consensus has been that this is far preferable to the other
     solutions that have been adopted downstream including outright
     removal from the kernel, disabling via system wide sysctls, or
     various other out-of-tree mechanisms that users have been forced to
     adopt since we haven't been able to provide them an upstream
     solution for their requests. Eric has been steadfast in his
     objections to this LSM hook, explaining that any restrictions on
     the user namespace could have significant impact on userspace.
     While there is the possibility of impacting userspace, it is
     important to note that this solution only impacts userspace when it
     is requested based on the runtime configuration supplied by the
     distro/admin/user. Frederick (the pathset author), the LSM/security
     community, and myself have tried to work with Eric during
     development of this patchset to find a mutually acceptable
     solution, but Eric's approach and unwillingness to engage in a
     meaningful way have made this impossible. I have CC'd Eric directly
     on this pull request so he has a chance to provide his side of the
     story; there have been no objections outside of Eric's"

* tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lockdown: ratelimit denial messages
  userfaultfd: open userfaultfds with O_RDONLY
  selinux: Implement userns_create hook
  selftests/bpf: Add tests verifying bpf lsm userns_create hook
  bpf-lsm: Make bpf_lsm_userns_create() sleepable
  security, lsm: Introduce security_create_user_ns()
  lsm: clean up redundant NULL pointer check
2022-10-03 17:51:52 -07:00
Linus Torvalds
e816da29bc selinux/stable-6.1 PR 20221003
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmM68ZsUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOAtRAAw/lcyPoyN8ia6+PPihRtAKGUFIf5
 +IdEPYfCqkGghqB7BRDl5bXOLFgpY/m/41g+xFvzJ0fhVPLa7UWB//N7yTu3OnW/
 vXz1wn0EJAeDlLbPzWd6V/SpcxJ1WPzjHj2B3YXNWnukfMjCnPIA8XlZc18zAWS1
 /OOEBoOo/a/8Giw2l1bEXxfmDI20NrXNL3vWKQ+Bbhg2PJaH/FTk4DNxopt84o28
 vA+cbfQcOOjeRjBuncnTp9/b244ojeM+lRSJZozGTogFIeDUp3KW1D7NHqNwyX12
 seDooqLEP25vP+kQh8zH7gvacpoeDLz40bSpd+MKKj02IxKGikykWuvtlFWY3xNB
 o1mT4SJhh3JcewS7gh6P5aESSSgLg9zb3zMGtjHhtz+HHi/Sq7PK7xJgrnKOBNgu
 CLIu3L+5vJpAgrsze2tIcwRUySIzDKnfgw8Oz7zaS2lOTJ58emz00QwEioHMQufK
 8gZXTvZykJAtLF19PJw+mHKu38hbdD/4vt8AFuIgJzFkjWKzaZAxUBT+3p/uaLHG
 2PegjKzpCqH9vZ/HCdYI42OB8TKiPU3eBtYZ2eP3h7cdDu++tp1rf0hwHQrwE2AD
 PRuoCaBYOTUedbR8CV07fSSGFnZvlPnuk9yB7/eztV2thBQG28ALGxVhWadn4ap/
 UIFgCs5QDRj11u8=
 =BQ+i
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux updates from Paul Moore:
 "Six SELinux patches, all are simple and easily understood, but a list
  of the highlights is below:

   - Use 'grep -E' instead of 'egrep' in the SELinux policy install
     script.

     Fun fact, this seems to be GregKH's *second* dedicated SELinux
     patch since we transitioned to git (ignoring merges, the SPDX
     stuff, and a trivial fs reference removal when lustre was yanked);
     the first was back in 2011 when selinuxfs was placed in
     /sys/fs/selinux. Oh, the memories ...

   - Convert the SELinux policy boolean values to use signed integer
     types throughout the SELinux kernel code.

     Prior to this we were using a mix of signed and unsigned integers
     which was probably okay in this particular case, but it is
     definitely not a good idea in general.

   - Remove a reference to the SELinux runtime disable functionality in
     /etc/selinux/config as we are in the process of deprecating that.

     See [1] for more background on this if you missed the previous
     notes on the deprecation.

   - Minor cleanups: remove unneeded variables and function parameter
     constification"

Link: https://github.com/SELinuxProject/selinux-kernel/wiki/DEPRECATE-runtime-disable [1]

* tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: remove runtime disable message in the install_policy.sh script
  selinux: use "grep -E" instead of "egrep"
  selinux: remove the unneeded result variable
  selinux: declare read-only parameters const
  selinux: use int arrays for boolean values
  selinux: remove an unneeded variable in sel_make_class_dir_entries()
2022-10-03 17:45:15 -07:00
Xu Panda
09b71adab0 selinux: remove the unneeded result variable
Return the value avc_has_perm() directly instead of storing it in
another redundant variable.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Xu Panda <xu.panda@zte.com.cn>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-09-14 07:47:27 -04:00
Al Viro
c8e477c649 ->getprocattr(): attribute name is const char *, TYVM...
cast of ->d_name.name to char * is completely wrong - nothing is
allowed to modify its contents.

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-01 17:34:39 -04:00
Paul Moore
f4d653dcaa selinux: implement the security_uring_cmd() LSM hook
Add a SELinux access control for the iouring IORING_OP_URING_CMD
command.  This includes the addition of a new permission in the
existing "io_uring" object class: "cmd".  The subject of the new
permission check is the domain of the process requesting access, the
object is the open file which points to the device/file that is the
target of the IORING_OP_URING_CMD operation.  A sample policy rule
is shown below:

  allow <domain> <file>:io_uring { cmd };

Cc: stable@vger.kernel.org
Fixes: ee692a21e9 ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-26 11:19:43 -04:00
Frederick Lawler
ed5d44d42c selinux: Implement userns_create hook
Unprivileged user namespace creation is an intended feature to enable
sandboxing, however this feature is often used to as an initial step to
perform a privilege escalation attack.

This patch implements a new user_namespace { create } access control
permission to restrict which domains allow or deny user namespace
creation. This is necessary for system administrators to quickly protect
their systems while waiting for vulnerability patches to be applied.

This permission can be used in the following way:

        allow domA_t domA_t : user_namespace { create };

Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-08-16 17:44:44 -04:00
Linus Torvalds
79802ada87 selinux/stable-6.0 PR 20220801
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmLoEeIUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNSOhAAwWwRcmcHnk+k2agT9QjKrLo26NCO
 MQLE89o4y2ChEFHxC7F7SKoQRxtfYa323p1vmlGzKrlB+IZ6oqERVp4QNQQbXsfn
 n9VvVpxjRNHAetcRhCM9ZOchWjUdw6AMaJ8e3fdRNRESadAUUFDxifw1wpjgG9+i
 LmtDbfZ7vLs2grTf9OZy3JIl1VF3lVRUTI7ZBQggfJncMa+LXNWdVNmEe3yfyboA
 1MwpSao7K2si0hBGAQo/UGQz4b19Tm4xMg8bSy7oTsP5Lae5ciPkeI3qazvs9usp
 WScZYhQ8NugqLbDbjs7dm6QCpj4x3dUs6ei48LKe3GF2mcGesFfOPo9sNHao4kKv
 C9t0f9qw+EhGvnNL7uQIDDf8OuTjuLWDvZSrMLID/IJKFF5NJ3y+XzaS9aPM3VEY
 qyOsX+cEzheXGhD6xE1sCo+AyPUDYqNDMIKBj2wlIGCKlzDGa8RT6VsQuvgf3c3K
 43CnRCQeWDWOHCq3MnRe/fmYtW+JB7tsXiKAq4OJADacwPP36bsP3bqU8AlWYwDt
 tnuMa+LKusHnMEQpMPI8FW8qGdxwGSen+mymfLFIMgtwNGkV7WGRJ6Lbyn0SaR6v
 HyXgZASIOQRnamK3yZCDpxo0K81IVxPWJIjHyg53znqT5TCpXccPyV4HwbJKI/KG
 8PtHrXOdPOGCZ2g=
 =WWq1
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
 "A relatively small set of patches for SELinux this time, eight patches
  in total with really only one significant change.

  The highlights are:

   - Add support for proper labeling of memfd_secret anonymous inodes.

     This will allow LSMs that implement the anonymous inode hooks to
     apply security policy to memfd_secret() fds.

   - Various small improvements to memory management: fixed leaks, freed
     memory when needed, boundary checks.

   - Hardened the selinux_audit_data struct with __randomize_layout.

   - A minor documentation tweak to fix a formatting/style issue"

* tag 'selinux-pr-20220801' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: selinux_add_opt() callers free memory
  selinux: Add boundary check in put_entry()
  selinux: fix memleak in security_read_state_kernel()
  docs: selinux: add '=' signs to kernel boot options
  mm: create security context for memfd_secret inodes
  selinux: fix typos in comments
  selinux: drop unnecessary NULL check
  selinux: add __randomize_layout to selinux_audit_data
2022-08-02 14:51:47 -07:00
Xiu Jianfeng
ef54ccb616 selinux: selinux_add_opt() callers free memory
The selinux_add_opt() function may need to allocate memory for the
mount options if none has already been allocated, but there is no
need to free that memory on error as the callers handle that.  Drop
the existing kfree() on error to help increase consistency in the
selinux_add_opt() error handling.

This patch also changes selinux_add_opt() to return -EINVAL when
the mount option value, @s, is NULL.  It currently return -ENOMEM.

Link: https://lore.kernel.org/lkml/20220611090550.135674-1-xiujianfeng@huawei.com/T/
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
[PM: fix subject, rework commit description language]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-06-20 21:05:40 -04:00
Christian Göttsche
cad140d008 selinux: free contexts previously transferred in selinux_add_opt()
`selinux_add_opt()` stopped taking ownership of the passed context since
commit 70f4169ab4 ("selinux: parse contexts for mount options early").

    unreferenced object 0xffff888114dfd140 (size 64):
      comm "mount", pid 15182, jiffies 4295687028 (age 796.340s)
      hex dump (first 32 bytes):
        73 79 73 74 65 6d 5f 75 3a 6f 62 6a 65 63 74 5f  system_u:object_
        72 3a 74 65 73 74 5f 66 69 6c 65 73 79 73 74 65  r:test_filesyste
      backtrace:
        [<ffffffffa07dbef4>] kmemdup_nul+0x24/0x80
        [<ffffffffa0d34253>] selinux_sb_eat_lsm_opts+0x293/0x560
        [<ffffffffa0d13f08>] security_sb_eat_lsm_opts+0x58/0x80
        [<ffffffffa0af1eb2>] generic_parse_monolithic+0x82/0x180
        [<ffffffffa0a9c1a5>] do_new_mount+0x1f5/0x550
        [<ffffffffa0a9eccb>] path_mount+0x2ab/0x1570
        [<ffffffffa0aa019e>] __x64_sys_mount+0x20e/0x280
        [<ffffffffa1f47124>] do_syscall_64+0x34/0x80
        [<ffffffffa200007e>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

    unreferenced object 0xffff888108e71640 (size 64):
      comm "fsmount", pid 7607, jiffies 4295044974 (age 1601.016s)
      hex dump (first 32 bytes):
        73 79 73 74 65 6d 5f 75 3a 6f 62 6a 65 63 74 5f  system_u:object_
        72 3a 74 65 73 74 5f 66 69 6c 65 73 79 73 74 65  r:test_filesyste
      backtrace:
        [<ffffffff861dc2b1>] memdup_user+0x21/0x90
        [<ffffffff861dc367>] strndup_user+0x47/0xa0
        [<ffffffff864f6965>] __do_sys_fsconfig+0x485/0x9f0
        [<ffffffff87940124>] do_syscall_64+0x34/0x80
        [<ffffffff87a0007e>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

Cc: stable@vger.kernel.org
Fixes: 70f4169ab4 ("selinux: parse contexts for mount options early")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-06-15 21:20:45 -04:00
Jonas Lindner
9691e4f9ba selinux: fix typos in comments
Signed-off-by: Jonas Lindner <jolindner@gmx.de>
[PM: fixed duplicated subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-06-10 15:49:15 -04:00
Christian Göttsche
4d3d0ed60e selinux: drop unnecessary NULL check
Commit e3489f8974 ("selinux: kill selinux_sb_get_mnt_opts()")
introduced a NULL check on the context after a successful call to
security_sid_to_context().  This is on the one hand redundant after
checking for success and on the other hand insufficient on an actual
NULL pointer, since the context is passed to seq_escape() leading to a
call of strlen() on it.

Reported by Clang analyzer:

    In file included from security/selinux/hooks.c:28:
    In file included from ./include/linux/tracehook.h:50:
    In file included from ./include/linux/memcontrol.h:13:
    In file included from ./include/linux/cgroup.h:18:
    ./include/linux/seq_file.h:136:25: warning: Null pointer passed as 1st argument to string length function [unix.cstring.NullArg]
            seq_escape_mem(m, src, strlen(src), flags, esc);
                                   ^~~~~~~~~~~

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-06-07 17:20:10 -04:00
Christian Göttsche
c29722fad4 selinux: log anon inode class name
Log the anonymous inode class name in the security hook
inode_init_security_anon.  This name is the key for name based type
transitions on the anon_inode security class on creation.  Example:

    type=AVC msg=audit(02/16/22 22:02:50.585:216) : avc:  granted \
        { create } for  pid=2136 comm=mariadbd anonclass=[io_uring] \
        scontext=system_u:system_r:mysqld_t:s0 \
        tcontext=system_u:system_r:mysqld_iouring_t:s0 tclass=anon_inode

Add a new LSM audit data type holding the inode and the class name.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: adjusted 'anonclass' to be a trusted string, cgzones approved]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-05-03 16:09:03 -04:00
Paul Moore
81200b0265 selinux: checkreqprot is deprecated, add some ssleep() discomfort
The checkreqprot functionality was disabled by default back in
Linux v4.4 (2015) with commit 2a35d196c1 ("selinux: change
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and it was
officially marked as deprecated in Linux v5.7.  It was always a
bit of a hack to workaround very old userspace and to the best of
our knowledge, the checkreqprot functionality has been disabled by
Linux distributions for quite some time.

This patch moves the deprecation messages from KERN_WARNING to
KERN_ERR and adds a five second sleep to anyone using it to help
draw their attention to the deprecation and provide a URL which
helps explain things in more detail.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-04-04 16:24:22 -04:00
Michal Orzel
0a9876f36b selinux: Remove redundant assignments
Get rid of redundant assignments which end up in values not being
read either because they are overwritten or the function ends.

Reported by clang-tidy [deadcode.DeadStores]

Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-04-04 15:38:53 -04:00
Linus Torvalds
1930a6e739 ptrace: Cleanups for v5.18
This set of changes removes tracehook.h, moves modification of all of
 the ptrace fields inside of siglock to remove races, adds a missing
 permission check to ptrace.c
 
 The removal of tracehook.h is quite significant as it has been a major
 source of confusion in recent years.  Much of that confusion was
 around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled
 making the semantics clearer).
 
 For people who don't know tracehook.h is a vestiage of an attempt to
 implement uprobes like functionality that was never fully merged, and
 was later superseeded by uprobes when uprobes was merged.  For many
 years now we have been removing what tracehook functionaly a little
 bit at a time.  To the point where now anything left in tracehook.h is
 some weird strange thing that is difficult to understand.
 
 Eric W. Biederman (15):
       ptrace: Move ptrace_report_syscall into ptrace.h
       ptrace/arm: Rename tracehook_report_syscall report_syscall
       ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
       ptrace: Remove arch_syscall_{enter,exit}_tracehook
       ptrace: Remove tracehook_signal_handler
       task_work: Remove unnecessary include from posix_timers.h
       task_work: Introduce task_work_pending
       task_work: Call tracehook_notify_signal from get_signal on all architectures
       task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
       signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
       resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
       resume_user_mode: Move to resume_user_mode.h
       tracehook: Remove tracehook.h
       ptrace: Move setting/clearing ptrace_message into ptrace_stop
       ptrace: Return the signal to continue with from ptrace_stop
 
 Jann Horn (1):
       ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
 
 Yang Li (1):
       ptrace: Remove duplicated include in ptrace.c
 
  MAINTAINERS                          |   1 -
  arch/Kconfig                         |   5 +-
  arch/alpha/kernel/ptrace.c           |   5 +-
  arch/alpha/kernel/signal.c           |   4 +-
  arch/arc/kernel/ptrace.c             |   5 +-
  arch/arc/kernel/signal.c             |   4 +-
  arch/arm/kernel/ptrace.c             |  12 +-
  arch/arm/kernel/signal.c             |   4 +-
  arch/arm64/kernel/ptrace.c           |  14 +--
  arch/arm64/kernel/signal.c           |   4 +-
  arch/csky/kernel/ptrace.c            |   5 +-
  arch/csky/kernel/signal.c            |   4 +-
  arch/h8300/kernel/ptrace.c           |   5 +-
  arch/h8300/kernel/signal.c           |   4 +-
  arch/hexagon/kernel/process.c        |   4 +-
  arch/hexagon/kernel/signal.c         |   1 -
  arch/hexagon/kernel/traps.c          |   6 +-
  arch/ia64/kernel/process.c           |   4 +-
  arch/ia64/kernel/ptrace.c            |   6 +-
  arch/ia64/kernel/signal.c            |   1 -
  arch/m68k/kernel/ptrace.c            |   5 +-
  arch/m68k/kernel/signal.c            |   4 +-
  arch/microblaze/kernel/ptrace.c      |   5 +-
  arch/microblaze/kernel/signal.c      |   4 +-
  arch/mips/kernel/ptrace.c            |   5 +-
  arch/mips/kernel/signal.c            |   4 +-
  arch/nds32/include/asm/syscall.h     |   2 +-
  arch/nds32/kernel/ptrace.c           |   5 +-
  arch/nds32/kernel/signal.c           |   4 +-
  arch/nios2/kernel/ptrace.c           |   5 +-
  arch/nios2/kernel/signal.c           |   4 +-
  arch/openrisc/kernel/ptrace.c        |   5 +-
  arch/openrisc/kernel/signal.c        |   4 +-
  arch/parisc/kernel/ptrace.c          |   7 +-
  arch/parisc/kernel/signal.c          |   4 +-
  arch/powerpc/kernel/ptrace/ptrace.c  |   8 +-
  arch/powerpc/kernel/signal.c         |   4 +-
  arch/riscv/kernel/ptrace.c           |   5 +-
  arch/riscv/kernel/signal.c           |   4 +-
  arch/s390/include/asm/entry-common.h |   1 -
  arch/s390/kernel/ptrace.c            |   1 -
  arch/s390/kernel/signal.c            |   5 +-
  arch/sh/kernel/ptrace_32.c           |   5 +-
  arch/sh/kernel/signal_32.c           |   4 +-
  arch/sparc/kernel/ptrace_32.c        |   5 +-
  arch/sparc/kernel/ptrace_64.c        |   5 +-
  arch/sparc/kernel/signal32.c         |   1 -
  arch/sparc/kernel/signal_32.c        |   4 +-
  arch/sparc/kernel/signal_64.c        |   4 +-
  arch/um/kernel/process.c             |   4 +-
  arch/um/kernel/ptrace.c              |   5 +-
  arch/x86/kernel/ptrace.c             |   1 -
  arch/x86/kernel/signal.c             |   5 +-
  arch/x86/mm/tlb.c                    |   1 +
  arch/xtensa/kernel/ptrace.c          |   5 +-
  arch/xtensa/kernel/signal.c          |   4 +-
  block/blk-cgroup.c                   |   2 +-
  fs/coredump.c                        |   1 -
  fs/exec.c                            |   1 -
  fs/io-wq.c                           |   6 +-
  fs/io_uring.c                        |  11 +-
  fs/proc/array.c                      |   1 -
  fs/proc/base.c                       |   1 -
  include/asm-generic/syscall.h        |   2 +-
  include/linux/entry-common.h         |  47 +-------
  include/linux/entry-kvm.h            |   2 +-
  include/linux/posix-timers.h         |   1 -
  include/linux/ptrace.h               |  81 ++++++++++++-
  include/linux/resume_user_mode.h     |  64 ++++++++++
  include/linux/sched/signal.h         |  17 +++
  include/linux/task_work.h            |   5 +
  include/linux/tracehook.h            | 226 -----------------------------------
  include/uapi/linux/ptrace.h          |   2 +-
  kernel/entry/common.c                |  19 +--
  kernel/entry/kvm.c                   |   9 +-
  kernel/exit.c                        |   3 +-
  kernel/livepatch/transition.c        |   1 -
  kernel/ptrace.c                      |  47 +++++---
  kernel/seccomp.c                     |   1 -
  kernel/signal.c                      |  62 +++++-----
  kernel/task_work.c                   |   4 +-
  kernel/time/posix-cpu-timers.c       |   1 +
  mm/memcontrol.c                      |   2 +-
  security/apparmor/domain.c           |   1 -
  security/selinux/hooks.c             |   1 -
  85 files changed, 372 insertions(+), 495 deletions(-)
 
 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj
 j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR
 Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw
 Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU
 DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4
 0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp
 S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY
 3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K
 /HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ
 4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0
 eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr
 disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY=
 =uEro
 -----END PGP SIGNATURE-----

Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull ptrace cleanups from Eric Biederman:
 "This set of changes removes tracehook.h, moves modification of all of
  the ptrace fields inside of siglock to remove races, adds a missing
  permission check to ptrace.c

  The removal of tracehook.h is quite significant as it has been a major
  source of confusion in recent years. Much of that confusion was around
  task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the
  semantics clearer).

  For people who don't know tracehook.h is a vestiage of an attempt to
  implement uprobes like functionality that was never fully merged, and
  was later superseeded by uprobes when uprobes was merged. For many
  years now we have been removing what tracehook functionaly a little
  bit at a time. To the point where anything left in tracehook.h was
  some weird strange thing that was difficult to understand"

* tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ptrace: Remove duplicated include in ptrace.c
  ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE
  ptrace: Return the signal to continue with from ptrace_stop
  ptrace: Move setting/clearing ptrace_message into ptrace_stop
  tracehook: Remove tracehook.h
  resume_user_mode: Move to resume_user_mode.h
  resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume
  signal: Move set_notify_signal and clear_notify_signal into sched/signal.h
  task_work: Decouple TIF_NOTIFY_SIGNAL and task_work
  task_work: Call tracehook_notify_signal from get_signal on all architectures
  task_work: Introduce task_work_pending
  task_work: Remove unnecessary include from posix_timers.h
  ptrace: Remove tracehook_signal_handler
  ptrace: Remove arch_syscall_{enter,exit}_tracehook
  ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h
  ptrace/arm: Rename tracehook_report_syscall report_syscall
  ptrace: Move ptrace_report_syscall into ptrace.h
2022-03-28 17:29:53 -07:00
Eric W. Biederman
355f841a3f tracehook: Remove tracehook.h
Now that all of the definitions have moved out of tracehook.h into
ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in
tracehook.h so remove it.

Update the few files that were depending upon tracehook.h to bring in
definitions to use the headers they need directly.

Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20220309162454.123006-13-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-03-10 16:51:51 -06:00
Richard Haines
65881e1db4 selinux: allow FIOCLEX and FIONCLEX with policy capability
These ioctls are equivalent to fcntl(fd, F_SETFD, flags), which SELinux
always allows too.  Furthermore, a failed FIOCLEX could result in a file
descriptor being leaked to a process that should not have access to it.

As this patch removes access controls, a policy capability needs to be
enabled in policy to always allow these ioctls.

Based-on-patch-by: Demi Marie Obenour <demiobenour@gmail.com>
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-25 15:35:19 -05:00
Christian Göttsche
5ea33af9d4 selinux: drop return statement at end of void functions
Those return statements at the end of a void function are redundant.

Reported by clang-tidy [readability-redundant-control-flow]

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-18 10:42:12 -05:00
Ondrej Mosnacek
3eb8eaf2ca security: implement sctp_assoc_established hook in selinux
Do this by extracting the peer labeling per-association logic from
selinux_sctp_assoc_request() into a new helper
selinux_sctp_process_new_assoc() and use this helper in both
selinux_sctp_assoc_request() and selinux_sctp_assoc_established(). This
ensures that the peer labeling behavior as documented in
Documentation/security/SCTP.rst is applied both on the client and server
side:
"""
An SCTP socket will only have one peer label assigned to it. This will be
assigned during the establishment of the first association. Any further
associations on this socket will have their packet peer label compared to
the sockets peer label, and only if they are different will the
``association`` permission be validated. This is validated by checking the
socket peer sid against the received packets peer sid to determine whether
the association should be allowed or denied.
"""

At the same time, it also ensures that the peer label of the association
is set to the correct value, such that if it is peeled off into a new
socket, the socket's peer label  will then be set to the association's
peer label, same as it already works on the server side.

While selinux_inet_conn_established() (which we are replacing by
selinux_sctp_assoc_established() for SCTP) only deals with assigning a
peer label to the connection (socket), in case of SCTP we need to also
copy the (local) socket label to the association, so that
selinux_sctp_sk_clone() can then pick it up for the new socket in case
of SCTP peeloff.

Careful readers will notice that the selinux_sctp_process_new_assoc()
helper also includes the "IPv4 packet received over an IPv6 socket"
check, even though it hadn't been in selinux_sctp_assoc_request()
before. While such check is not necessary in
selinux_inet_conn_request() (because struct request_sock's family field
is already set according to the skb's family), here it is needed, as we
don't have request_sock and we take the initial family from the socket.
In selinux_sctp_assoc_established() it is similarly needed as well (and
also selinux_inet_conn_established() already has it).

Fixes: 72e89f5008 ("security: Add support for SCTP security hooks")
Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
Based-on-patch-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Richard Haines <richard_c_haines@btinternet.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-15 15:06:32 -05:00
Ondrej Mosnacek
70f4169ab4 selinux: parse contexts for mount options early
Commit b8b87fd954 ("selinux: Fix selinux_sb_mnt_opts_compat()")
started to parse mount options into SIDs in selinux_add_opt() if policy
has already been loaded. Since it's extremely unlikely that anyone would
depend on the ability to set SELinux contexts on fs_context before
loading the policy and then mounting that context after simplify the
logic by always parsing the options early.

Note that the multi-step mounting is only possible with the new
fscontext mount API and wasn't possible before its introduction.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-04 13:40:15 -05:00
Paul Moore
0e326df069 selinux: various sparse fixes
When running the SELinux code through sparse, there are a handful of
warnings.  This patch resolves some of these warnings caused by
"__rcu" mismatches.

 % make W=1 C=1 security/selinux/

Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-01 19:08:28 -05:00
Scott Mayhew
6bc1968c14 selinux: try to use preparsed sid before calling parse_sid()
Avoid unnecessary parsing of sids that have already been parsed via
selinux_sb_eat_lsm_opts().

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-01 16:37:17 -05:00
Scott Mayhew
b8b87fd954 selinux: Fix selinux_sb_mnt_opts_compat()
selinux_sb_mnt_opts_compat() is called under the sb_lock spinlock and
shouldn't be performing any memory allocations.  Fix this by parsing the
sids at the same time we're chopping up the security mount options
string and then using the pre-parsed sids when doing the comparison.

Fixes: cc274ae776 ("selinux: fix sleeping function called from invalid context")
Fixes: 69c4a42d72 ("lsm,selinux: add new hook to compare new mount to an existing mount")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-01 16:21:22 -05:00
Casey Schaufler
ecff30575b LSM: general protection fault in legacy_parse_param
The usual LSM hook "bail on fail" scheme doesn't work for cases where
a security module may return an error code indicating that it does not
recognize an input.  In this particular case Smack sees a mount option
that it recognizes, and returns 0. A call to a BPF hook follows, which
returns -ENOPARAM, which confuses the caller because Smack has processed
its data.

The SELinux hook incorrectly returns 1 on success. There was a time
when this was correct, however the current expectation is that it
return 0 on success. This is repaired.

Reported-by: syzbot+d1e3b1d92d25abf97943@syzkaller.appspotmail.com
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-01-27 20:43:02 -05:00
Paul Moore
cdeea45422 selinux: fix a type cast problem in cred_init_security()
In the process of removing an explicit type cast to preserve a cred
const qualifier in cred_init_security() we ran into a problem where
the task_struct::real_cred field is defined with the "__rcu"
attribute but the selinux_cred() function parameter is not, leading
to a sparse warning:

  security/selinux/hooks.c:216:36: sparse: sparse:
    incorrect type in argument 1 (different address spaces)
    @@     expected struct cred const *cred
    @@     got struct cred const [noderef] __rcu *real_cred

As we don't want to add the "__rcu" attribute to the selinux_cred()
parameter, we're going to add an explicit cast back to
cred_init_security().

Fixes: b084e189b0 ("selinux: simplify cred_init_security")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-01-27 12:52:43 -05:00
Christian Göttsche
b084e189b0 selinux: simplify cred_init_security
The parameter of selinux_cred() is declared const, so an explicit cast
dropping the const qualifier is not necessary. Without the cast the
local variable cred serves no purpose.

Reported by clang [-Wcast-qual]

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-01-26 15:57:39 -05:00
GONG, Ruiqi
0266c25e7c selinux: access superblock_security_struct in LSM blob way
LSM blob has been involved for superblock's security struct. So fix the
remaining direct access to sb->s_security by using the LSM blob
mechanism.

Fixes: 08abe46b2c ("selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support")
Fixes: 69c4a42d72 ("lsm,selinux: add new hook to compare new mount to an existing mount")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-01-25 18:46:04 -05:00
Linus Torvalds
a135ce4400 selinux/stable-5.17 PR 20220110
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmHcekAUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNMAQ//TlHhQPEkPYZwD0EPXQIkl5IxGZfR
 DSg24OMzrh+x/jKVYhMXuW6BlnF10wd2ocG0DNAt9weF22PLrbDsxbzAAnv+MLXM
 wCzWLmgKpPpbaG57oa17LMcswDjVr8YbxXPOvyZJ/G3YsWDarH8Ezot8iNlVUkOh
 oqjjXTy0dyLUB/FsW7sIGWa3O18SkbI4pDQxL2vcpdDbvPtAY94kNG3bKTONK/kn
 OxLrjKscTtu6EuSdFhMqwcUxpfqvqUDrEfiSdNruMlT0/DixHgFlJA8WHXjn7Wpq
 XTbqdFrClfpmIiTrPSSszLrZAceegTdefDAf5wgfTxmcfYKiPDF9kPu5UPX1rAgO
 hSoXvGvPjez+RzyXmv9S292lkhV4tz/Wg0YlxZ0LUWif/CSZEyeGGoFZs8080Ukl
 1C/oNrByriaGGRLVwKNGOg5x/UkP1ipnorNnAiiIhw+xkzfXm12RdisUyUgLh9+w
 WsfsC9BJkXMpuvfkgya5PJ677wVNou4ZtJX+2MV8CfGEKTAJp/X/HKh3tWkQFYKx
 35QLVO1LD6gJZlSZZTsjZDxUyPwSd8e55GSFn1qKIHuC2jKjSkwCE7JfErIrI/W0
 js6HKO2ak7oMTWjxRizANyKI/yva/DeMl+mKCC4QV0xmwlm5JsOe7mYSCNGws7AV
 A2qYX7S1xKbLTGI=
 =8H+p
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20220110' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
 "Nothing too significant, but five SELinux patches for v5.17 that do
  the following:

   - Harden the code through additional use of the struct_size() macro

   - Plug some memory leaks

   - Clean up the code via removal of the security_add_mnt_opt() LSM
     hook and minor tweaks to selinux_add_opt()

   - Rename security_task_getsecid_subj() to better reflect its actual
     behavior/use - now called security_current_getsecid_subj()"

* tag 'selinux-pr-20220110' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: minor tweaks to selinux_add_opt()
  selinux: fix potential memleak in selinux_add_opt()
  security,selinux: remove security_add_mnt_opt()
  selinux: Use struct_size() helper in kmalloc()
  lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()
2022-01-11 13:03:06 -08:00
Tom Rix
732bc2ff08 selinux: initialize proto variable in selinux_ip_postroute_compat()
Clang static analysis reports this warning

hooks.c:5765:6: warning: 4th function call argument is an uninitialized
                value
        if (selinux_xfrm_postroute_last(sksec->sid, skb, &ad, proto))
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

selinux_parse_skb() can return ok without setting proto.  The later call
to selinux_xfrm_postroute_last() does an early check of proto and can
return ok if the garbage proto value matches.  So initialize proto.

Cc: stable@vger.kernel.org
Fixes: eef9b41622 ("selinux: cleanup selinux_xfrm_sock_rcv_skb() and selinux_xfrm_postroute_last()")
Signed-off-by: Tom Rix <trix@redhat.com>
[PM: typo/spelling and checkpatch.pl description fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-27 10:41:20 -05:00
Paul Moore
6cd9d4b978 selinux: minor tweaks to selinux_add_opt()
Two minor edits to selinux_add_opt(): use "sizeof(*ptr)" instead of
"sizeof(type)" in the kzalloc() call, and rename the "Einval" jump
target to "err" for the sake of consistency.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-21 15:14:45 -05:00
Bernard Zhao
2e08df3c7c selinux: fix potential memleak in selinux_add_opt()
This patch try to fix potential memleak in error branch.

Fixes: ba64186233 ("selinux: new helper - selinux_add_opt()")
Signed-off-by: Bernard Zhao <bernard@vivo.com>
[PM: tweak the subject line, add Fixes tag]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-21 14:47:35 -05:00
Scott Mayhew
cc274ae776 selinux: fix sleeping function called from invalid context
selinux_sb_mnt_opts_compat() is called via sget_fc() under the sb_lock
spinlock, so it can't use GFP_KERNEL allocations:

[  868.565200] BUG: sleeping function called from invalid context at
               include/linux/sched/mm.h:230
[  868.568246] in_atomic(): 1, irqs_disabled(): 0,
               non_block: 0, pid: 4914, name: mount.nfs
[  868.569626] preempt_count: 1, expected: 0
[  868.570215] RCU nest depth: 0, expected: 0
[  868.570809] Preemption disabled at:
[  868.570810] [<0000000000000000>] 0x0
[  868.571848] CPU: 1 PID: 4914 Comm: mount.nfs Kdump: loaded
               Tainted: G        W         5.16.0-rc5.2585cf9dfa #1
[  868.573273] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
               BIOS 1.14.0-4.fc34 04/01/2014
[  868.574478] Call Trace:
[  868.574844]  <TASK>
[  868.575156]  dump_stack_lvl+0x34/0x44
[  868.575692]  __might_resched.cold+0xd6/0x10f
[  868.576308]  slab_pre_alloc_hook.constprop.0+0x89/0xf0
[  868.577046]  __kmalloc_track_caller+0x72/0x420
[  868.577684]  ? security_context_to_sid_core+0x48/0x2b0
[  868.578569]  kmemdup_nul+0x22/0x50
[  868.579108]  security_context_to_sid_core+0x48/0x2b0
[  868.579854]  ? _nfs4_proc_pathconf+0xff/0x110 [nfsv4]
[  868.580742]  ? nfs_reconfigure+0x80/0x80 [nfs]
[  868.581355]  security_context_str_to_sid+0x36/0x40
[  868.581960]  selinux_sb_mnt_opts_compat+0xb5/0x1e0
[  868.582550]  ? nfs_reconfigure+0x80/0x80 [nfs]
[  868.583098]  security_sb_mnt_opts_compat+0x2a/0x40
[  868.583676]  nfs_compare_super+0x113/0x220 [nfs]
[  868.584249]  ? nfs_try_mount_request+0x210/0x210 [nfs]
[  868.584879]  sget_fc+0xb5/0x2f0
[  868.585267]  nfs_get_tree_common+0x91/0x4a0 [nfs]
[  868.585834]  vfs_get_tree+0x25/0xb0
[  868.586241]  fc_mount+0xe/0x30
[  868.586605]  do_nfs4_mount+0x130/0x380 [nfsv4]
[  868.587160]  nfs4_try_get_tree+0x47/0xb0 [nfsv4]
[  868.587724]  vfs_get_tree+0x25/0xb0
[  868.588193]  do_new_mount+0x176/0x310
[  868.588782]  __x64_sys_mount+0x103/0x140
[  868.589388]  do_syscall_64+0x3b/0x90
[  868.589935]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  868.590699] RIP: 0033:0x7f2b371c6c4e
[  868.591239] Code: 48 8b 0d dd 71 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e
                     0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00
                     00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa 71
                     0e 00 f7 d8 64 89 01 48
[  868.593810] RSP: 002b:00007ffc83775d88 EFLAGS: 00000246
               ORIG_RAX: 00000000000000a5
[  868.594691] RAX: ffffffffffffffda RBX: 00007ffc83775f10 RCX: 00007f2b371c6c4e
[  868.595504] RDX: 0000555d517247a0 RSI: 0000555d51724700 RDI: 0000555d51724540
[  868.596317] RBP: 00007ffc83775f10 R08: 0000555d51726890 R09: 0000555d51726890
[  868.597162] R10: 0000000000000000 R11: 0000000000000246 R12: 0000555d51726890
[  868.598005] R13: 0000000000000003 R14: 0000555d517246e0 R15: 0000555d511ac925
[  868.598826]  </TASK>

Cc: stable@vger.kernel.org
Fixes: 69c4a42d72 ("lsm,selinux: add new hook to compare new mount to an existing mount")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
[PM: cleanup/line-wrap the backtrace]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-16 17:47:39 -05:00
Ondrej Mosnacek
52f982f00b security,selinux: remove security_add_mnt_opt()
Its last user has been removed in commit f2aedb713c ("NFS: Add
fs_context support.").

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-12-06 13:46:24 -05:00
Paul Moore
6326948f94 lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()
The security_task_getsecid_subj() LSM hook invites misuse by allowing
callers to specify a task even though the hook is only safe when the
current task is referenced.  Fix this by removing the task_struct
argument to the hook, requiring LSM implementations to use the
current task.  While we are changing the hook declaration we also
rename the function to security_current_getsecid_subj() in an effort
to reinforce that the hook captures the subjective credentials of the
current task and not an arbitrary task on the system.

Reviewed-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-11-22 17:52:47 -05:00
Paul Moore
32a370abf1 net,lsm,selinux: revert the security_sctp_assoc_established() hook
This patch reverts two prior patches, e7310c9402
("security: implement sctp_assoc_established hook in selinux") and
7c2ef0240e ("security: add sctp_assoc_established hook"), which
create the security_sctp_assoc_established() LSM hook and provide a
SELinux implementation.  Unfortunately these two patches were merged
without proper review (the Reviewed-by and Tested-by tags from
Richard Haines were for previous revisions of these patches that
were significantly different) and there are outstanding objections
from the SELinux maintainers regarding these patches.

Work is currently ongoing to correct the problems identified in the
reverted patches, as well as others that have come up during review,
but it is unclear at this point in time when that work will be ready
for inclusion in the mainline kernel.  In the interest of not keeping
objectionable code in the kernel for multiple weeks, and potentially
a kernel release, we are reverting the two problematic patches.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-11-12 12:07:02 -05:00
Xin Long
e7310c9402 security: implement sctp_assoc_established hook in selinux
Different from selinux_inet_conn_established(), it also gives the
secid to asoc->peer_secid in selinux_sctp_assoc_established(),
as one UDP-type socket may have more than one asocs.

Note that peer_secid in asoc will save the peer secid for this
asoc connection, and peer_sid in sksec will just keep the peer
secid for the latest connection. So the right use should be do
peeloff for UDP-type socket if there will be multiple asocs in
one socket, so that the peeloff socket has the right label for
its asoc.

v1->v2:
  - call selinux_inet_conn_established() to reduce some code
    duplication in selinux_sctp_assoc_established(), as Ondrej
    suggested.
  - when doing peeloff, it calls sock_create() where it actually
    gets secid for socket from socket_sockcreate_sid(). So reuse
    SECSID_WILD to ensure the peeloff socket keeps using that
    secid after calling selinux_sctp_sk_clone() for client side.

Fixes: 72e89f5008 ("security: Add support for SCTP security hooks")
Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
Tested-by: Richard Haines <richard_c_haines@btinternet.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-03 11:09:20 +00:00
Xin Long
c081d53f97 security: pass asoc to sctp_assoc_request and sctp_sk_clone
This patch is to move secid and peer_secid from endpoint to association,
and pass asoc to sctp_assoc_request and sctp_sk_clone instead of ep. As
ep is the local endpoint and asoc represents a connection, and in SCTP
one sk/ep could have multiple asoc/connection, saving secid/peer_secid
for new asoc will overwrite the old asoc's.

Note that since asoc can be passed as NULL, security_sctp_assoc_request()
is moved to the place right after the new_asoc is created in
sctp_sf_do_5_1B_init() and sctp_sf_do_unexpected_init().

v1->v2:
  - fix the description of selinux_netlbl_skbuff_setsid(), as Jakub noticed.
  - fix the annotation in selinux_sctp_assoc_request(), as Richard Noticed.

Fixes: 72e89f5008 ("security: Add support for SCTP security hooks")
Reported-by: Prashanth Prahlad <pprahlad@redhat.com>
Reviewed-by: Richard Haines <richard_c_haines@btinternet.com>
Tested-by: Richard Haines <richard_c_haines@btinternet.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-03 11:09:20 +00:00
Linus Torvalds
cdab10bf32 selinux/stable-5.16 PR 20211101
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmGANbAUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNaMBAAg+9gZr0F7xiafu8JFZqZfx/AQdJ2
 G2cn3le+/tXGZmF8m/+82lOaR6LeQLatgSDJNSkXWkKr0nRwseQJDbtRfvYJdn0t
 Ax05/Fmz6OGxQ2wgRYgaFiSrKpE5p3NhDtiLFVdkCJaQNe/8DZOc7NhBl6EjZf3x
 ubhl2hUiJ4AmiXGwcYhr4uKgP4nhW8OM1/OkskVi+bBMmLA8KTY9kslmIDP5E3BW
 29W4qhqeLNQupY5dGMEMVcyxY9ZUWpO39q4uOaQVZrUGE7xABkj/jhnxT5gFTSlI
 pu8VhsYXm9KuRVveIsv0L5SZfadwoM9YAl7ki1wD3W5rHqOAte3rBTm6VmNlQwfU
 MqxP65Jiyxudxet5Be3/dCRH/+MDQuwBxivgmZXbeVxor2SeznVb0GDaEUC5FSHu
 CJIgWtQzsPJMxgAEGXN4F3QGP0htTTJni56GUPOsrf4TIBW02TT+oLTLFRIokQQL
 INNOfwVSRXElnCsvxsHR4oB+JZ9pJyBaAmeupcQ6jmcKiWlbLj4s+W0U0pM5h91v
 hmMpz7KMxrX6gVL4gB2Jj4aN3r5YRbq26NBu6D+wdwwBTeTTocaHSpAqkv4buClf
 uNk3cG8Hkp8TTg9cM8jYgpxMyzKH/AI/Uw3VhEa1xCiq2Ck3DgfnZvnvcRRaZevU
 FPgmwgqePJXGi60=
 =sb8J
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:

 - Add LSM/SELinux/Smack controls and auditing for io-uring.

   As usual, the individual commit descriptions have more detail, but we
   were basically missing two things which we're adding here:

      + establishment of a proper audit context so that auditing of
        io-uring ops works similarly to how it does for syscalls (with
        some io-uring additions because io-uring ops are *not* syscalls)

      + additional LSM hooks to enable access control points for some of
        the more unusual io-uring features, e.g. credential overrides.

   The additional audit callouts and LSM hooks were done in conjunction
   with the io-uring folks, based on conversations and RFC patches
   earlier in the year.

 - Fixup the binder credential handling so that the proper credentials
   are used in the LSM hooks; the commit description and the code
   comment which is removed in these patches are helpful to understand
   the background and why this is the proper fix.

 - Enable SELinux genfscon policy support for securityfs, allowing
   improved SELinux filesystem labeling for other subsystems which make
   use of securityfs, e.g. IMA.

* tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  security: Return xattr name from security_dentry_init_security()
  selinux: fix a sock regression in selinux_ip_postroute_compat()
  binder: use cred instead of task for getsecid
  binder: use cred instead of task for selinux checks
  binder: use euid from cred instead of using task
  LSM: Avoid warnings about potentially unused hook variables
  selinux: fix all of the W=1 build warnings
  selinux: make better use of the nf_hook_state passed to the NF hooks
  selinux: fix race condition when computing ocontext SIDs
  selinux: remove unneeded ipv6 hook wrappers
  selinux: remove the SELinux lockdown implementation
  selinux: enable genfscon labeling for securityfs
  Smack: Brutalist io_uring support
  selinux: add support for the io_uring access controls
  lsm,io_uring: add LSM hooks to io_uring
  io_uring: convert io_uring to the secure anon inode interface
  fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure()
  audit: add filtering for io_uring records
  audit,io_uring,io-wq: add some basic audit support to io_uring
  audit: prepare audit_context for use in calling contexts beyond syscalls
2021-11-01 21:06:18 -07:00
Vivek Goyal
15bf32398a security: Return xattr name from security_dentry_init_security()
Right now security_dentry_init_security() only supports single security
label and is used by SELinux only. There are two users of this hook,
namely ceph and nfs.

NFS does not care about xattr name. Ceph hardcodes the xattr name to
security.selinux (XATTR_NAME_SELINUX).

I am making changes to fuse/virtiofs to send security label to virtiofsd
and I need to send xattr name as well. I also hardcoded the name of
xattr to security.selinux.

Stephen Smalley suggested that it probably is a good idea to modify
security_dentry_init_security() to also return name of xattr so that
we can avoid this hardcoding in the callers.

This patch adds a new parameter "const char **xattr_name" to
security_dentry_init_security() and LSM puts the name of xattr
too if caller asked for it (xattr_name != NULL).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
[PM: fixed typos in the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-20 08:17:08 -04:00
Paul Moore
1c73213ba9 selinux: fix a sock regression in selinux_ip_postroute_compat()
Unfortunately we can't rely on nf_hook_state->sk being the proper
originating socket so revert to using skb_to_full_sk(skb).

Fixes: 1d1e1ded13 ("selinux: make better use of the nf_hook_state passed to the NF hooks")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-19 12:35:18 -04:00
Todd Kjos
52f8869337 binder: use cred instead of task for selinux checks
Since binder was integrated with selinux, it has passed
'struct task_struct' associated with the binder_proc
to represent the source and target of transactions.
The conversion of task to SID was then done in the hook
implementations. It turns out that there are race conditions
which can result in an incorrect security context being used.

Fix by using the 'struct cred' saved during binder_open and pass
it to the selinux subsystem.

Cc: stable@vger.kernel.org # 5.14 (need backport for earlier stables)
Fixes: 79af73079d ("Add security hooks to binder and implement the hooks for SELinux.")
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-14 20:48:04 -04:00
Paul Moore
1d1e1ded13 selinux: make better use of the nf_hook_state passed to the NF hooks
This patch builds on a previous SELinux/netfilter patch by Florian
Westphal and makes better use of the nf_hook_state variable passed
into the SELinux/netfilter hooks as well as a number of other small
cleanups in the related code.

Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-13 16:31:18 -04:00
Florian Westphal
4342f70538 selinux: remove unneeded ipv6 hook wrappers
Netfilter places the protocol number the hook function is getting called
from in state->pf, so we can use that instead of an extra wrapper.

While at it, remove one-line wrappers too and make
selinux_ip_{out,forward,postroute} useable as hook function.

Signed-off-by: Florian Westphal <fw@strlen.de>
Message-Id: <20211011202229.28289-1-fw@strlen.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-10-11 17:44:00 -04:00
Paul Moore
f5d0e5e9d7 selinux: remove the SELinux lockdown implementation
NOTE: This patch intentionally omits any "Fixes:" metadata or stable
tagging since it removes a SELinux access control check; while
removing the control point is the right thing to do moving forward,
removing it in stable kernels could be seen as a regression.

The original SELinux lockdown implementation in 59438b4647
("security,lockdown,selinux: implement SELinux lockdown") used the
current task's credentials as both the subject and object in the
SELinux lockdown hook, selinux_lockdown().  Unfortunately that
proved to be incorrect in a number of cases as the core kernel was
calling the LSM lockdown hook in places where the credentials from
the "current" task_struct were not the correct credentials to use
in the SELinux access check.

Attempts were made to resolve this by adding a credential pointer
to the LSM lockdown hook as well as suggesting that the single hook
be split into two: one for user tasks, one for kernel tasks; however
neither approach was deemed acceptable by Linus.  Faced with the
prospect of either changing the subj/obj in the access check to a
constant context (likely the kernel's label) or removing the SELinux
lockdown check entirely, the SELinux community decided that removing
the lockdown check was preferable.

The supporting changes to the general LSM layer are left intact, this
patch only removes the SELinux implementation.

Acked-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-09-30 10:12:33 -04:00
Christian Göttsche
8a764ef1bd selinux: enable genfscon labeling for securityfs
Add support for genfscon per-file labeling of securityfs files.
This allows for separate labels and thereby access control for
different files. For example a genfscon statement

    genfscon securityfs /integrity/ima/policy \
	system_u:object_r:ima_policy_t:s0

will set a private label to the IMA policy file and thus allow to
control the ability to set the IMA policy. Setting labels directly
with setxattr(2), e.g. by chcon(1) or setfiles(8), is still not
supported.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: line width fixes in the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-09-28 18:49:03 -04:00
Paul Moore
a3727a8bac selinux,smack: fix subjective/objective credential use mixups
Jann Horn reported a problem with commit eb1231f73c ("selinux:
clarify task subjective and objective credentials") where some LSM
hooks were attempting to access the subjective credentials of a task
other than the current task.  Generally speaking, it is not safe to
access another task's subjective credentials and doing so can cause
a number of problems.

Further, while looking into the problem, I realized that Smack was
suffering from a similar problem brought about by a similar commit
1fb057dcde ("smack: differentiate between subjective and objective
task credentials").

This patch addresses this problem by restoring the use of the task's
objective credentials in those cases where the task is other than the
current executing task.  Not only does this resolve the problem
reported by Jann, it is arguably the correct thing to do in these
cases.

Cc: stable@vger.kernel.org
Fixes: eb1231f73c ("selinux: clarify task subjective and objective credentials")
Fixes: 1fb057dcde ("smack: differentiate between subjective and objective task credentials")
Reported-by: Jann Horn <jannh@google.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-09-23 12:30:59 -04:00
Paul Moore
740b03414b selinux: add support for the io_uring access controls
This patch implements two new io_uring access controls, specifically
support for controlling the io_uring "personalities" and
IORING_SETUP_SQPOLL.  Controlling the sharing of io_urings themselves
is handled via the normal file/inode labeling and sharing mechanisms.

The io_uring { override_creds } permission restricts which domains
the subject domain can use to override it's own credentials.
Granting a domain the io_uring { override_creds } permission allows
it to impersonate another domain in io_uring operations.

The io_uring { sqpoll } permission restricts which domains can create
asynchronous io_uring polling threads.  This is important from a
security perspective as operations queued by this asynchronous thread
inherit the credentials of the thread creator by default; if an
io_uring is shared across process/domain boundaries this could result
in one domain impersonating another.  Controlling the creation of
sqpoll threads, and the sharing of io_urings across processes, allow
policy authors to restrict the ability of one domain to impersonate
another via io_uring.

As a quick summary, this patch adds a new object class with two
permissions:

 io_uring { override_creds sqpoll }

These permissions can be seen in the two simple policy statements
below:

  allow domA_t domB_t : io_uring { override_creds };
  allow domA_t self : io_uring { sqpoll };

Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-09-19 22:40:32 -04:00
Linus Torvalds
9e9fb7655e Core:
- Enable memcg accounting for various networking objects.
 
 BPF:
 
  - Introduce bpf timers.
 
  - Add perf link and opaque bpf_cookie which the program can read
    out again, to be used in libbpf-based USDT library.
 
  - Add bpf_task_pt_regs() helper to access user space pt_regs
    in kprobes, to help user space stack unwinding.
 
  - Add support for UNIX sockets for BPF sockmap.
 
  - Extend BPF iterator support for UNIX domain sockets.
 
  - Allow BPF TCP congestion control progs and bpf iterators to call
    bpf_setsockopt(), e.g. to switch to another congestion control
    algorithm.
 
 Protocols:
 
  - Support IOAM Pre-allocated Trace with IPv6.
 
  - Support Management Component Transport Protocol.
 
  - bridge: multicast: add vlan support.
 
  - netfilter: add hooks for the SRv6 lightweight tunnel driver.
 
  - tcp:
     - enable mid-stream window clamping (by user space or BPF)
     - allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
     - more accurate DSACK processing for RACK-TLP
 
  - mptcp:
     - add full mesh path manager option
     - add partial support for MP_FAIL
     - improve use of backup subflows
     - optimize option processing
 
  - af_unix: add OOB notification support.
 
  - ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by
          the router.
 
  - mac80211: Target Wake Time support in AP mode.
 
  - can: j1939: extend UAPI to notify about RX status.
 
 Driver APIs:
 
  - Add page frag support in page pool API.
 
  - Many improvements to the DSA (distributed switch) APIs.
 
  - ethtool: extend IRQ coalesce uAPI with timer reset modes.
 
  - devlink: control which auxiliary devices are created.
 
  - Support CAN PHYs via the generic PHY subsystem.
 
  - Proper cross-chip support for tag_8021q.
 
  - Allow TX forwarding for the software bridge data path to be
    offloaded to capable devices.
 
 Drivers:
 
  - veth: more flexible channels number configuration.
 
  - openvswitch: introduce per-cpu upcall dispatch.
 
  - Add internet mix (IMIX) mode to pktgen.
 
  - Transparently handle XDP operations in the bonding driver.
 
  - Add LiteETH network driver.
 
  - Renesas (ravb):
    - support Gigabit Ethernet IP
 
  - NXP Ethernet switch (sja1105)
    - fast aging support
    - support for "H" switch topologies
    - traffic termination for ports under VLAN-aware bridge
 
  - Intel 1G Ethernet
     - support getcrosststamp() with PCIe PTM (Precision Time
       Measurement) for better time sync
     - support Credit-Based Shaper (CBS) offload, enabling HW traffic
       prioritization and bandwidth reservation
 
  - Broadcom Ethernet (bnxt)
     - support pulse-per-second output
     - support larger Rx rings
 
  - Mellanox Ethernet (mlx5)
     - support ethtool RSS contexts and MQPRIO channel mode
     - support LAG offload with bridging
     - support devlink rate limit API
     - support packet sampling on tunnels
 
  - Huawei Ethernet (hns3):
     - basic devlink support
     - add extended IRQ coalescing support
     - report extended link state
 
  - Netronome Ethernet (nfp):
     - add conntrack offload support
 
  - Broadcom WiFi (brcmfmac):
     - add WPA3 Personal with FT to supported cipher suites
     - support 43752 SDIO device
 
  - Intel WiFi (iwlwifi):
     - support scanning hidden 6GHz networks
     - support for a new hardware family (Bz)
 
  - Xen pv driver:
     - harden netfront against malicious backends
 
  - Qualcomm mobile
     - ipa: refactor power management and enable automatic suspend
     - mhi: move MBIM to WWAN subsystem interfaces
 
 Refactor:
 
  - Ambient BPF run context and cgroup storage cleanup.
 
  - Compat rework for ndo_ioctl.
 
 Old code removal:
 
  - prism54 remove the obsoleted driver, deprecated by the p54 driver.
 
  - wan: remove sbni/granch driver.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEukBYACgkQMUZtbf5S
 IrsyHA//TO8dw18NYts4n9LmlJT2naJ7yBUUSSXK/M+DtW0MQ9nnHhqzPm5uJdRl
 IgQTNJrW3dYzRwgqaWZqEwO1t5/FI+f87ND1Nsekg7x9tF66a6ov5WxU26TwwSba
 U+si/inQ/4chuQ+LxMQobqCDxaLE46I2dIoRl+YfndJ24DRzYSwAEYIPPbSdfyU+
 +/l+3s4GaxO4k/hLciPAiOniyxLoUNiGUTNh+2yqRBXelSRJRKVnl+V22ANFrxRW
 nTEiplfVKhlPU1e4iLuRtaxDDiePHhw9I3j/lMHhfeFU2P/gKJIvz4QpGV0CAZg2
 1VvDU32WEx1GQLXJbKm0KwoNRUq1QSjOyyFti+BO7ugGaYAR4gKhShOqlSYLzUtB
 tbtzQhSNLWOGqgmSJOztZb5kFDm2EdRSll5/lP2uyFlPkIsIp0QbscJVzNTnS74b
 Xz15ZOw41Z4TfWPEMWgfrx6Zkm7pPWkly+7WfUkPcHa1gftNz6tzXXxSXcXIBPdi
 yQ5JCzzxrM5573YHuk5YedwZpn6PiAt4A/muFGk9C6aXP60TQAOS/ppaUzZdnk4D
 NfOk9mj06WEULjYjPcKEuT3GGWE6kmjb8Pu0QZWKOchv7vr6oZly1EkVZqYlXELP
 AfhcrFeuufie8mqm0jdb4LnYaAnqyLzlb1J4Zxh9F+/IX7G3yoc=
 =JDGD
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Enable memcg accounting for various networking objects.

  BPF:

   - Introduce bpf timers.

   - Add perf link and opaque bpf_cookie which the program can read out
     again, to be used in libbpf-based USDT library.

   - Add bpf_task_pt_regs() helper to access user space pt_regs in
     kprobes, to help user space stack unwinding.

   - Add support for UNIX sockets for BPF sockmap.

   - Extend BPF iterator support for UNIX domain sockets.

   - Allow BPF TCP congestion control progs and bpf iterators to call
     bpf_setsockopt(), e.g. to switch to another congestion control
     algorithm.

  Protocols:

   - Support IOAM Pre-allocated Trace with IPv6.

   - Support Management Component Transport Protocol.

   - bridge: multicast: add vlan support.

   - netfilter: add hooks for the SRv6 lightweight tunnel driver.

   - tcp:
       - enable mid-stream window clamping (by user space or BPF)
       - allow data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD
       - more accurate DSACK processing for RACK-TLP

   - mptcp:
       - add full mesh path manager option
       - add partial support for MP_FAIL
       - improve use of backup subflows
       - optimize option processing

   - af_unix: add OOB notification support.

   - ipv6: add IFLA_INET6_RA_MTU to expose MTU value advertised by the
     router.

   - mac80211: Target Wake Time support in AP mode.

   - can: j1939: extend UAPI to notify about RX status.

  Driver APIs:

   - Add page frag support in page pool API.

   - Many improvements to the DSA (distributed switch) APIs.

   - ethtool: extend IRQ coalesce uAPI with timer reset modes.

   - devlink: control which auxiliary devices are created.

   - Support CAN PHYs via the generic PHY subsystem.

   - Proper cross-chip support for tag_8021q.

   - Allow TX forwarding for the software bridge data path to be
     offloaded to capable devices.

  Drivers:

   - veth: more flexible channels number configuration.

   - openvswitch: introduce per-cpu upcall dispatch.

   - Add internet mix (IMIX) mode to pktgen.

   - Transparently handle XDP operations in the bonding driver.

   - Add LiteETH network driver.

   - Renesas (ravb):
       - support Gigabit Ethernet IP

   - NXP Ethernet switch (sja1105):
       - fast aging support
       - support for "H" switch topologies
       - traffic termination for ports under VLAN-aware bridge

   - Intel 1G Ethernet
       - support getcrosststamp() with PCIe PTM (Precision Time
         Measurement) for better time sync
       - support Credit-Based Shaper (CBS) offload, enabling HW traffic
         prioritization and bandwidth reservation

   - Broadcom Ethernet (bnxt)
       - support pulse-per-second output
       - support larger Rx rings

   - Mellanox Ethernet (mlx5)
       - support ethtool RSS contexts and MQPRIO channel mode
       - support LAG offload with bridging
       - support devlink rate limit API
       - support packet sampling on tunnels

   - Huawei Ethernet (hns3):
       - basic devlink support
       - add extended IRQ coalescing support
       - report extended link state

   - Netronome Ethernet (nfp):
       - add conntrack offload support

   - Broadcom WiFi (brcmfmac):
       - add WPA3 Personal with FT to supported cipher suites
       - support 43752 SDIO device

   - Intel WiFi (iwlwifi):
       - support scanning hidden 6GHz networks
       - support for a new hardware family (Bz)

   - Xen pv driver:
       - harden netfront against malicious backends

   - Qualcomm mobile
       - ipa: refactor power management and enable automatic suspend
       - mhi: move MBIM to WWAN subsystem interfaces

  Refactor:

   - Ambient BPF run context and cgroup storage cleanup.

   - Compat rework for ndo_ioctl.

  Old code removal:

   - prism54 remove the obsoleted driver, deprecated by the p54 driver.

   - wan: remove sbni/granch driver"

* tag 'net-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1715 commits)
  net: Add depends on OF_NET for LiteX's LiteETH
  ipv6: seg6: remove duplicated include
  net: hns3: remove unnecessary spaces
  net: hns3: add some required spaces
  net: hns3: clean up a type mismatch warning
  net: hns3: refine function hns3_set_default_feature()
  ipv6: remove duplicated 'net/lwtunnel.h' include
  net: w5100: check return value after calling platform_get_resource()
  net/mlxbf_gige: Make use of devm_platform_ioremap_resourcexxx()
  net: mdio: mscc-miim: Make use of the helper function devm_platform_ioremap_resource()
  net: mdio-ipq4019: Make use of devm_platform_ioremap_resource()
  fou: remove sparse errors
  ipv4: fix endianness issue in inet_rtm_getroute_build_skb()
  octeontx2-af: Set proper errorcode for IPv4 checksum errors
  octeontx2-af: Fix static code analyzer reported issues
  octeontx2-af: Fix mailbox errors in nix_rss_flowkey_cfg
  octeontx2-af: Fix loop in free and unmap counter
  af_unix: fix potential NULL deref in unix_dgram_connect()
  dpaa2-eth: Replace strlcpy with strscpy
  octeontx2-af: Use NDC TX for transmit packet data
  ...
2021-08-31 16:43:06 -07:00
Jeremy Kerr
bc49d8169a mctp: Add MCTP base
Add basic Kconfig, an initial (empty) af_mctp source object, and
{AF,PF}_MCTP definitions, and the required definitions for a new
protocol type.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-29 15:06:49 +01:00
Austin Kim
893c47d196 selinux: return early for possible NULL audit buffers
audit_log_start() may return NULL in below cases:

  - when audit is not initialized.
  - when audit backlog limit exceeds.

After the call to audit_log_start() is made and then possible NULL audit
buffer argument is passed to audit_log_*() functions,
audit_log_*() functions return immediately in case of a NULL audit buffer
argument.

But it is optimal to return early when audit_log_start() returns NULL,
because it is not necessary for audit_log_*() functions to be called with
NULL audit buffer argument.

So add exception handling for possible NULL audit buffers where
return value can be handled from callers.

Signed-off-by: Austin Kim <austin.kim@lge.com>
[PM: tweak subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-07-14 15:25:27 -04:00
Al Viro
d99cf13f14 selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit()
... along with avc_has_perm_flags() itself, since now it's identical
to avc_has_perm() (as pointed out by Paul Moore)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[PM: add "selinux:" prefix to subj and tweak for length]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-06-11 13:11:45 -04:00
Al Viro
b17ec22fb3 selinux: slow_avc_audit has become non-blocking
dump_common_audit_data() is safe to use under rcu_read_lock() now;
no need for AVC_NONBLOCKING and games around it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-06-11 13:05:18 -04:00
Ondrej Mosnacek
869cbeef18 lsm_audit,selinux: pass IB device name by reference
While trying to address a Coverity warning that the dev_name string
might end up unterminated when strcpy'ing it in
selinux_ib_endport_manage_subnet(), I realized that it is possible (and
simpler) to just pass the dev_name pointer directly, rather than copying
the string to a buffer.

The ibendport variable goes out of scope at the end of the function
anyway, so the lifetime of the dev_name pointer will never be shorter
than that of ibendport, thus we can safely just pass the dev_name
pointer and be done with it.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-05-14 16:38:19 -04:00
Linus Torvalds
17ae69aba8 Add Landlock, a new LSM from Mickaël Salaün <mic@linux.microsoft.com>
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEgycj0O+d1G2aycA8rZhLv9lQBTwFAmCInP4ACgkQrZhLv9lQ
 BTza0g//dTeb9woC9H7qlEhK4l9yk62lTss60Q8X7m7ZSNfdL4tiEbi64SgK+iOW
 OOegbrOEb8Kzh4KJJYmVlVZ5YUWyH4szgmee1wnylBdsWiWaPLPF3Cflz77apy6T
 TiiBsJd7rRE29FKheaMt34B41BMh8QHESN+DzjzJWsFoi/uNxjgSs2W16XuSupKu
 bpRmB1pYNXMlrkzz7taL05jndZYE5arVriqlxgAsuLOFOp/ER7zecrjImdCM/4kL
 W6ej0R1fz2Geh6CsLBJVE+bKWSQ82q5a4xZEkSYuQHXgZV5eywE5UKu8ssQcRgQA
 VmGUY5k73rfY9Ofupf2gCaf/JSJNXKO/8Xjg0zAdklKtmgFjtna5Tyg9I90j7zn+
 5swSpKuRpilN8MQH+6GWAnfqQlNoviTOpFeq3LwBtNVVOh08cOg6lko/bmebBC+R
 TeQPACKS0Q0gCDPm9RYoU1pMUuYgfOwVfVRZK1prgi2Co7ZBUMOvYbNoKYoPIydr
 ENBYljlU1OYwbzgR2nE+24fvhU8xdNOVG1xXYPAEHShu+p7dLIWRLhl8UCtRQpSR
 1ofeVaJjgjrp29O+1OIQjB2kwCaRdfv/Gq1mztE/VlMU/r++E62OEzcH0aS+mnrg
 yzfyUdI8IFv1q6FGT9yNSifWUWxQPmOKuC8kXsKYfqfJsFwKmHM=
 =uCN4
 -----END PGP SIGNATURE-----

Merge tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull Landlock LSM from James Morris:
 "Add Landlock, a new LSM from Mickaël Salaün.

  Briefly, Landlock provides for unprivileged application sandboxing.

  From Mickaël's cover letter:
    "The goal of Landlock is to enable to restrict ambient rights (e.g.
     global filesystem access) for a set of processes. Because Landlock
     is a stackable LSM [1], it makes possible to create safe security
     sandboxes as new security layers in addition to the existing
     system-wide access-controls. This kind of sandbox is expected to
     help mitigate the security impact of bugs or unexpected/malicious
     behaviors in user-space applications. Landlock empowers any
     process, including unprivileged ones, to securely restrict
     themselves.

     Landlock is inspired by seccomp-bpf but instead of filtering
     syscalls and their raw arguments, a Landlock rule can restrict the
     use of kernel objects like file hierarchies, according to the
     kernel semantic. Landlock also takes inspiration from other OS
     sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD
     Pledge/Unveil.

     In this current form, Landlock misses some access-control features.
     This enables to minimize this patch series and ease review. This
     series still addresses multiple use cases, especially with the
     combined use of seccomp-bpf: applications with built-in sandboxing,
     init systems, security sandbox tools and security-oriented APIs [2]"

  The cover letter and v34 posting is here:

      https://lore.kernel.org/linux-security-module/20210422154123.13086-1-mic@digikod.net/

  See also:

      https://landlock.io/

  This code has had extensive design discussion and review over several
  years"

Link: https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-ca.com/ [1]
Link: https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.net/ [2]

* tag 'landlock_v34' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  landlock: Enable user space to infer supported features
  landlock: Add user and kernel documentation
  samples/landlock: Add a sandbox manager example
  selftests/landlock: Add user space tests
  landlock: Add syscall implementations
  arch: Wire up Landlock syscalls
  fs,security: Add sb_delete hook
  landlock: Support filesystem access-control
  LSM: Infrastructure management of the superblock
  landlock: Add ptrace restrictions
  landlock: Set up the security framework and manage credentials
  landlock: Add ruleset and domain management
  landlock: Add object management
2021-05-01 18:50:44 -07:00
Casey Schaufler
1aea780837 LSM: Infrastructure management of the superblock
Move management of the superblock->sb_security blob out of the
individual security modules and into the security infrastructure.
Instead of allocating the blobs from within the modules, the modules
tell the infrastructure how much space is required, and the space is
allocated there.

Cc: John Johansen <john.johansen@canonical.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-6-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
2021-04-22 12:22:10 -07:00
Paul Moore
eb1231f73c selinux: clarify task subjective and objective credentials
SELinux has a function, task_sid(), which returns the task's
objective credentials, but unfortunately is used in a few places
where the subjective task credentials should be used.  Most notably
in the new security_task_getsecid_subj() LSM hook.

This patch fixes this and attempts to make things more obvious by
introducing a new function, task_sid_subj(), and renaming the
existing task_sid() function to task_sid_obj().

This patch also adds an interesting function in task_sid_binder().
The task_sid_binder() function has a comment which hopefully
describes it's reason for being, but it basically boils down to the
simple fact that we can't safely access another task's subjective
credentials so in the case of binder we need to stick with the
objective credentials regardless.

Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-22 15:24:01 -04:00
Paul Moore
4ebd7651bf lsm: separate security_task_getsecid() into subjective and objective variants
Of the three LSMs that implement the security_task_getsecid() LSM
hook, all three LSMs provide the task's objective security
credentials.  This turns out to be unfortunate as most of the hook's
callers seem to expect the task's subjective credentials, although
a small handful of callers do correctly expect the objective
credentials.

This patch is the first step towards fixing the problem: it splits
the existing security_task_getsecid() hook into two variants, one
for the subjective creds, one for the objective creds.

  void security_task_getsecid_subj(struct task_struct *p,
				   u32 *secid);
  void security_task_getsecid_obj(struct task_struct *p,
				  u32 *secid);

While this patch does fix all of the callers to use the correct
variant, in order to keep this patch focused on the callers and to
ease review, the LSMs continue to use the same implementation for
both hooks.  The net effect is that this patch should not change
the behavior of the kernel in any way, it will be up to the latter
LSM specific patches in this series to change the hook
implementations and return the correct credentials.

Acked-by: Mimi Zohar <zohar@linux.ibm.com> (IMA)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-22 15:23:32 -04:00
Olga Kornievskaia
69c4a42d72 lsm,selinux: add new hook to compare new mount to an existing mount
Add a new hook that takes an existing super block and a new mount
with new options and determines if new options confict with an
existing mount or not.

A filesystem can use this new hook to determine if it can share
the an existing superblock with a new superblock for the new mount.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[PM: tweak the subject line, fix tab/space problems]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-22 14:53:37 -04:00
Vivek Goyal
7fa2e79a6b selinux: Allow context mounts for unpriviliged overlayfs
Now overlayfs allow unpriviliged mounts. That is root inside a non-init
user namespace can mount overlayfs. This is being added in 5.11 kernel.

Giuseppe tried to mount overlayfs with option "context" and it failed
with error -EACCESS.

$ su test
$ unshare -rm
$ mkdir -p lower upper work merged
$ mount -t overlay -o lowerdir=lower,workdir=work,upperdir=upper,userxattr,context='system_u:object_r:container_file_t:s0' none merged

This fails with -EACCESS. It works if option "-o context" is not specified.

Little debugging showed that selinux_set_mnt_opts() returns -EACCESS.

So this patch adds "overlay" to the list, where it is fine to specific
context from non init_user_ns.

Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
[PM: trimmed the changelog from the description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-03-08 19:34:38 -05:00
Linus Torvalds
7d6beb71da idmapped-mounts-v5.12
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc
 ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f
 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg=
 =yPaw
 -----END PGP SIGNATURE-----

Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull idmapped mounts from Christian Brauner:
 "This introduces idmapped mounts which has been in the making for some
  time. Simply put, different mounts can expose the same file or
  directory with different ownership. This initial implementation comes
  with ports for fat, ext4 and with Christoph's port for xfs with more
  filesystems being actively worked on by independent people and
  maintainers.

  Idmapping mounts handle a wide range of long standing use-cases. Here
  are just a few:

   - Idmapped mounts make it possible to easily share files between
     multiple users or multiple machines especially in complex
     scenarios. For example, idmapped mounts will be used in the
     implementation of portable home directories in
     systemd-homed.service(8) where they allow users to move their home
     directory to an external storage device and use it on multiple
     computers where they are assigned different uids and gids. This
     effectively makes it possible to assign random uids and gids at
     login time.

   - It is possible to share files from the host with unprivileged
     containers without having to change ownership permanently through
     chown(2).

   - It is possible to idmap a container's rootfs and without having to
     mangle every file. For example, Chromebooks use it to share the
     user's Download folder with their unprivileged containers in their
     Linux subsystem.

   - It is possible to share files between containers with
     non-overlapping idmappings.

   - Filesystem that lack a proper concept of ownership such as fat can
     use idmapped mounts to implement discretionary access (DAC)
     permission checking.

   - They allow users to efficiently changing ownership on a per-mount
     basis without having to (recursively) chown(2) all files. In
     contrast to chown (2) changing ownership of large sets of files is
     instantenous with idmapped mounts. This is especially useful when
     ownership of a whole root filesystem of a virtual machine or
     container is changed. With idmapped mounts a single syscall
     mount_setattr syscall will be sufficient to change the ownership of
     all files.

   - Idmapped mounts always take the current ownership into account as
     idmappings specify what a given uid or gid is supposed to be mapped
     to. This contrasts with the chown(2) syscall which cannot by itself
     take the current ownership of the files it changes into account. It
     simply changes the ownership to the specified uid and gid. This is
     especially problematic when recursively chown(2)ing a large set of
     files which is commong with the aforementioned portable home
     directory and container and vm scenario.

   - Idmapped mounts allow to change ownership locally, restricting it
     to specific mounts, and temporarily as the ownership changes only
     apply as long as the mount exists.

  Several userspace projects have either already put up patches and
  pull-requests for this feature or will do so should you decide to pull
  this:

   - systemd: In a wide variety of scenarios but especially right away
     in their implementation of portable home directories.

         https://systemd.io/HOME_DIRECTORY/

   - container runtimes: containerd, runC, LXD:To share data between
     host and unprivileged containers, unprivileged and privileged
     containers, etc. The pull request for idmapped mounts support in
     containerd, the default Kubernetes runtime is already up for quite
     a while now: https://github.com/containerd/containerd/pull/4734

   - The virtio-fs developers and several users have expressed interest
     in using this feature with virtual machines once virtio-fs is
     ported.

   - ChromeOS: Sharing host-directories with unprivileged containers.

  I've tightly synced with all those projects and all of those listed
  here have also expressed their need/desire for this feature on the
  mailing list. For more info on how people use this there's a bunch of
  talks about this too. Here's just two recent ones:

      https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
      https://fosdem.org/2021/schedule/event/containers_idmap/

  This comes with an extensive xfstests suite covering both ext4 and
  xfs:

      https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts

  It covers truncation, creation, opening, xattrs, vfscaps, setid
  execution, setgid inheritance and more both with idmapped and
  non-idmapped mounts. It already helped to discover an unrelated xfs
  setgid inheritance bug which has since been fixed in mainline. It will
  be sent for inclusion with the xfstests project should you decide to
  merge this.

  In order to support per-mount idmappings vfsmounts are marked with
  user namespaces. The idmapping of the user namespace will be used to
  map the ids of vfs objects when they are accessed through that mount.
  By default all vfsmounts are marked with the initial user namespace.
  The initial user namespace is used to indicate that a mount is not
  idmapped. All operations behave as before and this is verified in the
  testsuite.

  Based on prior discussions we want to attach the whole user namespace
  and not just a dedicated idmapping struct. This allows us to reuse all
  the helpers that already exist for dealing with idmappings instead of
  introducing a whole new range of helpers. In addition, if we decide in
  the future that we are confident enough to enable unprivileged users
  to setup idmapped mounts the permission checking can take into account
  whether the caller is privileged in the user namespace the mount is
  currently marked with.

  The user namespace the mount will be marked with can be specified by
  passing a file descriptor refering to the user namespace as an
  argument to the new mount_setattr() syscall together with the new
  MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
  of extensibility.

  The following conditions must be met in order to create an idmapped
  mount:

   - The caller must currently have the CAP_SYS_ADMIN capability in the
     user namespace the underlying filesystem has been mounted in.

   - The underlying filesystem must support idmapped mounts.

   - The mount must not already be idmapped. This also implies that the
     idmapping of a mount cannot be altered once it has been idmapped.

   - The mount must be a detached/anonymous mount, i.e. it must have
     been created by calling open_tree() with the OPEN_TREE_CLONE flag
     and it must not already have been visible in the filesystem.

  The last two points guarantee easier semantics for userspace and the
  kernel and make the implementation significantly simpler.

  By default vfsmounts are marked with the initial user namespace and no
  behavioral or performance changes are observed.

  The manpage with a detailed description can be found here:

      1d7b902e28

  In order to support idmapped mounts, filesystems need to be changed
  and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
  patches to convert individual filesystem are not very large or
  complicated overall as can be seen from the included fat, ext4, and
  xfs ports. Patches for other filesystems are actively worked on and
  will be sent out separately. The xfstestsuite can be used to verify
  that port has been done correctly.

  The mount_setattr() syscall is motivated independent of the idmapped
  mounts patches and it's been around since July 2019. One of the most
  valuable features of the new mount api is the ability to perform
  mounts based on file descriptors only.

  Together with the lookup restrictions available in the openat2()
  RESOLVE_* flag namespace which we added in v5.6 this is the first time
  we are close to hardened and race-free (e.g. symlinks) mounting and
  path resolution.

  While userspace has started porting to the new mount api to mount
  proper filesystems and create new bind-mounts it is currently not
  possible to change mount options of an already existing bind mount in
  the new mount api since the mount_setattr() syscall is missing.

  With the addition of the mount_setattr() syscall we remove this last
  restriction and userspace can now fully port to the new mount api,
  covering every use-case the old mount api could. We also add the
  crucial ability to recursively change mount options for a whole mount
  tree, both removing and adding mount options at the same time. This
  syscall has been requested multiple times by various people and
  projects.

  There is a simple tool available at

      https://github.com/brauner/mount-idmapped

  that allows to create idmapped mounts so people can play with this
  patch series. I'll add support for the regular mount binary should you
  decide to pull this in the following weeks:

  Here's an example to a simple idmapped mount of another user's home
  directory:

	u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt

	u1001@f2-vm:/$ ls -al /home/ubuntu/
	total 28
	drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
	drwxr-xr-x 4 root   root   4096 Oct 28 04:00 ..
	-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
	-rw-r--r-- 1 ubuntu ubuntu  220 Feb 25  2020 .bash_logout
	-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25  2020 .bashrc
	-rw-r--r-- 1 ubuntu ubuntu  807 Feb 25  2020 .profile
	-rw-r--r-- 1 ubuntu ubuntu    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ ls -al /mnt/
	total 28
	drwxr-xr-x  2 u1001 u1001 4096 Oct 28 22:07 .
	drwxr-xr-x 29 root  root  4096 Oct 28 22:01 ..
	-rw-------  1 u1001 u1001 3154 Oct 28 22:12 .bash_history
	-rw-r--r--  1 u1001 u1001  220 Feb 25  2020 .bash_logout
	-rw-r--r--  1 u1001 u1001 3771 Feb 25  2020 .bashrc
	-rw-r--r--  1 u1001 u1001  807 Feb 25  2020 .profile
	-rw-r--r--  1 u1001 u1001    0 Oct 16 16:11 .sudo_as_admin_successful
	-rw-------  1 u1001 u1001 1144 Oct 28 00:43 .viminfo

	u1001@f2-vm:/$ touch /mnt/my-file

	u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file

	u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file

	u1001@f2-vm:/$ ls -al /mnt/my-file
	-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file

	u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
	-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file

	u1001@f2-vm:/$ getfacl /mnt/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: mnt/my-file
	# owner: u1001
	# group: u1001
	user::rw-
	user:u1001:rwx
	group::rw-
	mask::rwx
	other::r--

	u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
	getfacl: Removing leading '/' from absolute path names
	# file: home/ubuntu/my-file
	# owner: ubuntu
	# group: ubuntu
	user::rw-
	user:ubuntu:rwx
	group::rw-
	mask::rwx
	other::r--"

* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
  xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
  xfs: support idmapped mounts
  ext4: support idmapped mounts
  fat: handle idmapped mounts
  tests: add mount_setattr() selftests
  fs: introduce MOUNT_ATTR_IDMAP
  fs: add mount_setattr()
  fs: add attr_flags_to_mnt_flags helper
  fs: split out functions to hold writers
  namespace: only take read lock in do_reconfigure_mnt()
  mount: make {lock,unlock}_mount_hash() static
  namespace: take lock_mount_hash() directly when changing flags
  nfs: do not export idmapped mounts
  overlayfs: do not mount on top of idmapped mounts
  ecryptfs: do not mount on top of idmapped mounts
  ima: handle idmapped mounts
  apparmor: handle idmapped mounts
  fs: make helpers idmap mount aware
  exec: handle idmapped mounts
  would_dump: handle idmapped mounts
  ...
2021-02-23 13:39:45 -08:00
Christian Brauner
71bc356f93
commoncap: handle idmapped mounts
When interacting with user namespace and non-user namespace aware
filesystem capabilities the vfs will perform various security checks to
determine whether or not the filesystem capabilities can be used by the
caller, whether they need to be removed and so on. The main
infrastructure for this resides in the capability codepaths but they are
called through the LSM security infrastructure even though they are not
technically an LSM or optional. This extends the existing security hooks
security_inode_removexattr(), security_inode_killpriv(),
security_inode_getsecurity() to pass down the mount's user namespace and
makes them aware of idmapped mounts.

In order to actually get filesystem capabilities from disk the
capability infrastructure exposes the get_vfs_caps_from_disk() helper.
For user namespace aware filesystem capabilities a root uid is stored
alongside the capabilities.

In order to determine whether the caller can make use of the filesystem
capability or whether it needs to be ignored it is translated according
to the superblock's user namespace. If it can be translated to uid 0
according to that id mapping the caller can use the filesystem
capabilities stored on disk. If we are accessing the inode that holds
the filesystem capabilities through an idmapped mount we map the root
uid according to the mount's user namespace. Afterwards the checks are
identical to non-idmapped mounts: reading filesystem caps from disk
enforces that the root uid associated with the filesystem capability
must have a mapping in the superblock's user namespace and that the
caller is either in the same user namespace or is a descendant of the
superblock's user namespace. For filesystems that are mountable inside
user namespace the caller can just mount the filesystem and won't
usually need to idmap it. If they do want to idmap it they can create an
idmapped mount and mark it with a user namespace they created and which
is thus a descendant of s_user_ns. For filesystems that are not
mountable inside user namespaces the descendant rule is trivially true
because the s_user_ns will be the initial user namespace.

If the initial user namespace is passed nothing changes so non-idmapped
mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-11-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:17 +01:00
Tycho Andersen
c7c7a1a18a
xattr: handle idmapped mounts
When interacting with extended attributes the vfs verifies that the
caller is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid
or gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an
idmapped mount we need to map it according to the mount's user
namespace. Afterwards the checks are identical to non-idmapped mounts.
This has no effect for e.g. security xattrs since they don't store uids
or gids and don't perform permission checks on them like posix acls do.

Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:17 +01:00
Christian Brauner
21cb47be6f
inode: make init and permission helpers idmapped mount aware
The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped
mount it according to the mount's user namespace. Afterwards the checks
are identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.

Similarly, allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the
fsuid and fsgid of the caller from the mount's user namespace. If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:16 +01:00
Daniel Colascione
29cd6591ab selinux: teach SELinux about anonymous inodes
This change uses the anon_inodes and LSM infrastructure introduced in
the previous patches to give SELinux the ability to control
anonymous-inode files that are created using the new
anon_inode_getfd_secure() function.

A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".

Example:

type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };

(The next patch in this series is necessary for making userfaultfd
support this new interface.  The example above is just
for exposition.)

Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-01-14 17:38:10 -05:00
Ondrej Mosnacek
08abe46b2c selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support
When a superblock is assigned the SECURITY_FS_USE_XATTR behavior by the
policy yet it lacks xattr support, try to fall back to genfs rather than
rejecting the mount. If a genfscon rule is found for the filesystem,
then change the behavior to SECURITY_FS_USE_GENFS, otherwise reject the
mount as before. A similar fallback is already done in security_fs_use()
if no behavior specification is found for the given filesystem.

This is needed e.g. for virtiofs, which may or may not support xattrs
depending on the backing host filesystem.

Example:
    # seinfo --genfs | grep ' ramfs'
       genfscon ramfs /  system_u:object_r:ramfs_t:s0
    # echo '(fsuse xattr ramfs (system_u object_r fs_t ((s0) (s0))))' >ramfs_xattr.cil
    # semodule -i ramfs_xattr.cil
    # mount -t ramfs none /mnt

Before:
    mount: /mnt: mount(2) system call failed: Operation not supported.

After:
    (mount succeeds)
    # ls -Zd /mnt
    system_u:object_r:ramfs_t:s0 /mnt

See also:
https://lore.kernel.org/selinux/20210105142148.GA3200@redhat.com/T/
https://github.com/fedora-selinux/selinux-policy/pull/478

Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-01-13 08:55:11 -05:00
Amir Goldstein
a9ffe682c5 selinux: fix inconsistency between inode_getxattr and inode_listsecurity
When inode has no listxattr op of its own (e.g. squashfs) vfs_listxattr
calls the LSM inode_listsecurity hooks to list the xattrs that LSMs will
intercept in inode_getxattr hooks.

When selinux LSM is installed but not initialized, it will list the
security.selinux xattr in inode_listsecurity, but will not intercept it
in inode_getxattr.  This results in -ENODATA for a getxattr call for an
xattr returned by listxattr.

This situation was manifested as overlayfs failure to copy up lower
files from squashfs when selinux is built-in but not initialized,
because ovl_copy_xattr() iterates the lower inode xattrs by
vfs_listxattr() and vfs_getxattr().

Match the logic of inode_listsecurity to that of inode_getxattr and
do not list the security.selinux xattr if selinux is not initialized.

Reported-by: Michael Labriola <michael.d.labriola@gmail.com>
Tested-by: Michael Labriola <michael.d.labriola@gmail.com>
Link: https://lore.kernel.org/linux-unionfs/2nv9d47zt7.fsf@aldarion.sourceruckus.org/
Fixes: c8e222616c ("selinux: allow reading labels before policy is loaded")
Cc: stable@vger.kernel.org#v5.9+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-01-04 20:41:09 -05:00
Paolo Abeni
95ca90726e selinux: handle MPTCP consistently with TCP
The MPTCP protocol uses a specific protocol value, even if
it's an extension to TCP. Additionally, MPTCP sockets
could 'fall-back' to TCP at run-time, depending on peer MPTCP
support and available resources.

As a consequence of the specific protocol number, selinux
applies the raw_socket class to MPTCP sockets.

Existing TCP application converted to MPTCP - or forced to
use MPTCP socket with user-space hacks - will need an
updated policy to run successfully.

This change lets selinux attach the TCP socket class to
MPTCP sockets, too, so that no policy changes are needed in
the above scenario.

Note that the MPTCP is setting, propagating and updating the
security context on all the subflows and related request
socket.

Link: https://lore.kernel.org/linux-security-module/CAHC9VhTaK3xx0hEGByD2zxfF7fadyPP1kb-WeWH_YCyq9X-sRg@mail.gmail.com/T/#t
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[PM: tweaked subject's prefix]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2021-01-04 19:43:59 -05:00
Linus Torvalds
ca5b877b6c selinux/stable-5.11 PR 20201214
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl/YBtEUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNnwA/9Ek8DG/1t8CEoJxpoRvwovQxNo+bi
 0rCT9vqvx9PeCwoZi/0Vp6oKmpE1HADvbeB/+e00VrbLYnzE3oRY6VkpjoZRofKS
 vc0/MzHSFxFUR1OTHwCefcXlPLK+bfitQbX5jEMeVyQCXNXXIrN7CnJf1LmCeLTR
 kQBPlEN9lt7HyNVAi34FhOD/TQbWnFHgl2z5puffgri6cWnc+TALKMYytUZ+rYex
 NYndDJW5b3g5kTat2eErn0FruxfzloGs0xMIiWb+z2i9kl41D+dkKPdAN7idqCSC
 Jv0nJP/bDftzA0wOe9szmGaLQzu7YnCN5kiWcSspatZVnon42Cy/tp9tiuPGLRFU
 XtelDfpyX6o3CLN0tX7LQEO+GYxPzvM6iaR2OrsChWPozUIIR3TLQg7jJN4bvNKl
 TR6gCGZCoAeS5JLNGjzVKxT/oKQY+tCLLlYXQdQY6swNFi3EKmPr+K1D9lgm98fO
 f3d1QmWiZZNmtxxoVogT0qoQYjkfgpnm3dVx813Vt+lwHlVpHGMEPpO27iD3/RYb
 w2yWOJaGKwMD8iL0l+Cm6CPW0/nE5FFISQjWgC8b4Vgxlyan6+L9eViqGICkrUQ2
 Edo0i1YFFZ4utHYkDf1VYBbJ+36KyCtdktgLAcbgnePiPB3E1XBsXTIIStSUIbVQ
 iEbTkBlsCG4GIeU=
 =6Cqb
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
 "While we have a small number of SELinux patches for v5.11, there are a
  few changes worth highlighting:

   - Change the LSM network hooks to pass flowi_common structs instead
     of the parent flowi struct as the LSMs do not currently need the
     full flowi struct and they do not have enough information to use it
     safely (missing information on the address family).

     This patch was discussed both with Herbert Xu (representing team
     netdev) and James Morris (representing team
     LSMs-other-than-SELinux).

   - Fix how we handle errors in inode_doinit_with_dentry() so that we
     attempt to properly label the inode on following lookups instead of
     continuing to treat it as unlabeled.

   - Tweak the kernel logic around allowx, auditallowx, and dontauditx
     SELinux policy statements such that the auditx/dontauditx are
     effective even without the allowx statement.

  Everything passes our test suite"

* tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  lsm,selinux: pass flowi_common instead of flowi to the LSM hooks
  selinux: Fix fall-through warnings for Clang
  selinux: drop super_block backpointer from superblock_security_struct
  selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
  selinux: allow dontauditx and auditallowx rules to take effect without allowx
  selinux: fix error initialization in inode_doinit_with_dentry()
2020-12-16 11:01:04 -08:00
Florian Westphal
41dd9596d6 security: add const qualifier to struct sock in various places
A followup change to tcp_request_sock_op would have to drop the 'const'
qualifier from the 'route_req' function as the
'security_inet_conn_request' call is moved there - and that function
expects a 'struct sock *'.

However, it turns out its also possible to add a const qualifier to
security_inet_conn_request instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03 12:56:03 -08:00
Paul Moore
3df98d7921 lsm,selinux: pass flowi_common instead of flowi to the LSM hooks
As pointed out by Herbert in a recent related patch, the LSM hooks do
not have the necessary address family information to use the flowi
struct safely.  As none of the LSMs currently use any of the protocol
specific flowi information, replace the flowi pointers with pointers
to the address family independent flowi_common struct.

Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-11-23 18:36:21 -05:00
Gustavo A. R. Silva
b2d99bcb27 selinux: Fix fall-through warnings for Clang
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
by explicitly adding a break statement instead of letting the code fall
through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-11-23 18:21:13 -05:00
Ondrej Mosnacek
b159e86b5a selinux: drop super_block backpointer from superblock_security_struct
It appears to have been needed for selinux_complete_init() in the past,
but today it's useless.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-11-12 19:52:21 -05:00
Paul Moore
200ea5a229 selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
A previous fix, commit 83370b31a9 ("selinux: fix error initialization
in inode_doinit_with_dentry()"), changed how failures were handled
before a SELinux policy was loaded.  Unfortunately that patch was
potentially problematic for two reasons: it set the isec->initialized
state without holding a lock, and it didn't set the inode's SELinux
label to the "default" for the particular filesystem.  The later can
be a problem if/when a later attempt to revalidate the inode fails
and SELinux reverts to the existing inode label.

This patch should restore the default inode labeling that existed
before the original fix, without affecting the LABEL_INVALID marking
such that revalidation will still be attempted in the future.

Fixes: 83370b31a9 ("selinux: fix error initialization in inode_doinit_with_dentry()")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-11-05 22:47:31 -05:00
Tianyue Ren
83370b31a9 selinux: fix error initialization in inode_doinit_with_dentry()
Mark the inode security label as invalid if we cannot find
a dentry so that we will retry later rather than marking it
initialized with the unlabeled SID.

Fixes: 9287aed2ad ("selinux: Convert isec->lock into a spinlock")
Signed-off-by: Tianyue Ren <rentianyue@kylinos.cn>
[PM: minor comment tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-10-27 22:14:25 -04:00
Linus Torvalds
726eb70e0d Char/Misc driver patches for 5.10-rc1
Here is the big set of char, misc, and other assorted driver subsystem
 patches for 5.10-rc1.
 
 There's a lot of different things in here, all over the drivers/
 directory.  Some summaries:
 	- soundwire driver updates
 	- habanalabs driver updates
 	- extcon driver updates
 	- nitro_enclaves new driver
 	- fsl-mc driver and core updates
 	- mhi core and bus updates
 	- nvmem driver updates
 	- eeprom driver updates
 	- binder driver updates and fixes
 	- vbox minor bugfixes
 	- fsi driver updates
 	- w1 driver updates
 	- coresight driver updates
 	- interconnect driver updates
 	- misc driver updates
 	- other minor driver updates
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX4g8YQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yngKgCeNpArCP/9vQJRK9upnDm8ZLunSCUAn1wUT/2A
 /bTQ42c/WRQ+LU828GSM
 =6sO2
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the big set of char, misc, and other assorted driver subsystem
  patches for 5.10-rc1.

  There's a lot of different things in here, all over the drivers/
  directory. Some summaries:

   - soundwire driver updates

   - habanalabs driver updates

   - extcon driver updates

   - nitro_enclaves new driver

   - fsl-mc driver and core updates

   - mhi core and bus updates

   - nvmem driver updates

   - eeprom driver updates

   - binder driver updates and fixes

   - vbox minor bugfixes

   - fsi driver updates

   - w1 driver updates

   - coresight driver updates

   - interconnect driver updates

   - misc driver updates

   - other minor driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (396 commits)
  binder: fix UAF when releasing todo list
  docs: w1: w1_therm: Fix broken xref, mistakes, clarify text
  misc: Kconfig: fix a HISI_HIKEY_USB dependency
  LSM: Fix type of id parameter in kernel_post_load_data prototype
  misc: Kconfig: add a new dependency for HISI_HIKEY_USB
  firmware_loader: fix a kernel-doc markup
  w1: w1_therm: make w1_poll_completion static
  binder: simplify the return expression of binder_mmap
  test_firmware: Test partial read support
  firmware: Add request_partial_firmware_into_buf()
  firmware: Store opt_flags in fw_priv
  fs/kernel_file_read: Add "offset" arg for partial reads
  IMA: Add support for file reads without contents
  LSM: Add "contents" flag to kernel_read_file hook
  module: Call security_kernel_post_load_data()
  firmware_loader: Use security_post_load_data()
  LSM: Introduce kernel_post_load_data() hook
  fs/kernel_read_file: Add file_size output argument
  fs/kernel_read_file: Switch buffer size arg to size_t
  fs/kernel_read_file: Remove redundant size argument
  ...
2020-10-15 10:01:51 -07:00
Linus Torvalds
7b540812cc selinux/stable-5.10 PR 20201012
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl+E9UoUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMG2BAApHLKLsfH5gf7gZNjHmQxddg8maCl
 BGt7K1xc9iYBZN56Cbc7v9uKc5pM+UOoOlVmWh+8jaROpX10jJmvhsebQzpcWEEs
 O/BDg/Y/AafoLr5e7gbAnlA7TJXNSR9MG9RB7c9xC14LG/bqBmkaUNsv8isWlLgl
 J2atHLsdlvCbmqJvnc6Fh3VJCbY/I0kt9L04GBQ4pEK3TKOxtORQaQcjVgLhlcw9
 YdMPKYIwy2Ze2HUuyW2o9OuryHhoMrwxpN/35/PAxrRwpO0LVnjjiw6njQqYVGH3
 el8mPXlhHah/7QUKcngSsvcvUcaSencp9sUBrp1vK9C1vkSFyubZweVi4A2TEWnh
 Ctceje7XP/YWDcJ+5BgASvosQdqOBB7huuOOKVpvaBXqgUHFgaxphV4/FDNnlF62
 AteX5RcWb/JiFJ4YnbknPNa/MWxVYuVn78AlNsM2ZponWYWs9JZ17lX4tHAKF1Qm
 x6ZMvMCDJTj8622l8nw3dTZKNDE3nFblDThX8aSrAhCQQE6HvugbKU4Fzo1oiSPl
 84PlCPgb+3tP3OsvZDIOPCJxC6IHgS+meA0IjhjwuCb+U+YWaAIeOlOPSkxUmfLu
 iJVWHmDtsAM3bTBxwQudhgXF3a1oKCEqeqNxM6P6p55jti7xal9FnZNHTbSh2sO1
 Km4oIqTEb1XWNdU=
 =NNLw
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux updates from Paul Moore:
 "A decent number of SELinux patches for v5.10, twenty two in total. The
  highlights are listed below, but all of the patches pass our test
  suite and merge cleanly.

   - A number of changes to how the SELinux policy is loaded and managed
     inside the kernel with the goal of improving the atomicity of a
     SELinux policy load operation.

     These changes account for the bulk of the diffstat as well as the
     patch count. A special thanks to everyone who contributed patches
     and fixes for this work.

   - Convert the SELinux policy read-write lock to RCU.

   - A tracepoint was added for audited SELinux access control events;
     this should help provide a more unified backtrace across kernel and
     userspace.

   - Allow the removal of security.selinux xattrs when a SELinux policy
     is not loaded.

   - Enable policy capabilities in SELinux policies created with the
     scripts/selinux/mdp tool.

   - Provide some "no sooner than" dates for the SELinux checkreqprot
     sysfs deprecation"

* tag 'selinux-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (22 commits)
  selinux: provide a "no sooner than" date for the checkreqprot removal
  selinux: Add helper functions to get and set checkreqprot
  selinux: access policycaps with READ_ONCE/WRITE_ONCE
  selinux: simplify away security_policydb_len()
  selinux: move policy mutex to selinux_state, use in lockdep checks
  selinux: fix error handling bugs in security_load_policy()
  selinux: convert policy read-write lock to RCU
  selinux: delete repeated words in comments
  selinux: add basic filtering for audit trace events
  selinux: add tracepoint on audited events
  selinux: Create new booleans and class dirs out of tree
  selinux: Standardize string literal usage for selinuxfs directory names
  selinux: Refactor selinuxfs directory populating functions
  selinux: Create function for selinuxfs directory cleanup
  selinux: permit removing security.selinux xattr before policy load
  selinux: fix memdup.cocci warnings
  selinux: avoid dereferencing the policy prior to initialization
  selinux: fix allocation failure check on newpolicy->sidtab
  selinux: refactor changing booleans
  selinux: move policy commit after updating selinuxfs
  ...
2020-10-13 16:29:55 -07:00
Kees Cook
2039bda1fa LSM: Add "contents" flag to kernel_read_file hook
As with the kernel_load_data LSM hook, add a "contents" flag to the
kernel_read_file LSM hook that indicates whether the LSM can expect
a matching call to the kernel_post_read_file LSM hook with the full
contents of the file. With the coming addition of partial file read
support for kernel_read_file*() API, the LSM will no longer be able
to always see the entire contents of a file during the read calls.

For cases where the LSM must read examine the complete file contents,
it will need to do so on its own every time the kernel_read_file
hook is called with contents=false (or reject such cases). Adjust all
existing LSMs to retain existing behavior.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-12-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05 13:37:03 +02:00
Kees Cook
b64fcae74b LSM: Introduce kernel_post_load_data() hook
There are a few places in the kernel where LSMs would like to have
visibility into the contents of a kernel buffer that has been loaded or
read. While security_kernel_post_read_file() (which includes the
buffer) exists as a pairing for security_kernel_read_file(), no such
hook exists to pair with security_kernel_load_data().

Earlier proposals for just using security_kernel_post_read_file() with a
NULL file argument were rejected (i.e. "file" should always be valid for
the security_..._file hooks, but it appears at least one case was
left in the kernel during earlier refactoring. (This will be fixed in
a subsequent patch.)

Since not all cases of security_kernel_load_data() can have a single
contiguous buffer made available to the LSM hook (e.g. kexec image
segments are separately loaded), there needs to be a way for the LSM to
reason about its expectations of the hook coverage. In order to handle
this, add a "contents" argument to the "kernel_load_data" hook that
indicates if the newly added "kernel_post_load_data" hook will be called
with the full contents once loaded. That way, LSMs requiring full contents
can choose to unilaterally reject "kernel_load_data" with contents=false
(which is effectively the existing hook coverage), but when contents=true
they can allow it and later evaluate the "kernel_post_load_data" hook
once the buffer is loaded.

With this change, LSMs can gain coverage over non-file-backed data loads
(e.g. init_module(2) and firmware userspace helper), which will happen
in subsequent patches.

Additionally prepare IMA to start processing these cases.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/r/20201002173828.2099543-9-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05 13:37:03 +02:00
Scott Branden
b89999d004 fs/kernel_read_file: Split into separate include file
Move kernel_read_file* out of linux/fs.h to its own linux/kernel_read_file.h
include file. That header gets pulled in just about everywhere
and doesn't really need functions not related to the general fs interface.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/r/20200706232309.12010-2-scott.branden@broadcom.com
Link: https://lore.kernel.org/r/20201002173828.2099543-4-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-05 13:34:18 +02:00
Lakshmi Ramasubramanian
8861d0af64 selinux: Add helper functions to get and set checkreqprot
checkreqprot data member in selinux_state struct is accessed directly by
SELinux functions to get and set. This could cause unexpected read or
write access to this data member due to compiler optimizations and/or
compiler's reordering of access to this field.

Add helper functions to get and set checkreqprot data member in
selinux_state struct. These helper functions use READ_ONCE and
WRITE_ONCE macros to ensure atomic read or write of memory for
this data member.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Suggested-by: Paul Moore <paul@paul-moore.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-09-15 14:36:28 -04:00
Stephen Smalley
9ff9abc4c6 selinux: move policy mutex to selinux_state, use in lockdep checks
Move the mutex used to synchronize policy changes (reloads and setting
of booleans) from selinux_fs_info to selinux_state and use it in
lockdep checks for rcu_dereference_protected() calls in the security
server functions.  This makes the dependency on the mutex explicit
in the code rather than relying on comments.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-08-27 09:52:47 -04:00
Stephen Smalley
1b8b31a2e6 selinux: convert policy read-write lock to RCU
Convert the policy read-write lock to RCU.  This is significantly
simplified by the earlier work to encapsulate the policy data
structures and refactor the policy load and boolean setting logic.
Move the latest_granting sequence number into the selinux_policy
structure so that it can be updated atomically with the policy.
Since removing the policy rwlock and moving latest_granting reduces
the selinux_ss structure to nothing more than a wrapper around the
selinux_policy pointer, get rid of the extra layer of indirection.

At present this change merely passes a hardcoded 1 to
rcu_dereference_check() in the cases where we know we do not need to
take rcu_read_lock(), with the preceding comment explaining why.
Alternatively we could pass fsi->mutex down from selinuxfs and
apply a lockdep check on it instead.

Based in part on earlier attempts to convert the policy rwlock
to RCU by Kaigai Kohei [1] and by Peter Enderborg [2].

[1] https://lore.kernel.org/selinux/6e2f9128-e191-ebb3-0e87-74bfccb0767f@tycho.nsa.gov/
[2] https://lore.kernel.org/selinux/20180530141104.28569-1-peter.enderborg@sony.com/

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-08-25 08:34:47 -04:00
Randy Dunlap
c76a2f9ecd selinux: delete repeated words in comments
Drop a repeated word in comments.
{open, is, then}

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: selinux@vger.kernel.org
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
[PM: fix subject line]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-08-24 09:03:14 -04:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Stephen Smalley
9530a3e004 selinux: permit removing security.selinux xattr before policy load
Currently SELinux denies attempts to remove the security.selinux xattr
always, even when permissive or no policy is loaded.  This was originally
motivated by the view that all files should be labeled, even if that label
is unlabeled_t, and we shouldn't permit files that were once labeled to
have their labels removed entirely.  This however prevents removing
SELinux xattrs in the case where one "disables" SELinux by not loading
a policy (e.g. a system where runtime disable is removed and selinux=0
was not specified).  Allow removing the xattr before SELinux is
initialized.  We could conceivably permit it even after initialization
if permissive, or introduce a separate permission check here.

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-08-20 21:55:31 -04:00
Jonathan Lebon
c8e222616c selinux: allow reading labels before policy is loaded
This patch does for `getxattr` what commit 3e3e24b420 ("selinux: allow
labeling before policy is loaded") did for `setxattr`; it allows
querying the current SELinux label on disk before the policy is loaded.

One of the motivations described in that commit message also drives this
patch: for Fedora CoreOS (and eventually RHEL CoreOS), we want to be
able to move the root filesystem for example, from xfs to ext4 on RAID,
on first boot, at initrd time.[1]

Because such an operation works at the filesystem level, we need to be
able to read the SELinux labels first from the original root, and apply
them to the files of the new root. The previous commit enabled the
second part of this process; this commit enables the first part.

[1] https://github.com/coreos/fedora-coreos-tracker/issues/94

Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-06-23 20:42:38 -04:00
Linus Torvalds
6c32978414 Notifications over pipes + Keyring notifications
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl7U/i8ACgkQ+7dXa6fL
 C2u2eg/+Oy6ybq0hPovYVkFI9WIG7ZCz7w9Q6BEnfYMqqn3dnfJxKQ3l4pnQEOWw
 f4QfvpvevsYfMtOJkYcG6s66rQgbFdqc5TEyBBy0QNp3acRolN7IXkcopvv9xOpQ
 JxedpbFG1PTFLWjvBpyjlrUPouwLzq2FXAf1Ox0ZIMw6165mYOMWoli1VL8dh0A0
 Ai7JUB0WrvTNbrwhV413obIzXT/rPCdcrgbQcgrrLPex8lQ47ZAE9bq6k4q5HiwK
 KRzEqkQgnzId6cCNTFBfkTWsx89zZunz7jkfM5yx30MvdAtPSxvvpfIPdZRZkXsP
 E2K9Fk1/6OQZTC0Op3Pi/bt+hVG/mD1p0sQUDgo2MO3qlSS+5mMkR8h3mJEgwK12
 72P4YfOJkuAy2z3v4lL0GYdUDAZY6i6G8TMxERKu/a9O3VjTWICDOyBUS6F8YEAK
 C7HlbZxAEOKTVK0BTDTeEUBwSeDrBbvH6MnRlZCG5g1Fos2aWP0udhjiX8IfZLO7
 GN6nWBvK1fYzfsUczdhgnoCzQs3suoDo04HnsTPGJ8De52T4x2RsjV+gPx0nrNAq
 eWChl1JvMWsY2B3GLnl9XQz4NNN+EreKEkk+PULDGllrArrPsp5Vnhb9FJO1PVCU
 hMDJHohPiXnKbc8f4Bd78OhIvnuoGfJPdM5MtNe2flUKy2a2ops=
 =YTGf
 -----END PGP SIGNATURE-----

Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull notification queue from David Howells:
 "This adds a general notification queue concept and adds an event
  source for keys/keyrings, such as linking and unlinking keys and
  changing their attributes.

  Thanks to Debarshi Ray, we do have a pull request to use this to fix a
  problem with gnome-online-accounts - as mentioned last time:

     https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47

  Without this, g-o-a has to constantly poll a keyring-based kerberos
  cache to find out if kinit has changed anything.

  [ There are other notification pending: mount/sb fsinfo notifications
    for libmount that Karel Zak and Ian Kent have been working on, and
    Christian Brauner would like to use them in lxc, but let's see how
    this one works first ]

  LSM hooks are included:

   - A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set. Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable. The
     LSM should use current's credentials. [Wanted by SELinux & Smack]

   - A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue. This is
     given the credentials from the event generator (which may be the
     system) and the watch setter. [Wanted by Smack]

  I've provided SELinux and Smack with implementations of some of these
  hooks.

  WHY
  ===

  Key/keyring notifications are desirable because if you have your
  kerberos tickets in a file/directory, your Gnome desktop will monitor
  that using something like fanotify and tell you if your credentials
  cache changes.

  However, we also have the ability to cache your kerberos tickets in
  the session, user or persistent keyring so that it isn't left around
  on disk across a reboot or logout. Keyrings, however, cannot currently
  be monitored asynchronously, so the desktop has to poll for it - not
  so good on a laptop. This facility will allow the desktop to avoid the
  need to poll.

  DESIGN DECISIONS
  ================

   - The notification queue is built on top of a standard pipe. Messages
     are effectively spliced in. The pipe is opened with a special flag:

        pipe2(fds, O_NOTIFICATION_PIPE);

     The special flag has the same value as O_EXCL (which doesn't seem
     like it will ever be applicable in this context)[?]. It is given up
     front to make it a lot easier to prohibit splice&co from accessing
     the pipe.

     [?] Should this be done some other way?  I'd rather not use up a new
         O_* flag if I can avoid it - should I add a pipe3() system call
         instead?

     The pipe is then configured::

        ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
        ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

     Messages are then read out of the pipe using read().

   - It should be possible to allow write() to insert data into the
     notification pipes too, but this is currently disabled as the
     kernel has to be able to insert messages into the pipe *without*
     holding pipe->mutex and the code to make this work needs careful
     auditing.

   - sendfile(), splice() and vmsplice() are disabled on notification
     pipes because of the pipe->mutex issue and also because they
     sometimes want to revert what they just did - but one or more
     notification messages might've been interleaved in the ring.

   - The kernel inserts messages with the wait queue spinlock held. This
     means that pipe_read() and pipe_write() have to take the spinlock
     to update the queue pointers.

   - Records in the buffer are binary, typed and have a length so that
     they can be of varying size.

     This allows multiple heterogeneous sources to share a common
     buffer; there are 16 million types available, of which I've used
     just a few, so there is scope for others to be used. Tags may be
     specified when a watchpoint is created to help distinguish the
     sources.

   - Records are filterable as types have up to 256 subtypes that can be
     individually filtered. Other filtration is also available.

   - Notification pipes don't interfere with each other; each may be
     bound to a different set of watches. Any particular notification
     will be copied to all the queues that are currently watching for it
     - and only those that are watching for it.

   - When recording a notification, the kernel will not sleep, but will
     rather mark a queue as having lost a message if there's
     insufficient space. read() will fabricate a loss notification
     message at an appropriate point later.

   - The notification pipe is created and then watchpoints are attached
     to it, using one of:

        keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
        watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
        watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

     where in both cases, fd indicates the queue and the number after is
     a tag between 0 and 255.

   - Watches are removed if either the notification pipe is destroyed or
     the watched object is destroyed. In the latter case, a message will
     be generated indicating the enforced watch removal.

  Things I want to avoid:

   - Introducing features that make the core VFS dependent on the
     network stack or networking namespaces (ie. usage of netlink).

   - Dumping all this stuff into dmesg and having a daemon that sits
     there parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky. Further, dmesg might not exist or might be
     inaccessible inside a container.

   - Letting users see events they shouldn't be able to see.

  TESTING AND MANPAGES
  ====================

   - The keyutils tree has a pipe-watch branch that has keyctl commands
     for making use of notifications. Proposed manual pages can also be
     found on this branch, though a couple of them really need to go to
     the main manpages repository instead.

     If the kernel supports the watching of keys, then running "make
     test" on that branch will cause the testing infrastructure to spawn
     a monitoring process on the side that monitors a notifications pipe
     for all the key/keyring changes induced by the tests and they'll
     all be checked off to make sure they happened.

        https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

   - A test program is provided (samples/watch_queue/watch_test) that
     can be used to monitor for keyrings, mount and superblock events.
     Information on the notifications is simply logged to stdout"

* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  smack: Implement the watch_key and post_notification hooks
  selinux: Implement the watch_key security hook
  keys: Make the KEY_NEED_* perms an enum rather than a mask
  pipe: Add notification lossage handling
  pipe: Allow buffers to be marked read-whole-or-error for notifications
  Add sample notification program
  watch_queue: Add a key/keyring notification facility
  security: Add hooks to rule on setting a watch
  pipe: Add general notification queue support
  pipe: Add O_NOTIFICATION_PIPE
  security: Add a hook for the point of notification insertion
  uapi: General notification queue definitions
2020-06-13 09:56:21 -07:00
Linus Torvalds
15a2bc4dbb Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull execve updates from Eric Biederman:
 "Last cycle for the Nth time I ran into bugs and quality of
  implementation issues related to exec that could not be easily be
  fixed because of the way exec is implemented. So I have been digging
  into exec and cleanup up what I can.

  I don't think I have exec sorted out enough to fix the issues I
  started with but I have made some headway this cycle with 4 sets of
  changes.

   - promised cleanups after introducing exec_update_mutex

   - trivial cleanups for exec

   - control flow simplifications

   - remove the recomputation of bprm->cred

  The net result is code that is a bit easier to understand and work
  with and a decrease in the number of lines of code (if you don't count
  the added tests)"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits)
  exec: Compute file based creds only once
  exec: Add a per bprm->file version of per_clear
  binfmt_elf_fdpic: fix execfd build regression
  selftests/exec: Add binfmt_script regression test
  exec: Remove recursion from search_binary_handler
  exec: Generic execfd support
  exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
  exec: Move the call of prepare_binprm into search_binary_handler
  exec: Allow load_misc_binary to call prepare_binprm unconditionally
  exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
  exec: Teach prepare_exec_creds how exec treats uids & gids
  exec: Set the point of no return sooner
  exec: Move handling of the point of no return to the top level
  exec: Run sync_mm_rss before taking exec_update_mutex
  exec: Fix spelling of search_binary_handler in a comment
  exec: Move the comment from above de_thread to above unshare_sighand
  exec: Rename flush_old_exec begin_new_exec
  exec: Move most of setup_new_exec into flush_old_exec
  exec: In setup_new_exec cache current in the local variable me
  ...
2020-06-04 14:07:08 -07:00
Eric W. Biederman
b8bff59926 exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
Today security_bprm_set_creds has several implementations:
apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
smack_bprm_set_creds, and tomoyo_bprm_set_creds.

Except for cap_bprm_set_creds they all test bprm->called_set_creds and
return immediately if it is true.  The function cap_bprm_set_creds
ignores bprm->calld_sed_creds entirely.

Create a new LSM hook security_bprm_creds_for_exec that is called just
before prepare_binprm in __do_execve_file, resulting in a LSM hook
that is called exactly once for the entire of exec.  Modify the bits
of security_bprm_set_creds that only want to be called once per exec
into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
behind.

Remove bprm->called_set_creds all of it's former users have been moved
to security_bprm_creds_for_exec.

Add or upate comments a appropriate to bring them up to date and
to reflect this change.

Link: https://lkml.kernel.org/r/87v9kszrzh.fsf_-_@x220.int.ebiederm.org
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com> # For the LSM and Smack bits
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-05-20 14:45:31 -05:00
David Howells
3e412ccc22 selinux: Implement the watch_key security hook
Implement the watch_key security hook to make sure that a key grants the
caller View permission in order to set a watch on a key.

For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
2020-05-19 15:47:15 +01:00
David Howells
8c0637e950 keys: Make the KEY_NEED_* perms an enum rather than a mask
Since the meaning of combining the KEY_NEED_* constants is undefined, make
it so that you can't do that by turning them into an enum.

The enum is also given some extra values to represent special
circumstances, such as:

 (1) The '0' value is reserved and causes a warning to trap the parameter
     being unset.

 (2) The key is to be unlinked and we require no permissions on it, only
     the keyring, (this replaces the KEY_LOOKUP_FOR_UNLINK flag).

 (3) An override due to CAP_SYS_ADMIN.

 (4) An override due to an instantiation token being present.

 (5) The permissions check is being deferred to later key_permission()
     calls.

The extra values give the opportunity for LSMs to audit these situations.

[Note: This really needs overhauling so that lookup_user_key() tells
 key_task_permission() and the LSM what operation is being done and leaves
 it to those functions to decide how to map that onto the available
 permits.  However, I don't really want to make these change in the middle
 of the notifications patchset.]

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
cc: Paul Moore <paul@paul-moore.com>
cc: Stephen Smalley <stephen.smalley.work@gmail.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: keyrings@vger.kernel.org
cc: selinux@vger.kernel.org
2020-05-19 15:42:22 +01:00
Linus Torvalds
39e16d9342 selinux/stable-5.7 PR 20200430
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl6rPswUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMJBQ//dAU7VS01kQUUsFjd8xUIOk9aSbNy
 gjFkzcpbTsS4Mhqk0FSP3mfqDWP3lvxt9gx6WfnCf+a2KE3eTtf9bISW2OB2evIl
 9ydae6frJLiP6yIeAEZBb1PBQ6AxwBT8j8drKi57sOBC8rkmF66wiMaG2nybYW/j
 rvkOQCFtWj/A3b+Y7y8fVs8sjTHWvcsvkN7kwYmmdjyn7h/C1Tqc6TOOrt1jtLUG
 dgeak9bCIvK7JB/W6squ1iKqvkJhld7h5fZn6WB/6Xd1DKD1LVjGT8HsKpI/ei49
 0tAybqaLv8WxVc5ZGcAGoTt/X0hq3lXRiMG1Qgmed85wxjrLEpU12L6yprEtgtao
 0yY1JNizuC3Ehbi02o4gHf+RffLPWDrT8Kmu00/IuridCesNZCrEFpbAZmOwPU67
 nFufU0YlSnsVJ63C8TMhkI2eg/VejGjN4I2PEgcxEbZKBBW+nAcJfoKl1y2tzEo9
 ZNdZcetY9yJdpewjsF6VgsXs4qUrm1NUiG8pCXdK23+w/qYvZ4UqoYfoRNYIuRS0
 nRN40OkRYN6GzDZ+NCPYqgIhoEpps0p96VYQI6mp+PyOpwlCq8epKVjXD/TkteVG
 mevM2Ffy8xVaL47ufXxAHn+pA6F6Mdmo/rIwe+U5Olase96DTcFr90JPmVz58mcP
 6lWZhje3wS3mSpk=
 =HTrZ
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fixes from Paul Moore:
 "Two more SELinux patches to fix problems in the v5.7-rcX releases.

  Wei Yongjun's patch fixes a return code in an error path, and my patch
  fixes a problem where we were not correctly applying access controls
  to all of the netlink messages in the netlink_send LSM hook"

* tag 'selinux-pr-20200430' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: properly handle multiple messages in selinux_netlink_send()
  selinux: fix error return code in cond_read_list()
2020-04-30 16:35:45 -07:00
Paul Moore
fb73974172 selinux: properly handle multiple messages in selinux_netlink_send()
Fix the SELinux netlink_send hook to properly handle multiple netlink
messages in a single sk_buff; each message is parsed and subject to
SELinux access control.  Prior to this patch, SELinux only inspected
the first message in the sk_buff.

Cc: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-04-30 16:18:37 -04:00
Linus Torvalds
b3aa112d57 selinux/stable-5.7 PR 20200330
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl6Ch6wUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPdcg/9FDMS/n0Xl1HQBUyu26EwLu3aUpNE
 BdghXW1LKSTp7MrOENE60PGzZSAiC07ci1DqFd7zfLPZf2q5IwPwOBa/Avy8z95V
 oHKqcMT6WO1SPOm/PxZn16FCKyET4gZDTXvHBAyiyFsbk36R522ZY615P9T3eLu/
 ZA1NFsSjj68SqMCUlAWfeqjcbQiX63bryEpugOIg0qWy7R/+rtWxj9TjriZ+v9tq
 uC45UcjBqphpmoPG8BifA3jjyclwO3DeQb5u7E8//HPPraGeB19ntsymUg7ljoGk
 NrqCkZtv6E+FRCDTR5f0O7M1T4BWJodxw2NwngnTwKByLC25EZaGx80o+VyMt0eT
 Pog+++JZaa5zZr2OYOtdlPVMLc2ALL6p/8lHOqFU3GKfIf04hWOm6/Lb2IWoXs3f
 CG2b6vzoXYyjbF0Q7kxadb8uBY2S1Ds+CVu2HMBBsXsPdwbbtFWOT/6aRAQu61qn
 PW+f47NR8By3SO6nMzWts2SZEERZNIEdSKeXHuR7As1jFMXrHLItreb4GCSPay5h
 2bzRpxt2br5CDLh7Jv2pZnHtUqBWOSbslTix77+Z/hPKaNowvD9v3tc5hX87rDmB
 dYXROD6/KoyXFYDcMdphlvORFhqGqd5bEYuHHum/VjSIRh237+/nxFY/vZ4i4bzU
 2fvpAmUlVX1c4rw=
 =LlWA
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux updates from Paul Moore:
 "We've got twenty SELinux patches for the v5.7 merge window, the
  highlights are below:

   - Deprecate setting /sys/fs/selinux/checkreqprot to 1.

     This flag was originally created to deal with legacy userspace and
     the READ_IMPLIES_EXEC personality flag. We changed the default from
     1 to 0 back in Linux v4.4 and now we are taking the next step of
     deprecating it, at some point in the future we will take the final
     step of rejecting 1.

   - Allow kernfs symlinks to inherit the SELinux label of the parent
     directory. In order to preserve backwards compatibility this is
     protected by the genfs_seclabel_symlinks SELinux policy capability.

   - Optimize how we store filename transitions in the kernel, resulting
     in some significant improvements to policy load times.

   - Do a better job calculating our internal hash table sizes which
     resulted in additional policy load improvements and likely general
     SELinux performance improvements as well.

   - Remove the unused initial SIDs (labels) and improve how we handle
     initial SIDs.

   - Enable per-file labeling for the bpf filesystem.

   - Ensure that we properly label NFS v4.2 filesystems to avoid a
     temporary unlabeled condition.

   - Add some missing XFS quota command types to the SELinux quota
     access controls.

   - Fix a problem where we were not updating the seq_file position
     index correctly in selinuxfs.

   - We consolidate some duplicated code into helper functions.

   - A number of list to array conversions.

   - Update Stephen Smalley's email address in MAINTAINERS"

* tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: clean up indentation issue with assignment statement
  NFS: Ensure security label is set for root inode
  MAINTAINERS: Update my email address
  selinux: avtab_init() and cond_policydb_init() return void
  selinux: clean up error path in policydb_init()
  selinux: remove unused initial SIDs and improve handling
  selinux: reduce the use of hard-coded hash sizes
  selinux: Add xfs quota command types
  selinux: optimize storage of filename transitions
  selinux: factor out loop body from filename_trans_read()
  security: selinux: allow per-file labeling for bpffs
  selinux: generalize evaluate_cond_node()
  selinux: convert cond_expr to array
  selinux: convert cond_av_list to array
  selinux: convert cond_list to array
  selinux: sel_avc_get_stat_idx should increase position index
  selinux: allow kernfs symlinks to inherit parent directory context
  selinux: simplify evaluate_cond_node()
  Documentation,selinux: deprecate setting checkreqprot to 1
  selinux: move status variables out of selinux_ss
2020-03-31 15:07:55 -07:00
Richard Haines
e4cfa05e9b selinux: Add xfs quota command types
Add Q_XQUOTAOFF, Q_XQUOTAON and Q_XSETQLIM to trigger filesystem quotamod
permission check.

Add Q_XGETQUOTA, Q_XGETQSTAT, Q_XGETQSTATV and Q_XGETNEXTQUOTA to trigger
filesystem quotaget permission check.

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-22 14:41:21 -05:00
Connor O'Brien
4ca54d3d30 security: selinux: allow per-file labeling for bpffs
Add support for genfscon per-file labeling of bpffs files. This allows
for separate permissions for different pinned bpf objects, which may
be completely unrelated to each other.

Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: Steven Moreland <smoreland@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-11 22:02:54 -05:00
Linus Torvalds
a5650acb5f selinux/stable-5.6 PR 20200210
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl5ByJYUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMNLg/+Nj3gdWa5z0lsFuvUs87uM4HYHLtp
 9/wENEaqy7W92LAAEW5y+6sK8Pm9/CO3xf2d40SAr/xjwBo2Fbnfmx7QwDMUCQS6
 vyrLDiikSMyFU5j1AQDFuWjWF36FZlfqix+oXMETJ1tqxjIFxFN+rvbsDICPiQTl
 VxBG4CWWzNtSllYDDj3KYcVaGzNb3nleb8n436gFoHS++12BaypVA0kBlV3BqzPi
 mC0+gDuY06hwFmc/tAl8WndRwZpJ71rgsKJ5opgel61Zxf1J39QdTxeap9hhldJl
 5FbtoiGpnMXtHzpRGh6BHag/2gGX/0+J+t8ZATuk4GRtt1ueyYw9kNnAMblooJSV
 TwJlv1LLIKpAmSSJnKoN1AUAsxhS3dAmVyPuYWtmqEq8wq7YEYi5UnK1fH4ziI+b
 5LmYD0D/FoNE1Dr1TV78HmM9i0+AteH5FhkS0V6GD3v+wO6vCb3hTYaQpMWhhgnY
 /SEWbvLUO1fvQoKA8GT3UtmiUFdrqvrQFoL/l3weo9tvfyG3Walxo8s0w0r/DVUH
 AevIhUVi+snSLhI95fF26wskrZtEE9gwl/BoMjfKse0HK2t77xBE3kZW88NnSXOU
 Wk+ozfgibCnzk/Do/Y4iRr9GbgmCfscjOqlq1Yzyt/ZkUsG2egWHMzpritnzVEio
 NSQ6n6kpbkRgkHc=
 =0cTj
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20200210' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull SELinux fixes from Paul Moore:
 "Two small fixes: one fixes a locking problem in the recently merged
  label translation code, the other fixes an embarrassing 'binderfs' /
  'binder' filesystem name check"

* tag 'selinux-pr-20200210' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: fix sidtab string cache locking
  selinux: fix typo in filesystem name
2020-02-10 16:51:35 -08:00
Christian Göttsche
7470d0d13f selinux: allow kernfs symlinks to inherit parent directory context
Currently symlinks on kernel filesystems, like sysfs, are labeled on
creation with the parent filesystem root sid.

Allow symlinks to inherit the parent directory context, so fine-grained
kernfs labeling can be applied to symlinks too and checking contexts
doesn't complain about them.

For backward-compatibility this behavior is contained in a new policy
capability: genfs_seclabel_symlinks

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10 10:49:01 -05:00
Stephen Smalley
e9c38f9fc2 Documentation,selinux: deprecate setting checkreqprot to 1
Deprecate setting the SELinux checkreqprot tunable to 1 via kernel
parameter or /sys/fs/selinux/checkreqprot.  Setting it to 0 is left
intact for compatibility since Android and some Linux distributions
do so for security and treat an inability to set it as a fatal error.
Eventually setting it to 0 will become a no-op and the kernel will
stop using checkreqprot's value internally altogether.

checkreqprot was originally introduced as a compatibility mechanism
for legacy userspace and the READ_IMPLIES_EXEC personality flag.
However, if set to 1, it weakens security by allowing mappings to be
made executable without authorization by policy.  The default value
for the SECURITY_SELINUX_CHECKREQPROT_VALUE config option was changed
from 1 to 0 in commit 2a35d196c1 ("selinux: change
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and both Android
and Linux distributions began explicitly setting
/sys/fs/selinux/checkreqprot to 0 some time ago.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10 10:49:01 -05:00
Ondrej Mosnacek
4b36cb773a selinux: move status variables out of selinux_ss
It fits more naturally in selinux_state, since it reflects also global
state (the enforcing and policyload fields).

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-10 10:49:01 -05:00
Linus Torvalds
c9d35ee049 Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs file system parameter updates from Al Viro:
 "Saner fs_parser.c guts and data structures. The system-wide registry
  of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is
  the horror switch() in fs_parse() that would have to grow another case
  every time something got added to that system-wide registry.

  New syntax types can be added by filesystems easily now, and their
  namespace is that of functions - not of system-wide enum members. IOW,
  they can be shared or kept private and if some turn out to be widely
  useful, we can make them common library helpers, etc., without having
  to do anything whatsoever to fs_parse() itself.

  And we already get that kind of requests - the thing that finally
  pushed me into doing that was "oh, and let's add one for timeouts -
  things like 15s or 2h". If some filesystem really wants that, let them
  do it. Without somebody having to play gatekeeper for the variants
  blessed by direct support in fs_parse(), TYVM.

  Quite a bit of boilerplate is gone. And IMO the data structures make a
  lot more sense now. -200LoC, while we are at it"

* 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (25 commits)
  tmpfs: switch to use of invalfc()
  cgroup1: switch to use of errorfc() et.al.
  procfs: switch to use of invalfc()
  hugetlbfs: switch to use of invalfc()
  cramfs: switch to use of errofc() et.al.
  gfs2: switch to use of errorfc() et.al.
  fuse: switch to use errorfc() et.al.
  ceph: use errorfc() and friends instead of spelling the prefix out
  prefix-handling analogues of errorf() and friends
  turn fs_param_is_... into functions
  fs_parse: handle optional arguments sanely
  fs_parse: fold fs_parameter_desc/fs_parameter_spec
  fs_parser: remove fs_parameter_description name field
  add prefix to fs_context->log
  ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log
  new primitive: __fs_parse()
  switch rbd and libceph to p_log-based primitives
  struct p_log, variants of warnf() et.al. taking that one instead
  teach logfc() to handle prefices, give it saner calling conventions
  get rid of cg_invalf()
  ...
2020-02-08 13:26:41 -08:00
Al Viro
d7167b1499 fs_parse: fold fs_parameter_desc/fs_parameter_spec
The former contains nothing but a pointer to an array of the latter...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:37 -05:00
Eric Sandeen
96cafb9ccb fs_parser: remove fs_parameter_description name field
Unused now.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-02-07 14:48:36 -05:00
Hridya Valsaraju
a20456aef8 selinux: fix typo in filesystem name
Correct the filesystem name to "binder" to enable genfscon per-file
labelling for binderfs.

Fixes: 7a4b519474 ("selinux: allow per-file labelling for binderfs")
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: slight style changes to the subj/description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-02-05 18:23:13 -05:00
Stephen Smalley
98aa00345d selinux: fix regression introduced by move_mount(2) syscall
commit 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
introduced a new move_mount(2) system call and a corresponding new LSM
security_move_mount hook but did not implement this hook for any existing
LSM.  This creates a regression for SELinux with respect to consistent
checking of mounts; the existing selinux_mount hook checks mounton
permission to the mount point path.  Provide a SELinux hook
implementation for move_mount that applies this same check for
consistency.  In the future we may wish to add a new move_mount
filesystem permission and check as well, but this addresses
the immediate regression.

Fixes: 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-20 07:42:37 -05:00
Paul Moore
cb89e24658 selinux: remove redundant allocation and helper functions
This patch removes the inode, file, and superblock security blob
allocation functions and moves the associated code into the
respective LSM hooks.  This patch also removes the inode_doinit()
function as it was a trivial wrapper around
inode_doinit_with_dentry() and called from one location in the code.

Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-16 14:38:03 -05:00
Huaisheng Ye
df4779b5d2 selinux: remove redundant selinux_nlmsg_perm
selinux_nlmsg_perm is used for only by selinux_netlink_send. Remove
the redundant function to simplify the code.

Fix a typo by suggestion from Stephen.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-16 14:34:36 -05:00
Ondrej Mosnacek
cfff75d897 selinux: reorder hooks to make runtime disable less broken
Commit b1d9e6b064 ("LSM: Switch to lists of hooks") switched the LSM
infrastructure to use per-hook lists, which meant that removing the
hooks for a given module was no longer atomic. Even though the commit
clearly documents that modules implementing runtime revmoval of hooks
(only SELinux attempts this madness) need to take special precautions to
avoid race conditions, SELinux has never addressed this.

By inserting an artificial delay between the loop iterations of
security_delete_hooks() (I used 100 ms), booting to a state where
SELinux is enabled, but policy is not yet loaded, and running these
commands:

    while true; do ping -c 1 <some IP>; done &
    echo -n 1 >/sys/fs/selinux/disable
    kill %1
    wait

...I was able to trigger NULL pointer dereferences in various places. I
also have a report of someone getting panics on a stock RHEL-8 kernel
after setting SELINUX=disabled in /etc/selinux/config and rebooting
(without adding "selinux=0" to kernel command-line).

Reordering the SELinux hooks such that those that allocate structures
are removed last seems to prevent these panics. It is very much possible
that this doesn't make the runtime disable completely race-free, but at
least it makes the operation much less fragile.

Cc: stable@vger.kernel.org
Fixes: b1d9e6b064 ("LSM: Switch to lists of hooks")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-10 15:26:55 -05:00
Ondrej Mosnacek
65cddd5098 selinux: treat atomic flags more carefully
The disabled/enforcing/initialized flags are all accessed concurrently
by threads so use the appropriate accessors that ensure atomicity and
document that it is expected.

Use smp_load/acquire...() helpers (with memory barriers) for the
initialized flag, since it gates access to the rest of the state
structures.

Note that the disabled flag is currently not used for anything other
than avoiding double disable, but it will be used for bailing out of
hooks once security_delete_hooks() is removed.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-10 15:19:39 -05:00
Stephen Smalley
b78b7d59bd selinux: make default_noexec read-only after init
SELinux checks whether VM_EXEC is set in the VM_DATA_DEFAULT_FLAGS
during initialization and saves the result in default_noexec for use
in its mmap and mprotect hook function implementations to decide
whether to apply EXECMEM, EXECHEAP, EXECSTACK, and EXECMOD checks.
Mark default_noexec as ro_after_init to prevent later clearing it
and thereby disabling these checks.  It is only set legitimately from
init code.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-10 12:26:20 -05:00
Huaisheng Ye
b82f3f6894 selinux: remove redundant msg_msg_alloc_security
selinux_msg_msg_alloc_security only calls msg_msg_alloc_security but
do nothing else. And also msg_msg_alloc_security is just used by the
former.

Remove the redundant function to simplify the code.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-10 11:32:13 -05:00
Hridya Valsaraju
7a4b519474 selinux: allow per-file labelling for binderfs
This patch allows genfscon per-file labeling for binderfs.
This is required to have separate permissions to allow
access to binder, hwbinder and vndbinder devices which are
relocating to binderfs.

Acked-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Hridya Valsaraju <hridya@google.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-01-06 21:11:18 -05:00
Stephen Smalley
6c5a682e64 selinux: clean up selinux_enabled/disabled/enforcing_boot
Rename selinux_enabled to selinux_enabled_boot to make it clear that
it only reflects whether SELinux was enabled at boot.  Replace the
references to it in the MAC_STATUS audit log in sel_write_enforce()
with hardcoded "1" values because this code is only reachable if SELinux
is enabled and does not change its value, and update the corresponding
MAC_STATUS audit log in sel_write_disable().  Stop clearing
selinux_enabled in selinux_disable() since it is not used outside of
initialization code that runs before selinux_disable() can be reached.
Mark both selinux_enabled_boot and selinux_enforcing_boot as __initdata
since they are only used in initialization code.

Wrap the disabled field in the struct selinux_state with
CONFIG_SECURITY_SELINUX_DISABLE since it is only used for
runtime disable.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-18 21:22:46 -05:00
Yang Guo
210a292874 selinux: remove unnecessary selinux cred request
task_security_struct was obtained at the beginning of may_create
and selinux_inode_init_security, no need to obtain again.
may_create will be called very frequently when create dir and file.

Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Yang Guo <guoyang2@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-12-12 08:50:39 -05:00