2
0
mirror of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git synced 2025-09-04 20:19:47 +08:00
Commit Graph

1987 Commits

Author SHA1 Message Date
Peter Foley
df68a01014 Documentation: use subdir-y to avoid unnecessary built-in.o files
Change the Documentation makefiles from obj-m to subdir-y
to avoid generating unnecessary built-in.o files since nothing
in Documentation/ is ever linked in to vmlinux.

Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-09-26 11:02:55 +02:00
Jaegeuk Kim
9b5f136fd4 f2fs: change the ipu_policy option to enable combinations
This patch changes the ipu_policy setting to use any combination of orthogonal policies.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-23 11:10:24 -07:00
Jaegeuk Kim
c1ce1b02bb f2fs: give an option to enable in-place-updates during fsync to users
If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file
only starts to try in-place-updates.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps out-of-order manner. Otherwise, it triggers in-place-updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16 04:10:44 -07:00
Paul Bolle
731d5cca82 Documentation: NFS/RDMA: Document separate Kconfig symbols
The NFS/RDMA Kconfig symbol was split into separate options for client
and server in commit 2e8c12e1b7 ("xprtrdma: add separate Kconfig
options for NFSoRDMA client and server support").

Update the documentation to reflect this split.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "J. Bruce Fields" <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-07 15:21:13 -07:00
Rob Jones
77be4daf4e Documentation: seq_file: Document seq_open_private(), seq_release_private()
Despite the fact that these functions have been around for years, they
are little used (only 15 uses in 13 files at the preseht time) even
though many other files use work-arounds to achieve the same result.

By documenting them, hopefully they will become more widely used.

Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-07 15:21:13 -07:00
Linus Torvalds
53b95d6341 File locking related bugfixes for v3.17 (pile #2)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT7ey6AAoJEAAOaEEZVoIVgdkQAJfToVhgddVMOweeGo4wqPqM
 lmS35dYVEy+gPfYhCU2Zytgk9yLlmNJeDT7Y+XGe4l15Ax/PDNuWLiRnUKRy1FrU
 l7cbKRAn1L6TqBO2ang5t68Bw7ojUpRoKWKnDyjVAj80mZYRZPWvQOurLeTtra2o
 XLnHbK54F36s07OXwSTZgbh/ffVQ1RMUBU8fy+0Ws3mTAbzO1KuB5Ws4pdt2ysjI
 14pBHO553X0VXAJxGkH66xxblt1o+Q9aBZp7RL0VNtR4bGU4FMvXy+0D5G6h8AGt
 rhl2PWTNKGg8HFgUK+7nCheH4j/0maXo541D3q+sJhbknRhD3n3x7IBvjkm9tdjB
 OgTkAp1mwL21mJP21MOrAil8uwGoSr7qTCngZY6nNWH4L3EkVl27+LlGDtkgBp/n
 BJyhcPbFSh+K4TTD3dg2rEx7wF/npQ6yPOljjvWXKJEm5lx3ZPkK1l1xS3QnVTMe
 pLq+wTZ9v1cU7+9JFWICQwclv4unjIgqxLo7/op5P5KvTWOHFW4cjdwCBPVE1g3a
 WC2c5jdB0Up6Z59aXAN3p5dk8MCy6NA41lkMfJN4AUAIzb6NvNeDBhrcFaHwXowm
 jgCtEPqFN/HqsEwJmhJ7fcIEKYrOCuzeaR5tGuwJ11re/oULGXTgMkrFwgm/YWwu
 esRBAc53/hZg6oo3ipUH
 =09BX
 -----END PGP SIGNATURE-----

Merge tag 'locks-v3.17-2' of git://git.samba.org/jlayton/linux

Pull file locking bugfixes from Jeff Layton:
 "Most of these patches are to fix a long-standing regression that crept
  in when the BKL was removed from the file-locking code.  The code was
  converted to use a conventional spinlock, but some fl_release_private
  ops can block and you can end up sleeping inside the lock.

  There's also a patch to make /proc/locks show delegations as 'DELEG'"

* tag 'locks-v3.17-2' of git://git.samba.org/jlayton/linux:
  locks: update Locking documentation to clarify fl_release_private behavior
  locks: move locks_free_lock calls in do_fcntl_add_lease outside spinlock
  locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped
  locks: don't reuse file_lock in __posix_lock_file
  locks: don't call locks_release_private from locks_copy_lock
  locks: show delegations as "DELEG" in /proc/locks
2014-08-16 08:58:47 -06:00
Jeff Layton
2ece173e47 locks: update Locking documentation to clarify fl_release_private behavior
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-08-14 10:08:20 -04:00
Linus Torvalds
f6f993328b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Stuff in here:

   - acct.c fixes and general rework of mnt_pin mechanism.  That allows
     to go for delayed-mntput stuff, which will permit mntput() on deep
     stack without worrying about stack overflows - fs shutdown will
     happen on shallow stack.  IOW, we can do Eric's umount-on-rmdir
     series without introducing tons of stack overflows on new mntput()
     call chains it introduces.
   - Bruce's d_splice_alias() patches
   - more Miklos' rename() stuff.
   - a couple of regression fixes (stable fodder, in the end of branch)
     and a fix for API idiocy in iov_iter.c.

  There definitely will be another pile, maybe even two.  I'd like to
  get Eric's series in this time, but even if we miss it, it'll go right
  in the beginning of for-next in the next cycle - the tricky part of
  prereqs is in this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
  fix copy_tree() regression
  __generic_file_write_iter(): fix handling of sync error after DIO
  switch iov_iter_get_pages() to passing maximal number of pages
  fs: mark __d_obtain_alias static
  dcache: d_splice_alias should detect loops
  exportfs: update Exporting documentation
  dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED
  dcache: remove unused d_find_alias parameter
  dcache: d_obtain_alias callers don't all want DISCONNECTED
  dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
  dcache: d_splice_alias mustn't create directory aliases
  dcache: close d_move race in d_splice_alias
  dcache: move d_splice_alias
  namei: trivial fix to vfs_rename_dir comment
  VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.
  cifs: support RENAME_NOREPLACE
  hostfs: support rename flags
  shmem: support RENAME_EXCHANGE
  shmem: support RENAME_NOREPLACE
  btrfs: add RENAME_NOREPLACE
  ...
2014-08-11 11:44:11 -07:00
Linus Torvalds
023f78b02c Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS updates from Steve French:
 "The most visible change in this set is the additional of multi-credit
  support for SMB2/SMB3 which dramatically improves the large file i/o
  performance for these dialects and significantly increases the maximum
  i/o size used on the wire for SMB2/SMB3.

  Also reconnection behavior after network failure is improved"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (35 commits)
  Add worker function to set allocation size
  [CIFS] Fix incorrect hex vs. decimal in some debug print statements
  update CIFS TODO list
  Add Pavel to contributor list in cifs AUTHORS file
  Update cifs version
  CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2
  CIFS: Optimize readpages in a short read case on reconnects
  CIFS: Optimize cifs_user_read() in a short read case on reconnects
  CIFS: Improve indentation in cifs_user_read()
  CIFS: Fix possible buffer corruption in cifs_user_read()
  CIFS: Count got bytes in read_into_pages()
  CIFS: Use separate var for the number of bytes got in async read
  CIFS: Indicate reconnect with ECONNABORTED error code
  CIFS: Use multicredits for SMB 2.1/3 reads
  CIFS: Fix rsize usage for sync read
  CIFS: Fix rsize usage in user read
  CIFS: Separate page reading from user read
  CIFS: Fix rsize usage in readpages
  CIFS: Separate page search from readpages
  CIFS: Use multicredits for SMB 2.1/3 writes
  ...
2014-08-09 13:03:34 -07:00
J. Bruce Fields
9635389534 exportfs: update Exporting documentation
Minor documentation updates:
	- refer to d_obtain_alias rather than d_alloc_anon
	- explain when to use d_splice_alias and when
	  d_materialise_unique.
	- cut some details of d_splice_alias/d_materialise_unique
	  implementation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
NeilBrown
b8faf035ea VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.
In REF-walk mode, ->d_manage can return -EISDIR to indicate
that the dentry is not really a mount trap (or even a mount point)
and that any mounts or any DCACHE_NEED_AUTOMOUNT flag should be
ignored.

RCU-walk mode doesn't currently support this, so if there is a dentry
with DCACHE_NEED_AUTOMOUNT set but which shouldn't be a mount-trap,
lookup_fast() will always drop in REF-walk mode.

With this patch, an -EISDIR from ->d_manage will always cause mounts
and automounts to be ignored, both in REF-walk and RCU-walk.

Bug-fixed-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Ian Kent <raven@themaw.net>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07 14:40:10 -04:00
Linus Torvalds
e7fda6c4c3 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and time updates from Thomas Gleixner:
 "A rather large update of timers, timekeeping & co

   - Core timekeeping code is year-2038 safe now for 32bit machines.
     Now we just need to fix all in kernel users and the gazillion of
     user space interfaces which rely on timespec/timeval :)

   - Better cache layout for the timekeeping internal data structures.

   - Proper nanosecond based interfaces for in kernel users.

   - Tree wide cleanup of code which wants nanoseconds but does hoops
     and loops to convert back and forth from timespecs.  Some of it
     definitely belongs into the ugly code museum.

   - Consolidation of the timekeeping interface zoo.

   - A fast NMI safe accessor to clock monotonic for tracing.  This is a
     long standing request to support correlated user/kernel space
     traces.  With proper NTP frequency correction it's also suitable
     for correlation of traces accross separate machines.

   - Checkpoint/restart support for timerfd.

   - A few NOHZ[_FULL] improvements in the [hr]timer code.

   - Code move from kernel to kernel/time of all time* related code.

   - New clocksource/event drivers from the ARM universe.  I'm really
     impressed that despite an architected timer in the newer chips SoC
     manufacturers insist on inventing new and differently broken SoC
     specific timers.

[ Ed. "Impressed"? I don't think that word means what you think it means ]

   - Another round of code move from arch to drivers.  Looks like most
     of the legacy mess in ARM regarding timers is sorted out except for
     a few obnoxious strongholds.

   - The usual updates and fixlets all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  timekeeping: Fixup typo in update_vsyscall_old definition
  clocksource: document some basic timekeeping concepts
  timekeeping: Use cached ntp_tick_length when accumulating error
  timekeeping: Rework frequency adjustments to work better w/ nohz
  timekeeping: Minor fixup for timespec64->timespec assignment
  ftrace: Provide trace clocks monotonic
  timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC
  seqcount: Add raw_write_seqcount_latch()
  seqcount: Provide raw_read_seqcount()
  timekeeping: Use tk_read_base as argument for timekeeping_get_ns()
  timekeeping: Create struct tk_read_base and use it in struct timekeeper
  timekeeping: Restructure the timekeeper some more
  clocksource: Get rid of cycle_last
  clocksource: Move cycle_last validation to core code
  clocksource: Make delta calculation a function
  wireless: ath9k: Get rid of timespec conversions
  drm: vmwgfx: Use nsec based interfaces
  drm: i915: Use nsec based interfaces
  timekeeping: Provide ktime_get_raw()
  hangcheck-timer: Use ktime_get_ns()
  ...
2014-08-05 17:46:42 -07:00
Linus Torvalds
b54ecfb702 Merge tag 'for-f2fs-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "This series includes patches to:
   - add nobarrier mount option
   - support tmpfile and rename2
   - enhance the fdatasync behavior
   - fix the error path
   - fix the recovery routine
   - refactor a part of the checkpoint procedure
   - reduce some lock contentions"

* tag 'for-f2fs-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: use for_each_set_bit to simplify the code
  f2fs: add f2fs_balance_fs for expand_inode_data
  f2fs: invalidate xattr node page when evict inode
  f2fs: avoid skipping recover_inline_xattr after recover_inline_data
  f2fs: add tracepoint for f2fs_direct_IO
  f2fs: reduce competition among node page writes
  f2fs: fix coding style
  f2fs: remove redundant lines in allocate_data_block
  f2fs: add tracepoint for f2fs_issue_flush
  f2fs: avoid retrying wrong recovery routine when error was occurred
  f2fs: test before set/clear bits
  f2fs: fix wrong condition for unlikely
  f2fs: enable in-place-update for fdatasync
  f2fs: skip unnecessary data writes during fsync
  f2fs: add info of appended or updated data writes
  f2fs: use radix_tree for ino management
  f2fs: add infra for ino management
  f2fs: punch the core function for inode management
  f2fs: add nobarrier mount option
  f2fs: fix to put root inode in error path of fill_super
  ...
2014-08-04 20:30:07 -07:00
Steve French
2075cf0b71 update CIFS TODO list
Signed-off-by: Steve French <smfrench@gmail.com>
2014-08-02 01:26:07 -05:00
Steve French
480e83275a Add Pavel to contributor list in cifs AUTHORS file
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Pavel Shilovsky <pshilovsky@samba.org>
2014-08-02 01:23:05 -05:00
Jaegeuk Kim
0f7b2abd18 f2fs: add nobarrier mount option
This patch adds a mount option, nobarrier, in f2fs.
The assumption in here is that file system keeps the IO ordering, but
doesn't care about cache flushes inside the storages.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-29 05:27:48 -07:00
Cyrill Gorcunov
854d06d9fc docs: Procfs -- Document timerfd output
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Link: http://lkml.kernel.org/r/20140715215703.199905126@openvz.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-18 11:49:57 +02:00
NeilBrown
743162013d sched: Remove proliferation of wait_on_bit() action functions
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
 Rename wait_on_bit and        wait_on_bit_lock to
        wait_on_bit_action and wait_on_bit_lock_action
 to make it explicit that they need an action function.

 Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
 which are *not* given an action function but implicitly use
 a standard one.
 The decision to error-out if a signal is pending is now made
 based on the 'mode' argument rather than being encoded in the action
 function.

 All instances of the old wait_on_bit and wait_on_bit_lock which
 can use the new version have been changed accordingly and their
 action functions have been discarded.
 wait_on_bit{_lock} does not return any specific error code in the
 event of a signal so the caller must check for non-zero and
 interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack.  So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS.  CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 15:10:39 +02:00
Linus Torvalds
16b9057804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "This the bunch that sat in -next + lock_parent() fix.  This is the
  minimal set; there's more pending stuff.

  In particular, I really hope to get acct.c fixes merged this cycle -
  we need that to deal sanely with delayed-mntput stuff.  In the next
  pile, hopefully - that series is fairly short and localized
  (kernel/acct.c, fs/super.c and fs/namespace.c).  In this pile: more
  iov_iter work.  Most of prereqs for ->splice_write with sane locking
  order are there and Kent's dio rewrite would also fit nicely on top of
  this pile"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
  lock_parent: don't step on stale ->d_parent of all-but-freed one
  kill generic_file_splice_write()
  ceph: switch to iter_file_splice_write()
  shmem: switch to iter_file_splice_write()
  nfs: switch to iter_splice_write_file()
  fs/splice.c: remove unneeded exports
  ocfs2: switch to iter_file_splice_write()
  ->splice_write() via ->write_iter()
  bio_vec-backed iov_iter
  optimize copy_page_{to,from}_iter()
  bury generic_file_aio_{read,write}
  lustre: get rid of messing with iovecs
  ceph: switch to ->write_iter()
  ceph_sync_direct_write: stop poking into iov_iter guts
  ceph_sync_read: stop poking into iov_iter guts
  new helper: copy_page_from_iter()
  fuse: switch to ->write_iter()
  btrfs: switch to ->write_iter()
  ocfs2: switch to ->write_iter()
  xfs: switch to ->write_iter()
  ...
2014-06-12 10:30:18 -07:00
Al Viro
9c1d5284c7 Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linus
Backmerge of dcache.c changes from mainline.  It's that, or complete
rebase...

Conflicts:
	fs/splice.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-12 00:28:09 -04:00
Linus Torvalds
5b174fd647 Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "The largest piece is a long-overdue rewrite of the xdr code to remove
  some annoying limitations: for example, there was no way to return
  ACLs larger than 4K, and readdir results were returned only in 4k
  chunks, limiting performance on large directories.

  Also:
        - part of Neil Brown's work to make NFS work reliably over the
          loopback interface (so client and server can run on the same
          machine without deadlocks).  The rest of it is coming through
          other trees.
        - cleanup and bugfixes for some of the server RDMA code, from
          Steve Wise.
        - Various cleanup of NFSv4 state code in preparation for an
          overhaul of the locking, from Jeff, Trond, and Benny.
        - smaller bugfixes and cleanup from Christoph Hellwig and
          Kinglong Mee.

  Thanks to everyone!

  This summer looks likely to be busier than usual for knfsd.  Hopefully
  we won't break it too badly; testing definitely welcomed"

* 'for-3.16' of git://linux-nfs.org/~bfields/linux: (100 commits)
  nfsd4: fix FREE_STATEID lockowner leak
  svcrdma: Fence LOCAL_INV work requests
  svcrdma: refactor marshalling logic
  nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry
  nfs4: remove unused CHANGE_SECURITY_LABEL
  nfsd4: kill READ64
  nfsd4: kill READ32
  nfsd4: simplify server xdr->next_page use
  nfsd4: hash deleg stateid only on successful nfs4_set_delegation
  nfsd4: rename recall_lock to state_lock
  nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound
  nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open
  nfsd4: use recall_lock for delegation hashing
  nfsd: fix laundromat next-run-time calculation
  nfsd: make nfsd4_encode_fattr static
  SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING
  nfsd: remove unused function nfsd_read_file
  nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer
  NFSD: Error out when getting more than one fsloc/secinfo/uuid
  NFSD: Using type of uint32_t for ex_nflavors instead of int
  ...
2014-06-10 11:50:57 -07:00
Linus Torvalds
64b2d1fbbf f2fs updates for v3.16
This patch-set includes the following major enhancement patches.
  o enhance wait_on_page_writeback
  o support SEEK_DATA and SEEK_HOLE
  o enhance readahead flows
  o enhance IO flushes
  o support fiemap
  o add some tracepoints
 
 The other bug fixes are as follows.
  o fix to support a large volume > 2TB correctly
  o recovery bug fix wrt fallocated space
  o fix recursive lock on xattr operations
  o fix some cases on the remount flow
 
 And, there are a bunch of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTleLYAAoJEEAUqH6CSFDSdhEP/iY5hTZ02jH4ZiFPP/Nee4hS
 l0BHeZvrMjoccaWUDu0ZvIPC8BiJ7kLOgK+/VWS7LAfY1PY11ALEtYQOrW+RM47+
 jMfULegTod/F8WS2B6dk31QMhAZldtnsYvA5PS1VV3S0rht+qbOz+PDejZFgSMc3
 VVQ7OZzq30gMmtsw7oh3FHfeTu4xe/bxygdMRXgljQQD2MvorJiOb4MA+ovEDd9z
 CZMMTQvRKjc0d8LPf0gOkZEvG63GR6klCgFMuiappUsua0O8IPIEhCyEGFrE66vS
 fObVKpqNAsR2ABhS2grgn6Q7UUvn4xrF6jOwDH5tuw2yzmkQiMAWINwBdgnbEy1c
 D5S89PQ8TkQK9KBSoU0v8WKWC4HzWZF4ZEd6eq9VxVTS8iT0w8DtNHXK518FVC34
 82iqrLc0EhrcGNiW/7Nrc6WzHrWqorylCFN7atB3ruhVqeMh3MZsDU4Gq0WgB2oh
 pF9XVBEpJQpV35DYSAPzLkm+2+mwHVNqwdY3HcvUs7IP90+wZlrWSRG2FEfFt/e8
 6nwvbORrHMTI5VfdYlEPgpjuesFmYPZqEvRGdaDa1YrHqhvvgStEPT9qiq2qVn9+
 tr0HjpNRDD/etkaE6ujYOYqdxuk3mm6RY68h+KSbNcY1/VTvt2bN2telwWuDsxIq
 8yOsxV2x3JB/euDAJsSU
 =xqsO
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, there is no special interesting feature, but we've
  investigated a couple of tuning points with respect to the I/O flow.
  Several major bug fixes and a bunch of clean-ups also have been made.

  This patch-set includes the following major enhancement patches:
   - enhance wait_on_page_writeback
   - support SEEK_DATA and SEEK_HOLE
   - enhance readahead flows
   - enhance IO flushes
   - support fiemap
   - add some tracepoints

  The other bug fixes are as follows:
   - fix to support a large volume > 2TB correctly
   - recovery bug fix wrt fallocated space
   - fix recursive lock on xattr operations
   - fix some cases on the remount flow

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits)
  f2fs: support f2fs_fiemap
  f2fs: avoid not to call remove_dirty_inode
  f2fs: recover fallocated space
  f2fs: fix to recover data written by dio
  f2fs: large volume support
  f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages
  f2fs: avoid overflow when large directory feathure is enabled
  f2fs: fix recursive lock by f2fs_setxattr
  MAINTAINERS: add a co-maintainer from samsung for F2FS
  MAINTAINERS: change the email address for f2fs
  f2fs: use inode_init_owner() to simplify codes
  f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency
  f2fs: add a tracepoint for f2fs_read_data_page
  f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages
  f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page
  f2fs: add a tracepoint for f2fs_write_end
  f2fs: add a tracepoint for f2fs_write_begin
  f2fs: fix checkpatch warning
  f2fs: deactivate inode page if the inode is evicted
  f2fs: decrease the lock granularity during write_begin
  ...
2014-06-09 19:11:44 -07:00
Fabian Frederick
0b07cb8271 Documentation/filesystems/seq_file.txt: create_proc_entry deprecated
Linked article in seq_file.txt still uses create_proc_entry which was
removed in commit 80e928f7eb ("proc: Kill create_proc_entry()").

This patch adds information for kernel 3.10 and above

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:11 -07:00
Conrad Meyer
190a8843de fs/fat/: add support for DOS 1.x formatted volumes
Add structure for parsed BPB information, struct fat_bios_param_block,
and move all of the deserialization and validation logic from
fat_fill_super() into fat_read_bpb().

Add a 'dos1xfloppy' mount option to infer DOS 2.x BIOS Parameter Block
defaults from block device geometry for ancient floppies and floppy
images, as a fall-back from the default BPB parsing logic.

When fat_read_bpb() finds an invalid FAT filesystem and dos1xfloppy is
set, fall back to fat_read_static_bpb().  fat_read_static_bpb()
validates that the entire BPB is zero, and that the floppy has a
DOS-style 8086 code bootstrapping header.  Then it fills in default BPB
values from media size and a table.[0]

Media size is assumed to be static for archaic FAT volumes.  See also:
[1].

Fixes kernel.org bug #42617.

[0]: https://en.wikipedia.org/wiki/File_Allocation_Table#Exceptions
[1]: http://www.win.tue.nl/~aeb/linux/fs/fat/fat-1.html

[hirofumi@mail.parknet.co.jp: fix missed error code]
Signed-off-by: Conrad Meyer <cse.cem@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Tested-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:10 -07:00
Linus Torvalds
1aacb90eaa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next
Pull trivial tree changes from Jiri Kosina:
 "Usual pile of patches from trivial tree that make the world go round"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  staging: go7007: remove reference to CONFIG_KMOD
  aic7xxx: Remove obsolete preprocessor define
  of: dma: doc fixes
  doc: fix incorrect formula to calculate CommitLimit value
  doc: Note need of bc in the kernel build from 3.10 onwards
  mm: Fix printk typo in dmapool.c
  modpost: Fix comment typo "Modules.symvers"
  Kconfig.debug: Grammar s/addition/additional/
  wimax: Spelling s/than/that/, wording s/destinatary/recipient/
  aic7xxx: Spelling s/termnation/termination/
  arm64: mm: Remove superfluous "the" in comment
  of: Spelling s/anonymouns/anonymous/
  dma: imx-sdma: Spelling s/determnine/determine/
  ath10k: Improve grammar in comments
  ath6kl: Spelling s/determnine/determine/
  of: Improve grammar for of_alias_get_id() documentation
  drm/exynos: Spelling s/contro/control/
  radio-bcm2048.c: fix wrong overflow check
  doc: printk-formats: do not mention casts for u64/s64
  doc: spelling error changes
  ...
2014-06-04 08:50:34 -07:00
Chao Yu
bfec07d0f8 f2fs: avoid overflow when large directory feathure is enabled
When large directory feathure is enable, We have one case which could cause
overflow in dir_buckets() as following:
special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2.

Here we define MAX_DIR_BUCKETS to limit the return value when the condition
could trigger potential overflow.

Changes from V1
 o modify description of calculation in f2fs.txt suggested by Changman Lee.

Suggested-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-06-04 13:34:30 +09:00
J. Bruce Fields
b042098063 nfsd4: allow exotic read compounds
I'm not sure why a client would want to stuff multiple reads in a
single compound rpc, but it's legal for them to do it, and we should
really support it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:12 -04:00
Jan Moskyto Matejka
3568a1dbf1 Documentation: update /proc/stat "intr" count summary
The sum at the beginning of line "intr" includes also unnumbered
interrupts.  It implies that the sum at the beginning isn't the sum
of the remainder of the line, not even an estimation.

Fixed the documentation to mention that.

This behaviour was added to /proc/stat in commit a2eddfa959 ("x86:
make /proc/stat account for all interrupts")

Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-25 12:39:00 -07:00
Petr Oros
7a9e6da11c doc: fix incorrect formula to calculate CommitLimit value
The formula to calculate "CommitLimit" value mentioned in kernel documentation is incorrect.
Right formula is: CommitLimit = ([total RAM pages] - [total huge TLB pages]) * overcommit_ratio / 100 + [total swap pages]

Signed-off-by: Petr Oros <poros@redhat.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-05-22 23:01:55 +02:00
Al Viro
293bc9822f new methods: ->read_iter() and ->write_iter()
Beginning to introduce those.  Just the callers for now, and it's
clumsier than it'll eventually become; once we finish converting
aio_read and aio_write instances, the things will get nicer.

For now, these guys are in parallel to ->aio_read() and ->aio_write();
they take iocb and iov_iter, with everything in iov_iter already
validated.  File offset is passed in iocb->ki_pos, iov/nr_segs -
in iov_iter.

Main concerns in that series are stack footprint and ability to
split the damn thing cleanly.

[fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:36:00 -04:00
Al Viro
d8d3d94b80 pass iov_iter to ->direct_IO()
unmodified, for now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06 17:32:44 -04:00
Carlos Garcia
c98be0c96d doc: spelling error changes
Fixed multiple spelling errors.

Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Carlos E. Garcia <carlos@cgarcia.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-05-05 15:32:05 +02:00
Linus Torvalds
5166701b36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "The first vfs pile, with deep apologies for being very late in this
  window.

  Assorted cleanups and fixes, plus a large preparatory part of iov_iter
  work.  There's a lot more of that, but it'll probably go into the next
  merge window - it *does* shape up nicely, removes a lot of
  boilerplate, gets rid of locking inconsistencie between aio_write and
  splice_write and I hope to get Kent's direct-io rewrite merged into
  the same queue, but some of the stuff after this point is having
  (mostly trivial) conflicts with the things already merged into
  mainline and with some I want more testing.

  This one passes LTP and xfstests without regressions, in addition to
  usual beating.  BTW, readahead02 in ltp syscalls testsuite has started
  giving failures since "mm/readahead.c: fix readahead failure for
  memoryless NUMA nodes and limit readahead pages" - might be a false
  positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  missing bits of "splice: fix racy pipe->buffers uses"
  cifs: fix the race in cifs_writev()
  ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
  kill generic_file_buffered_write()
  ocfs2_file_aio_write(): switch to generic_perform_write()
  ceph_aio_write(): switch to generic_perform_write()
  xfs_file_buffered_aio_write(): switch to generic_perform_write()
  export generic_perform_write(), start getting rid of generic_file_buffer_write()
  generic_file_direct_write(): get rid of ppos argument
  btrfs_file_aio_write(): get rid of ppos
  kill the 5th argument of generic_file_buffered_write()
  kill the 4th argument of __generic_file_aio_write()
  lustre: don't open-code kernel_recvmsg()
  ocfs2: don't open-code kernel_recvmsg()
  drbd: don't open-code kernel_recvmsg()
  constify blk_rq_map_user_iov() and friends
  lustre: switch to kernel_sendmsg()
  ocfs2: don't open-code kernel_sendmsg()
  take iov_iter stuff to mm/iov_iter.c
  process_vm_access: tidy up a bit
  ...
2014-04-12 14:49:50 -07:00
Linus Torvalds
26c12d9334 Merge branch 'akpm' (incoming from Andrew)
Merge second patch-bomb from Andrew Morton:
 - the rest of MM
 - zram updates
 - zswap updates
 - exit
 - procfs
 - exec
 - wait
 - crash dump
 - lib/idr
 - rapidio
 - adfs, affs, bfs, ufs
 - cris
 - Kconfig things
 - initramfs
 - small amount of IPC material
 - percpu enhancements
 - early ioremap support
 - various other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
  MAINTAINERS: update Intel C600 SAS driver maintainers
  fs/ufs: remove unused ufs_super_block_third pointer
  fs/ufs: remove unused ufs_super_block_second pointer
  fs/ufs: remove unused ufs_super_block_first pointer
  fs/ufs/super.c: add __init to init_inodecache()
  doc/kernel-parameters.txt: add early_ioremap_debug
  arm64: add early_ioremap support
  arm64: initialize pgprot info earlier in boot
  x86: use generic early_ioremap
  mm: create generic early_ioremap() support
  x86/mm: sparse warning fix for early_memremap
  lglock: map to spinlock when !CONFIG_SMP
  percpu: add preemption checks to __this_cpu ops
  vmstat: use raw_cpu_ops to avoid false positives on preemption checks
  slub: use raw_cpu_inc for incrementing statistics
  net: replace __this_cpu_inc in route.c with raw_cpu_inc
  modules: use raw_cpu_write for initialization of per cpu refcount.
  mm: use raw_cpu ops for determining current NUMA node
  percpu: add raw_cpu_ops
  slub: fix leak of 'name' in sysfs_slab_add
  ...
2014-04-07 16:38:06 -07:00
Fabian Frederick
8ca577223f affs: add mount option to avoid filename truncates
Normal behavior for filenames exceeding specific filesystem limits is to
refuse operation.

AFFS standard name length being only 30 characters against 255 for usual
Linux filesystems, original implementation does filename truncate by
default with a define value AFFS_NO_TRUNCATE which can be enabled but
needs module compilation.

This patch adds 'nofilenametruncate' mount option so that user can
easily activate that feature and avoid a lot of problems (eg overwrite
files ...)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:08 -07:00
Andrey Vagin
49d063cb35 proc: show mnt_id in /proc/pid/fdinfo
Currently we don't have a way how to determing from which mount point
file has been opened.  This information is required for proper dumping
and restoring file descriptos due to presence of mount namespaces.  It's
possible, that two file descriptors are opened using the same paths, but
one fd references mount point from one namespace while the other fd --
from other namespace.

$ ls -l /proc/1/fd/1
lrwx------ 1 root root 64 Mar 19 23:54 /proc/1/fd/1 -> /dev/null

$ cat /proc/1/fdinfo/1
pos:	0
flags:	0100002
mnt_id:	16

$ cat /proc/1/mountinfo | grep ^16
16 32 0:4 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,size=1013356k,nr_inodes=253339,mode=755

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Rob Landley <rob@landley.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:04 -07:00
Kirill A. Shutemov
8c6e50b029 mm: introduce vm_ops->map_pages()
Here's new version of faultaround patchset.  It took a while to tune it
and collect performance data.

First patch adds new callback ->map_pages to vm_operations_struct.

->map_pages() is called when VM asks to map easy accessible pages.
Filesystem should find and map pages associated with offsets from
"pgoff" till "max_pgoff".  ->map_pages() is called with page table
locked and must not block.  If it's not possible to reach a page without
blocking, filesystem should skip it.  Filesystem should use do_set_pte()
to setup page table entry.  Pointer to entry associated with offset
"pgoff" is passed in "pte" field in vm_fault structure.  Pointers to
entries for other offsets should be calculated relative to "pte".

Currently VM use ->map_pages only on read page fault path.  We try to
map FAULT_AROUND_PAGES a time.  FAULT_AROUND_PAGES is 16 for now.
Performance data for different FAULT_AROUND_ORDER is below.

TODO:
 - implement ->map_pages() for shmem/tmpfs;
 - modify get_user_pages() to be able to use ->map_pages() and implement
   mmap(MAP_POPULATE|MAP_NONBLOCK) on top.

=========================================================================
Tested on 4-socket machine (120 threads) with 128GiB of RAM.

Few real-world workloads. The sweet spot for FAULT_AROUND_ORDER here is
somewhere between 3 and 5. Let's say 4 :)

Linux build (make -j60)
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
	minor-faults		283,301,572	247,151,987	212,215,789	204,772,882	199,568,944	194,703,779	193,381,485
	time, seconds		151.227629483	153.920996480	151.356125472	150.863792049	150.879207877	151.150764954	151.450962358
Linux rebuild (make -j60)
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
	minor-faults		5,396,854	4,148,444	2,855,286	2,577,282	2,361,957	2,169,573	2,112,643
	time, seconds		27.404543757	27.559725591	27.030057426	26.855045126	26.678618635	26.974523490	26.761320095
Git test suite (make -j60 test)
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
	minor-faults		129,591,823	99,200,751	66,106,718	57,606,410	51,510,808	45,776,813	44,085,515
	time, seconds		66.087215026	64.784546905	64.401156567	65.282708668	66.034016829	66.793780811	67.237810413

Two synthetic tests: access every word in file in sequential/random order.
It doesn't improve much after FAULT_AROUND_ORDER == 4.

Sequential access 16GiB file
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
 1 thread
	minor-faults		4,195,437	2,098,275	525,068		262,251		131,170		32,856		8,282
	time, seconds		7.250461742	6.461711074	5.493859139	5.488488147	5.707213983	5.898510832	5.109232856
 8 threads
	minor-faults		33,557,540	16,892,728	4,515,848	2,366,999	1,423,382	442,732		142,339
	time, seconds		16.649304881	9.312555263	6.612490639	6.394316732	6.669827501	6.75078944	6.371900528
 32 threads
	minor-faults		134,228,222	67,526,810	17,725,386	9,716,537	4,763,731	1,668,921	537,200
	time, seconds		49.164430543	29.712060103	12.938649729	10.175151004	11.840094583	9.594081325	9.928461797
 60 threads
	minor-faults		251,687,988	126,146,952	32,919,406	18,208,804	10,458,947	2,733,907	928,217
	time, seconds		86.260656897	49.626551828	22.335007632	17.608243696	16.523119035	16.339489186	16.326390902
 120 threads
	minor-faults		503,352,863	252,939,677	67,039,168	35,191,827	19,170,091	4,688,357	1,471,862
	time, seconds		124.589206333	79.757867787	39.508707872	32.167281632	29.972989292	28.729834575	28.042251622
Random access 1GiB file
 1 thread
	minor-faults		262,636		132,743		34,369		17,299		8,527		3,451		1,222
	time, seconds		15.351890914	16.613802482	16.569227308	15.179220992	16.557356122	16.578247824	15.365266994
 8 threads
	minor-faults		2,098,948	1,061,871	273,690		154,501		87,110		25,663		7,384
	time, seconds		15.040026343	15.096933500	14.474757288	14.289129964	14.411537468	14.296316837	14.395635804
 32 threads
	minor-faults		8,390,734	4,231,023	1,054,432	528,847		269,242		97,746		26,881
	time, seconds		20.430433109	21.585235358	22.115062928	14.872878951	14.880856305	14.883370649	14.821261690
 60 threads
	minor-faults		15,733,258	7,892,809	1,973,393	988,266		594,789		164,994		51,691
	time, seconds		26.577302548	25.692397770	18.728863715	20.153026398	21.619101933	17.745086260	17.613215273
 120 threads
	minor-faults		31,471,111	15,816,616	3,959,209	1,978,685	1,008,299	264,635		96,010
	time, seconds		41.835322703	40.459786095	36.085306105	35.313894834	35.814445675	36.552633793	34.289210594

Touch only one page in page table in 16GiB file
FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
 1 thread
	minor-faults		8,372		8,324		8,270		8,260		8,249		8,239		8,237
	time, seconds		0.039892712	0.045369149	0.051846126	0.063681685	0.079095975	0.17652406	0.541213386
 8 threads
	minor-faults		65,731		65,681		65,628		65,620		65,608		65,599		65,596
	time, seconds		0.124159196	0.488600638	0.156854426	0.191901957	0.242631486	0.543569456	1.677303984
 32 threads
	minor-faults		262,388		262,341		262,285		262,276		262,266		262,257		263,183
	time, seconds		0.452421421	0.488600638	0.565020946	0.648229739	0.789850823	1.651584361	5.000361559
 60 threads
	minor-faults		491,822		491,792		491,723		491,711		491,701		491,691		491,825
	time, seconds		0.763288616	0.869620515	0.980727360	1.161732354	1.466915814	3.04041448	9.308612938
 120 threads
	minor-faults		983,466		983,655		983,366		983,372		983,363		984,083		984,164
	time, seconds		1.595846553	1.667902182	2.008959376	2.425380942	2.941368804	5.977807890	18.401846125

This patch (of 2):

Introduce new vm_ops callback ->map_pages() and uses it for mapping easy
accessible pages around fault address.

On read page fault, if filesystem provides ->map_pages(), we try to map up
to FAULT_AROUND_PAGES pages around page fault address in hope to reduce
number of minor page faults.

We call ->map_pages first and use ->fault() as fallback if page by the
offset is not ready to be mapped (cold page cache or something).

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ning Qu <quning@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:35:52 -07:00
Linus Torvalds
3021112598 f2fs updates for v3.15
This patch-set includes the following major enhancement patches.
  o introduce large directory support
  o introduce f2fs_issue_flush to merge redundant flush commands
  o merge write IOs as much as possible aligned to the segment
  o add sysfs entries to tune the f2fs configuration
  o use radix_tree for the free_nid_list to reduce in-memory operations
  o remove costly bit operations in f2fs_find_entry
  o enhance the readahead flow for CP/NAT/SIT/SSA blocks
 
 The other bug fixes are as follows.
  o recover xattr node blocks correctly after sudden-power-cut
  o fix to calculate the maximum number of node ids
  o enhance to handle many error cases
 
 And, there are a bunch of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTQiQrAAoJEEAUqH6CSFDSlbIP/iq06BrUeMDLoQFhA2GQFKFD
 wd0A5h9hCiFcKBcI/u/aAQqj/a5wdwzDl9XzH2PzJ45IM6sVGQZ0lv+kdLhab6rk
 ipNbV7G0yLAX+8ygS6GZF7pSKfMzGSGTrRvfdtoiunIip1jCY1IkUxv1XMgBSPza
 wnWYrE5HXEqRUDCqPXJyxrPmx0/0jw8/V82Ng9stnY34ySs+l/3Pvg65Kh0QuSSy
 BRjJUGlOCF68KUBKd+6YB2T5KlbQde3/5lhP+GMOi+xm5sFB+j+59r/WpJpF2Nxs
 ImxQs5GkiU01ErH/rn5FgHY/zzddQenBKwOvrjEeUA1eVpBurdsIr1JN0P6qDbgB
 ho5U8LzCQq+HZiW444eQGkXSOagpUKqDhTVJO7Fji/wG88Atc9gLX3ix8TH2skxT
 C5CvvrJM7DKBtkZyTzotKY/cWorOZhge6E/EkbGaM1sSHdK5b1Rg4YlFi9TDyz0n
 QjGD1uuvEeukeKGdIG9pjc7o5ledbMDYwLpT2RuRXenLOTsn8BqDOo9aRTg+5Kag
 tJNJLFumjPR2mEBNKjicJMUf381J/SKDwZszAz9mgvCZXldMza/Ax0LzJDJCVmkP
 UuBiVzGxVzpd33IsESUDr0J9hc+t8kS10jfAeKnE3cpb6n7/RYxstHh6CHOFKNXM
 gPUSYPN3CYiP47DnSfzA
 =eSW+
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - introduce large directory support
   - introduce f2fs_issue_flush to merge redundant flush commands
   - merge write IOs as much as possible aligned to the segment
   - add sysfs entries to tune the f2fs configuration
   - use radix_tree for the free_nid_list to reduce in-memory operations
   - remove costly bit operations in f2fs_find_entry
   - enhance the readahead flow for CP/NAT/SIT/SSA blocks

  The other bug fixes are as follows:
   - recover xattr node blocks correctly after sudden-power-cut
   - fix to calculate the maximum number of node ids
   - enhance to handle many error cases

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (62 commits)
  f2fs: fix wrong statistics of inline data
  f2fs: check the acl's validity before setting
  f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
  f2fs: fix to cover io->bio with io_rwsem
  f2fs: fix error path when fail to read inline data
  f2fs: use list_for_each_entry{_safe} for simplyfying code
  f2fs: avoid free slab cache under spinlock
  f2fs: avoid unneeded lookup when xattr name length is too long
  f2fs: avoid unnecessary bio submit when wait page writeback
  f2fs: return -EIO when node id is not matched
  f2fs: avoid RECLAIM_FS-ON-W warning
  f2fs: skip unnecessary node writes during fsync
  f2fs: introduce fi->i_sem to protect fi's info
  f2fs: change reclaim rate in percentage
  f2fs: add missing documentation for dir_level
  f2fs: remove unnecessary threshold
  f2fs: throttle the memory footprint with a sysfs entry
  f2fs: avoid to drop nat entries due to the negative nr_shrink
  f2fs: call f2fs_wait_on_page_writeback instead of native function
  f2fs: introduce nr_pages_to_write for segment alignment
  ...
2014-04-07 10:55:36 -07:00
Jaegeuk Kim
6b4afdd794 f2fs: introduce f2fs_issue_flush to avoid redundant flush issue
Some storage devices show relatively high latencies to complete cache_flush
commands, even though their normal IO speed is prettry much high. In such
the case, it needs to merge cache_flush commands as much as possible to avoid
issuing them redundantly.
So, this patch introduces a mount option, "-o flush_merge", to mitigate such
the overhead.

If this option is enabled by user, F2FS merges the cache_flush commands and then
issues just one cache_flush on behalf of them. Once the single command is
finished, F2FS sends a completion signal to all the pending threads.

Note that, this option can be used under a workload consisting of very intensive
concurrent fsync calls, while the storage handles cache_flush commands slowly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-04-07 09:50:58 +09:00
Linus Torvalds
7df934526c Merge branch 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull renameat2 system call from Miklos Szeredi:
 "This adds a new syscall, renameat2(), which is the same as renameat()
  but with a flags argument.

  The purpose of extending rename is to add cross-rename, a symmetric
  variant of rename, which exchanges the two files.  This allows
  interesting things, which were not possible before, for example
  atomically replacing a directory tree with a symlink, etc...  This
  also allows overlayfs and friends to operate on whiteouts atomically.

  Andy Lutomirski also suggested a "noreplace" flag, which disables the
  overwriting behavior of rename.

  These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only
  implemented for ext4 as an example and for testing"

* 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ext4: add cross rename support
  ext4: rename: split out helper functions
  ext4: rename: move EMLINK check up
  ext4: rename: create ext4_renament structure for local vars
  vfs: add cross-rename
  vfs: lock_two_nondirectories: allow directory args
  security: add flags to rename hooks
  vfs: add RENAME_NOREPLACE flag
  vfs: add renameat2 syscall
  vfs: rename: use common code for dir and non-dir
  vfs: rename: move d_move() up
  vfs: add d_is_dir()
2014-04-04 14:03:05 -07:00
Fabian Frederick
4adeacdf36 Documentation/filesystems/ntfs.txt: remove changelog reference
File was removed in commit 7c821a179f ("Remove fs/ntfs/ChangeLog").

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:27 -07:00
Ryusuke Konishi
7fac376d78 nilfs2: update project's web site in nilfs2.txt
Project's web site was moved to nilfs.sourceforge.net from
www.nilfs.org.  This updates the site information in
Documentation/filesystems/nilfs2.txt with the new location.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:26 -07:00
Andreas Rohner
2cc88f3a5f nilfs2: implementation of NILFS_IOCTL_SET_SUINFO ioctl
With this ioctl the segment usage entries in the SUFILE can be updated
from userspace.

This is useful, because it allows the userspace GC to modify and update
segment usage entries for specific segments, which enables it to avoid
unnecessary write operations.

If a segment needs to be cleaned, but there is no or very little
reclaimable space in it, the cleaning operation basically degrades to a
useless moving operation.  In the end the only thing that changes is the
location of the data and a timestamp in the segment usage information.
With this ioctl the GC can skip the cleaning and update the segment
usage entries directly instead.

This is basically a shortcut to cleaning the segment.  It is still
necessary to read the segment summary information, but the writing of
the live blocks can be skipped if it's not worth it.

[konishi.ryusuke@lab.ntt.co.jp: add description of NILFS_IOCTL_SET_SUINFO ioctl]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:25 -07:00
Johannes Weiner
91b0abe36a mm + fs: store shadow entries in page cache
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page.  As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently.  At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.

Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate.  Reclaim will check
for this flag before installing shadow pages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:01 -07:00
Al Viro
c186afb4db switch ->is_partially_uptodate() to saner arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-04-01 23:19:19 -04:00
Miklos Szeredi
520c8b1650 vfs: add renameat2 syscall
Add new renameat2 syscall, which is the same as renameat with an added
flags argument.

Pass flags to vfs_rename() and to i_op->rename() as well.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
2014-04-01 17:08:42 +02:00
Masanari Iida
df5cbb2783 doc: fix double words
Fix double words "the the" in various files
within Documentations.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-03-21 13:16:58 +01:00
Jaegeuk Kim
58c410351e f2fs: change reclaim rate in percentage
It is more reasonable to determine the reclaiming rate of prefree segments
according to the volume size, which is set to 5% by default.
For example, if the volume is 128GB, the prefree segments are reclaimed
when the number reaches to 6.4GB.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:10 +09:00
Jaegeuk Kim
cdfc41c134 f2fs: throttle the memory footprint with a sysfs entry
This patch introduces ram_thresh, a sysfs entry, which controls the memory
footprint used by the free nid list and the nat cache.

Previously, the free nid list was controlled by MAX_FREE_NIDS, while the nat
cache was managed by NM_WOUT_THRESHOLD.
However, this approach cannot be applied dynamically according to the system.

So, this patch adds ram_thresh that users can specify the threshold, which is
in order of 1 / 1024.
For example, if the total ram size is 4GB and the value is set to 10 by default,
f2fs tries to control the number of free nids and nat caches not to consume over
10 * (4GB / 1024) = 10MB.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-03-20 22:10:09 +09:00
Jaegeuk Kim
ab9fa662e4 f2fs: add an sysfs entry to control the directory level
This patch adds an sysfs entry to control dir_level used by the large directory.

The description of this entry is:

 dir_level                    This parameter controls the directory level to
			      support large directory. If a directory has a
			      number of files, it can reduce the file lookup
			      latency by increasing this dir_level value.
			      Otherwise, it needs to decrease this value to
			      reduce the space overhead. The default value is 0.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 20:31:15 +09:00
Jaegeuk Kim
3843154598 f2fs: introduce large directory support
This patch introduces an i_dir_level field to support large directory.

Previously, f2fs maintains multi-level hash tables to find a dentry quickly
from a bunch of chiild dentries in a directory, and the hash tables consist of
the following tree structure as below.

In Documentation/filesystems/f2fs.txt,

----------------------
A : bucket
B : block
N : MAX_DIR_HASH_DEPTH
----------------------

level #0   | A(2B)
           |
level #1   | A(2B) - A(2B)
           |
level #2   | A(2B) - A(2B) - A(2B) - A(2B)
     .     |   .       .       .       .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

But, if we can guess that a directory will handle a number of child files,
we don't need to traverse the tree from level #0 to #N all the time.
Since the lower level tables contain relatively small number of dentries,
the miss ratio of the target dentry is likely to be high.

In order to avoid that, we can configure the hash tables sparsely from level #0
like this.

level #0   | A(2B) - A(2B) - A(2B) - A(2B)

level #1   | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
     .     |   .       .       .       .
level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

With this structure, we can skip the ineffective tree searches in lower level
hash tables.

This patch adds just a facility for this by introducing i_dir_level in
f2fs_inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-02-27 19:56:09 +09:00
Jiri Kosina
d4263348f7 Merge branch 'master' into for-next 2014-02-20 14:54:28 +01:00
Olaf Hering
be873ac782 Documentation: update URL to hfsplus Technote 1150
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-02-20 14:48:51 +01:00
Henrik Austad
3cf8ca1c25 Documentation/: update 00-INDEX files
Some of the 00-INDEX files are somewhat outdated and some folders does
not contain 00-INDEX at all.  Only outdated (with the notably exception
of spi) indexes are touched here, the 169 folders without 00-INDEX has
not been touched.

New 00-INDEX
 - spi/* was added in a series of commits dating back to 2006

Added files (missing in (*/)00-INDEX)
 - dmatest.txt was added by commit 851b7e16a0 ("dmatest: run test via
   debugfs")
 - this_cpu_ops.txt was added by commit a1b2a555d6 ("percpu: add
   documentation on this_cpu operations")
 - ww-mutex-design.txt was added by commit 040a0a3710 ("mutex: Add
   support for wound/wait style locks")
 - bcache.txt was added by commit cafe563591 ("bcache: A block layer
   cache")
 - kernel-per-CPU-kthreads.txt was added by commit 49717cb404
   ("kthread: Document ways of reducing OS jitter due to per-CPU
   kthreads")
 - phy.txt was added by commit ff76496347 ("drivers: phy: add generic
   PHY framework")
 - block/null_blk was added by commit 12f8f4fc03 ("null_blk:
   documentation")
 - module-signing.txt was added by commit 3cafea3076 ("Add
   Documentation/module-signing.txt file")
 - assoc_array.txt was added by commit 3cb989501c ("Add a generic
   associative array implementation.")
 - arm/IXP4xx was part of the initial repo
 - arm/cluster-pm-race-avoidance.txt was added by commit 7fe31d28e8
   ("ARM: mcpm: introduce helpers for platform coherency exit/setup")
 - arm/firmware.txt was added by commit 7366b92a77 ("ARM: Add
   interface for registering and calling firmware-specific operations")
 - arm/kernel_mode_neon.txt was added by commit 2afd0a0524 ("ARM:
   7825/1: document the use of NEON in kernel mode")
 - arm/tcm.txt was added by commit bc581770cf ("ARM: 5580/2: ARM TCM
   (Tightly-Coupled Memory) support v3")
 - arm/vlocks.txt was added by commit 9762f12d3e ("ARM: mcpm: Add
   baremetal voting mutexes")
 - blackfin/gptimers-example.c, Makefile was added by commit
   4b60779d5e ("Blackfin: add an example showing how to use the
   gptimers API")
 - devicetree/usage-model.txt was added by commit 31134efc68 ("dt:
   Linux DT usage model documentation")
 - fb/api.txt was added by commit fb21c2f428 ("fbdev: Add FOURCC-based
   format configuration API")
 - fb/sm501.txt was added by commit e6a0498071 ("video, sm501: add
   edid and commandline support")
 - fb/udlfb.txt was added by commit 96f8d864af ("fbdev: move udlfb out
   of staging.")
 - filesystems/Makefile was added by commit 1e0051ae48
   ("Documentation/fs/: split txt and source files")
 - filesystems/nfs/nfsd-admin-interfaces.txt was added by commit
   8a4c6e19cf ("nfsd: document kernel interfaces for nfsd
   configuration")
 - ide/warm-plug-howto.txt was added by commit f74c91413e ("ide: add
   warm-plug support for IDE devices (take 2)")
 - laptops/Makefile was added by commit d49129accc
   ("Documentation/laptop/: split txt and source files")
 - leds/leds-blinkm.txt was added by commit b54cf35a7f ("LEDS: add
   BlinkM RGB LED driver, documentation and update MAINTAINERS")
 - leds/ledtrig-oneshot.txt was added by commit 5e417281cd ("leds: add
   oneshot trigger")
 - leds/ledtrig-transient.txt was added by commit 44e1e9f8e7 ("leds:
   add new transient trigger for one shot timer activation")
 - m68k/README.buddha was part of the initial repo
 - networking/LICENSE.(qla3xxx|qlcnic|qlge) was added by commits
   40839129f7, c4e84bde1d, 5a4faa8737
 - networking/Makefile was added by commit 3794f3e812 ("docsrc: build
   Documentation/ sources")
 - networking/i40evf.txt was added by commit 105bf2fe6b ("i40evf: add
   driver to kernel build system")
 - networking/ipsec.txt was added by commit b3c6efbc36 ("xfrm: Add
   file to document IPsec corner case")
 - networking/mac80211-auth-assoc-deauth.txt was added by commit
   3cd7920a2b ("mac80211: add auth/assoc/deauth flow diagram")
 - networking/netlink_mmap.txt was added by commit 5683264c39
   ("netlink: add documentation for memory mapped I/O")
 - networking/nf_conntrack-sysctl.txt was added by commit c9f9e0e159
   ("netfilter: doc: add nf_conntrack sysctl api documentation") lan)
 - networking/team.txt was added by commit 3d249d4ca7 ("net: introduce
   ethernet teaming device")
 - networking/vxlan.txt was added by commit d342894c5d ("vxlan:
   virtual extensible lan")
 - power/runtime_pm.txt was added by commit 5e928f77a0 ("PM: Introduce
   core framework for run-time PM of I/O devices (rev.  17)")
 - power/charger-manager.txt was added by commit 3bb3dbbd56
   ("power_supply: Add initial Charger-Manager driver")
 - RCU/lockdep-splat.txt was added by commit d7bd2d68aa ("rcu:
   Document interpretation of RCU-lockdep splats")
 - s390/kvm.txt was added by 5ecee4b (KVM: s390: API documentation)
 - s390/qeth.txt was added by commit b4d72c08b3 ("qeth: bridgeport
   support - basic control")
 - scheduler/sched-bwc.txt was added by commit 88ebc08ea9 ("sched: Add
   documentation for bandwidth control")
 - scsi/advansys.txt was added by commit 4bd6d7f356 ("[SCSI] advansys:
   Move documentation to Documentation/scsi")
 - scsi/bfa.txt was added by commit 1ec90174bd ("[SCSI] bfa: add
   readme file")
 - scsi/bnx2fc.txt was added by commit 12b8fc10ea ("[SCSI] bnx2fc: Add
   driver documentation")
 - scsi/cxgb3i.txt was added by commit c3673464eb ("[SCSI] cxgb3i: Add
   cxgb3i iSCSI driver.")
 - scsi/hpsa.txt was added by commit 992ebcf14f ("[SCSI] hpsa: Add
   hpsa.txt to Documentation/scsi")
 - scsi/link_power_management_policy.txt was added by commit
   ca77329fb7 ("[libata] Link power management infrastructure")
 - scsi/osd.txt was added by commit 78e0c621de ("[SCSI] osd:
   Documentation for OSD library")
 - scsi/scsi-parameter.txt was created/moved by commit 163475fb11
   ("Documentation: move SCSI parameters to their own text file")
 - serial/driver was part of the initial repo
 - serial/n_gsm.txt was added by commit 323e84122e ("n_gsm: add a
   documentation")
 - timers/Makefile was added by commit 3794f3e812 ("docsrc: build
   Documentation/ sources")
 - virt/kvm/s390.txt was added by commit d9101fca3d ("KVM: s390:
   diagnose call documentation")
 - vm/split_page_table_lock was added by commit 49076ec2cc ("mm:
   dynamically allocate page->ptl if it cannot be embedded to struct
   page")
 - w1/slaves/w1_ds28e04 was added by commit fbf7f7b4e2 ("w1: Add
   1-wire slave device driver for DS28E04-100")
 - w1/masters/omap-hdq was added by commit e0a29382c6 ("hdq:
   documentation for OMAP HDQ")
 - x86/early-microcode.txt was added by commit 0d91ea86a8 ("x86, doc:
   Documentation for early microcode loading")
 - x86/earlyprintk.txt was added by commit a1aade4788 ("x86/doc:
   mini-howto for using earlyprintk=dbgp")
 - x86/entry_64.txt was added by commit 8b4777a4b5 ("x86-64: Document
   some of entry_64.S")
 - x86/pat.txt was added by commit d27554d874 ("x86: PAT
   documentation")

Moved files
 - arm/kernel_user_helpers.txt was moved out of arch/arm/kernel by
   commit 37b8304642 ("ARM: kuser: move interface documentation out of
   the source code")
 - efi-stub.txt was moved out of x86/ and down into Documentation/ in
   commit 4172fe2f8a ("EFI stub documentation updates")
 - laptops/hpfall.c was moved out of hwmon/ and into laptops/ in commit
   efcfed9bad ("Move hp_accel to drivers/platform/x86")
 - commit 5616c23ad9 ("x86: doc: move x86-generic documentation from
   Doc/x86/i386"):
   * x86/usb-legacy-support.txt
   * x86/boot.txt
   * x86/zero_page.txt
 - power/video_extension.txt was moved to acpi in commit 70e66e4df1
   ("ACPI / video: move video_extension.txt to Documentation/acpi")

Removed files (left in 00-INDEX)
 - memory.txt was removed by commit 00ea8990aa ("memory.txt: remove
   stray information")
 - gpio.txt was moved to gpio/ in commit fd8e198cfc ("Documentation:
   gpiolib: document new interface")
 - networking/DLINK.txt was removed by commit 168e06ae26
   ("drivers/net: delete old parallel port de600/de620 drivers")
 - serial/hayes-esp.txt was removed by commit f53a2ade0b ("tty: esp:
   remove broken driver")
 - s390/TAPE was removed by commit 9e280f6693 ("[S390] remove tape
   block docu")
 - vm/locking was removed by commit 57ea8171d2 ("mm: documentation:
   remove hopelessly out-of-date locking doc")
 - laptops/acer-wmi.txt was remvoed by commit 020036678e ("acer-wmi:
   Delete out-of-date documentation")

Typos/misc issues
 - rpc-server-gss.txt was added as knfsd-rpcgss.txt in commit
   030d794bf4 ("SUNRPC: Use gssproxy upcall for server RPCGSS
   authentication.")
 - commit b88cf73d92 ("net: add missing entries to
   Documentation/networking/00-INDEX")
   * generic-hdlc.txt was added as generic_hdlc.txt
   * spider_net.txt was added as spider-net.txt
 - w1/master/mxc-w1 was added as mxc_w1 by commit a5fd9139f7 ("w1: add
   1-wire master driver for i.MX27 / i.MX31")
 - s390/zfcpdump.txt was added as zfcpdump by commit 6920c12a40
   ("[S390] Add Documentation/s390/00-INDEX.")

Signed-off-by: Henrik Austad <henrik@austad.us>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	[rcu bits]
Acked-by: Rob Landley <rob@landley.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mark Brown <broonie@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Len Brown <len.brown@intel.com>
Cc: James Bottomley <JBottomley@parallels.com>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:40 -08:00
Linus Torvalds
e7651b819e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is a pretty big pull, and most of these changes have been
  floating in btrfs-next for a long time.  Filipe's properties work is a
  cool building block for inheriting attributes like compression down on
  a per inode basis.

  Jeff Mahoney kicked in code to export filesystem info into sysfs.

  Otherwise, lots of performance improvements, cleanups and bug fixes.

  Looks like there are still a few other small pending incrementals, but
  I wanted to get the bulk of this in first"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
  Btrfs: fix spin_unlock in check_ref_cleanup
  Btrfs: setup inode location during btrfs_init_inode_locked
  Btrfs: don't use ram_bytes for uncompressed inline items
  Btrfs: fix btrfs_search_slot_for_read backwards iteration
  Btrfs: do not export ulist functions
  Btrfs: rework ulist with list+rb_tree
  Btrfs: fix memory leaks on walking backrefs failure
  Btrfs: fix send file hole detection leading to data corruption
  Btrfs: add a reschedule point in btrfs_find_all_roots()
  Btrfs: make send's file extent item search more efficient
  Btrfs: fix to catch all errors when resolving indirect ref
  Btrfs: fix protection between walking backrefs and root deletion
  btrfs: fix warning while merging two adjacent extents
  Btrfs: fix infinite path build loops in incremental send
  btrfs: undo sysfs when open_ctree() fails
  Btrfs: fix snprintf usage by send's gen_unique_name
  btrfs: fix defrag 32-bit integer overflow
  btrfs: sysfs: list the NO_HOLES feature
  btrfs: sysfs: don't show reserved incompat feature
  btrfs: call permission checks earlier in ioctls and return EPERM
  ...
2014-01-30 20:08:20 -08:00
Richard Yao
46bf16c44b Documentation/filesystems/vfs.txt: update file_operations documentation
->readv, ->writev and ->sendfile have been removed while ->show_fdinfo
has been added. The documentation should reflect this.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30 16:56:56 -08:00
David Rientjes
778c14affa mm, oom: base root bonus on current usage
A 3% of system memory bonus is sometimes too excessive in comparison to
other processes.

With commit a63d83f427 ("oom: badness heuristic rewrite"), the OOM
killer tries to avoid killing privileged tasks by subtracting 3% of
overall memory (system or cgroup) from their per-task consumption.  But
as a result, all root tasks that consume less than 3% of overall memory
are considered equal, and so it only takes 33+ privileged tasks pushing
the system out of memory for the OOM killer to do something stupid and
kill dhclient or other root-owned processes.  For example, on a 32G
machine it can't tell the difference between the 1M agetty and the 10G
fork bomb member.

The changelog describes this 3% boost as the equivalent to the global
overcommit limit being 3% higher for privileged tasks, but this is not
the same as discounting 3% of overall memory from _every privileged task
individually_ during OOM selection.

Replace the 3% of system memory bonus with a 3% of current memory usage
bonus.

By giving root tasks a bonus that is proportional to their actual size,
they remain comparable even when relatively small.  In the example
above, the OOM killer will discount the 1M agetty's 256 badness points
down to 179, and the 10G fork bomb's 262144 points down to 183500 points
and make the right choice, instead of discounting both to 0 and killing
agetty because it's first in the task list.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30 16:56:56 -08:00
Linus Torvalds
d9894c228b Merge branch 'for-3.14' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 - Handle some loose ends from the vfs read delegation support.
   (For example nfsd can stop breaking leases on its own in a
    fewer places where it can now depend on the vfs to.)
 - Make life a little easier for NFSv4-only configurations
   (thanks to Kinglong Mee).
 - Fix some gss-proxy problems (thanks Jeff Layton).
 - miscellaneous bug fixes and cleanup

* 'for-3.14' of git://linux-nfs.org/~bfields/linux: (38 commits)
  nfsd: consider CLAIM_FH when handing out delegation
  nfsd4: fix delegation-unlink/rename race
  nfsd4: delay setting current_fh in open
  nfsd4: minor nfs4_setlease cleanup
  gss_krb5: use lcm from kernel lib
  nfsd4: decrease nfsd4_encode_fattr stack usage
  nfsd: fix encode_entryplus_baggage stack usage
  nfsd4: simplify xdr encoding of nfsv4 names
  nfsd4: encode_rdattr_error cleanup
  nfsd4: nfsd4_encode_fattr cleanup
  minor svcauth_gss.c cleanup
  nfsd4: better VERIFY comment
  nfsd4: break only delegations when appropriate
  NFSD: Fix a memory leak in nfsd4_create_session
  sunrpc: get rid of use_gssp_lock
  sunrpc: fix potential race between setting use_gss_proxy and the upcall rpc_clnt
  sunrpc: don't wait for write before allowing reads from use-gss-proxy file
  nfsd: get rid of unused function definition
  Define op_iattr for nfsd4_open instead using macro
  NFSD: fix compile warning without CONFIG_NFSD_V3
  ...
2014-01-30 10:18:43 -08:00
Qu Wenruo
a88998f291 btrfs: Add treelog mount option.
Add treelog mount option to enable tree log with
remount option.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:20 -08:00
Qu Wenruo
d399167d88 btrfs: Add datasum mount option.
Add datasum mount option to enable checksum with
remount option.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:20 -08:00
Qu Wenruo
a258af7a3e btrfs: Add datacow mount option.
Add datacow mount option to enable copy-on-write with
remount option.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:19 -08:00
Qu Wenruo
bd0330ad21 btrfs: Add acl mount option.
Add acl mount option to enable acl with remount option.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:19 -08:00
Qu Wenruo
2c9ee85671 btrfs: Add noflushoncommit mount option.
Add noflushoncommit mount option to disable flush on commit with
remount option.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:18 -08:00
Qu Wenruo
5303629343 btrfs: Add noenospc_debug mount option.
Add noenospc_debug mount option to disable ENOSPC debug with
remount option.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:17 -08:00
Qu Wenruo
e07a2ade44 btrfs: Add nodiscard mount option.
Add nodiscard mount option to disable discard with remount option.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:17 -08:00
Qu Wenruo
fc0ca9af18 btrfs: Add noautodefrag mount option.
Btrfs has autodefrag mount option but no pairing noautodefrag option,
which makes it impossible to disable autodefrag without umount.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:16 -08:00
Qu Wenruo
842bef5891 btrfs: Add "barrier" option to support "-o remount,barrier"
Btrfs can be remounted without barrier, but there is no "barrier" option
so nobody can remount btrfs back with barrier on. Only umount and
mount again can re-enable barrier.(Quite awkward)

Also the mount options in the document is also changed slightly for the
further pairing options changes.

Reported-by: Daniel Blueman <daniel@quora.org>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Mike Fleetwood <mike.fleetwood@googlemail.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:16 -08:00
Linus Torvalds
1c2948380b 9p changes for 3.14 merge window
Included are a new cache model for support of mmap,
 and several cleanups across the filesystem and networking
 portions of the code.
 
 Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJS4qT3AAoJEDZk62b0Tg6xiF0P/RI2c75f/5SDbeNtH0usbgRj
 YfC7do4NX0HX0nI8H0bvfai1UU9JLc8M7aEjf5nw19O45phOQcGm/KeGRmMKtAhr
 OixTLXaMkRd3llTGkFv8ZY0W6aaSsGpB3Lzin+lZBwzYqMcqksBqhOTOoS1MSh3F
 u5PyhpuyJmxMkS5ud857PfwIREXOSHF/NIMMs5k9M9kK0zCka7xvl4Kg8zng2RVf
 A3rmKsLEvYuAgnxOq16hsRMgqHwx2833C3VmQKSl/n6SfOCy7cAMNmChIDrnAwtF
 dJosxypiRSkYjgD/YJR3UZofF7IqPgdL4umNmnb2lTHbOpeqNQ1hLB8BotjGpoVX
 pl9lxzz8UzaflwkAdgPsy/GBrbULxQKPLhL1Y0QPedhYh57bqRUEPPJ/HOjyrbOE
 RZXKZXfKbYlbNwc61N+meRC0IJETTjafnJlEzXu2vA+3LxZ3n/uZ7uq7XasVPiUV
 UKTKcvzYMs/PxA47rX81DOzebmphGEZDzw2ONbi4LMwGqeWt6WIpCMLPdGDjq7kl
 jdkpf9DuDr4mDrVP5+cFhzGQYbv9rCGR1zakWSW2H9xqP4Zy+o3kEPstniTMuNS4
 smkLPfpcG0VAKvY3HiVxT62EA4M+38IBAME0ATicE6esrWDyuLtGlke7x+uZoLUF
 mQ7WPimYBR+60liZ3zbQ
 =tCej
 -----END PGP SIGNATURE-----

Merge tag 'for-3.14-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p changes from Eric Van Hensbergen:
 "Included are a new cache model for support of mmap, and several
  cleanups across the filesystem and networking portions of the code"

* tag 'for-3.14-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: update documentation
  9P: introduction of a new cache=mmap model.
  net/9p: remove virtio default hack and set appropriate bits instead
  9p: remove useless 'name' variable and assignment
  9p: fix return value in case in v9fs_fid_xattr_set()
  9p: remove useless variable and assignment
  9p: remove useless assignment
  9p: remove unused 'super_block' struct pointer
  9p: remove never used return variable
  9p: remove unused 'p9_fid' struct pointer
  9p: remove unused 'p9_client' struct pointer
2014-01-26 10:55:41 -08:00
Eric Van Hensbergen
b871866e4a 9p: update documentation
quick pass to update the documentation to include instructions for
the new cache=mmap mode as well as clean up some out-of-date bits.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-01-24 10:55:21 -06:00
Fabian Frederick
50114c1103 Documentation/filesystems/00-INDEX: updates
Add the following documentation-files with description :
 -autofs4-mount-control.txt
 -btrfs.txt
 -debugfs.txt
 -devpts.txt
 -fiemap.txt
 -gfs2-glocks.txt
 -gfs2-uevents.txt
 -omfs.txt
 -path-lookup.txt
 -qnx6.txt
 -quota.txt
 -squashfs.txt
 -sysfs-tagging.txt
 -ubifs.txt
 -xfs-delayed-logging-design.txt
 -xfs-self-describing-metadata.txt

Add the following documentation directories with description :
 -caching
 -cifs (replacing cifs.txt)
 -pohmelfs

Remove the following documentation-files reference:
 -dentry-locking.txt
 -reiser4.txt

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:01 -08:00
Andre Richter
c108373290 Documentation/filesystems/sysfs.txt: fix device_attribute declaration
Fix a wrong device_attribute declaration example.

Signed-off-by: Andre Richter <andre.o.richter@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:00 -08:00
Vyacheslav Dubeyko
d623a9420c nilfs2: add comments for ioctls
Add comments for ioctls in fs/nilfs2/ioctl.c file and describe NILFS2
specific ioctls in Documentation/filesystems/nilfs2.txt.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Wenliang Fan <fanwlexca@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:37:00 -08:00
Linus Torvalds
0d90d63872 f2fs updates for v3.14
This patch-set includes the following major enhancement patches.
 o support inline_data
 o refactor bio operations such as merge operations and rw type assignment
 o enhance the direct IO path
 o enhance bio operations
 o truncate a node page when it becomes obsolete
 o add sysfs entries: small_discards, max_victim_search, and in-place-update
 o add a sysfs entry to control max_victim_search
 
 The other bug fixes are as follows.
 o fix a bug in truncate_partial_nodes
 o avoid warnings during sparse and build process
 o fix error handling flows
 o fix potential bit overflows
 
 And, there are a bunch of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJS4HQfAAoJEEAUqH6CSFDSyyMP/iUXSMC9yw6eOmSjAh3boc6+
 C7e4zrhdovekGTuZgg41SLdr83cpbEohv11wcXAfxB+eYFEz0zrAVzt54zMi7uOL
 9JmFJ6XVL/T3omI5hpEwWHg6S6tOynN6mcjacsrvypEekgjHbbpLudSw6SCu3dKz
 Lpc3z6CxrWbhvX8Iyf1j8mCceWkTO6eRv7u2H4Njtsq4Tukw3BHiBsURXt6kGwpx
 CvRBgCFdQhv4GAtbDosmVjNWOUxvik7w2epHAPQGddFTgaCL9uS+gfweHK6H9EDp
 1e3BDhmn5r9IhiLY8KVXRc8+po9kQeO1jNQATBuWggfjJSGbEBmrEQX4MFE3uCi9
 q84hGV9+yaJxoT2A21qIeWgorF9gjqNbnrrENKHyKhOqXJSrh48u5LUV8KqIyz1Y
 Qw62cypEB+PQxWegN76vwX/OrHMCLYMQ6c78bYLSwkBKonOrF5sN2+kJW5+zEj6n
 q2cYi1PLMJe7LTcULUrxJTSPFLKM5yA2oYZq3LN4sUYBeN6USaouaIqcZBqRBTCO
 adqlTa3sWytkDMAHsTpwrHABKK7pwiZoPLDVwjo0TIJ6Us4JhDtTktp5pj24fQ7Y
 6lC9w4VbfAKtq8fMV17rZYD0lQFlmZk4uQRJ8XYicCRFx11kMPKYzdGmP5aVXWru
 wxcztktnABtCAXK0PFLf
 =gVDh
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, a couple of sysfs entries were introduced to tune the
  f2fs at runtime.

  In addition, f2fs starts to support inline_data and improves the
  read/write performance in some workloads by refactoring bio-related
  flows.

  This patch-set includes the following major enhancement patches.
   - support inline_data
   - refactor bio operations such as merge operations and rw type
     assignment
   - enhance the direct IO path
   - enhance bio operations
   - truncate a node page when it becomes obsolete
   - add sysfs entries: small_discards, max_victim_search, and
     in-place-update
   - add a sysfs entry to control max_victim_search

  The other bug fixes are as follows.
   - fix a bug in truncate_partial_nodes
   - avoid warnings during sparse and build process
   - fix error handling flows
   - fix potential bit overflows

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (95 commits)
  f2fs: drop obsolete node page when it is truncated
  f2fs: introduce NODE_MAPPING for code consistency
  f2fs: remove the orphan block page array
  f2fs: add help function META_MAPPING
  f2fs: move a branch for code redability
  f2fs: call mark_inode_dirty to flush dirty pages
  f2fs: clean checkpatch warnings
  f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH)
  f2fs: avoid f2fs_balance_fs call during pageout
  f2fs: add delimiter to seperate name and value in debug phrase
  f2fs: use spinlock rather than mutex for better speed
  f2fs: move alloc new orphan node out of lock protection region
  f2fs: move grabing orphan pages out of protection region
  f2fs: remove the needless parameter of f2fs_wait_on_page_writeback
  f2fs: update documents and a MAINTAINERS entry
  f2fs: add a sysfs entry to control max_victim_search
  f2fs: improve write performance under frequent fsync calls
  f2fs: avoid to read inline data except first page
  f2fs: avoid to left uninitialized data in page when read inline data
  f2fs: fix truncate_partial_nodes bug
  ...
2014-01-23 09:21:09 -08:00
Linus Torvalds
bb1281f2aa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual rocket science stuff from trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  neighbour.h: fix comment
  sched: Fix warning on make htmldocs caused by wait.h
  slab: struct kmem_cache is protected by slab_mutex
  doc: Fix typo in USB Gadget Documentation
  of/Kconfig: Spelling s/one/once/
  mkregtable: Fix sscanf handling
  lp5523, lp8501: comment improvements
  thermal: rcar: comment spelling
  treewide: fix comments and printk msgs
  IXP4xx: remove '1 &&' from a condition check in ixp4xx_restart()
  Documentation: update /proc/uptime field description
  Documentation: Fix size parameter for snprintf
  arm: fix comment header and macro name
  asm-generic: uaccess: Spelling s/a ny/any/
  mtd: onenand: fix comment header
  doc: driver-model/platform.txt: fix a typo
  drivers: fix typo in DEVTMPFS_MOUNT Kconfig help text
  doc: Fix typo (acces_process_vm -> access_process_vm)
  treewide: Fix typos in printk
  drivers/gpu/drm/qxl/Kconfig: reformat the help text
  ...
2014-01-22 21:21:55 -08:00
Rik van Riel
34e431b0ae /proc/meminfo: provide estimated available memory
Many load balancing and workload placing programs check /proc/meminfo to
estimate how much free memory is available.  They generally do this by
adding up "free" and "cached", which was fine ten years ago, but is
pretty much guaranteed to be wrong today.

It is wrong because Cached includes memory that is not freeable as page
cache, for example shared memory segments, tmpfs, and ramfs, and it does
not include reclaimable slab memory, which can take up a large fraction
of system memory on mostly idle systems with lots of files.

Currently, the amount of memory that is available for a new workload,
without pushing the system into swap, can be estimated from MemFree,
Active(file), Inactive(file), and SReclaimable, as well as the "low"
watermarks from /proc/zoneinfo.

However, this may change in the future, and user space really should not
be expected to know kernel internals to come up with an estimate for the
amount of free memory.

It is more convenient to provide such an estimate in /proc/meminfo.  If
things change in the future, we only have to change it in one place.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Erik Mouw <erik.mouw_2@nxp.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:43 -08:00
Jaegeuk Kim
3bac380c90 f2fs: update documents and a MAINTAINERS entry
This patch adds missing some description of sysfs entries in
 - Documentation/ABI/testing/sysfs-fs-f2fs
 - Documentation/filesystems/f2fs.txt.

And it adds a maintained document entry of F2FS in MAINTAINERS.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2014-01-09 21:00:06 +09:00
Rob Landley
49457896d5 Documentation: update /proc/uptime field description
/proc/uptime has two fields, update description to describe both
ala http://lkml.indiana.edu/hypermail/linux/kernel/1312.3/01454.html

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-01-02 10:53:04 +01:00
Huajun Li
e4024e86c2 f2fs: update f2fs Documentation
This patch describes the inline_data support in f2fs document.

Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-26 20:40:44 +09:00
Jaegeuk Kim
ba0697ec98 f2fs: add description about small_discards in document
This patch adds a description about small_disacrds in the f2fs document.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:07 +09:00
Jaegeuk Kim
216fbd6443 f2fs: introduce sysfs entry to control in-place-update policy
This patch introduces new sysfs entries for users to control the policy of
in-place-updates, namely IPU, in f2fs.

Sometimes f2fs suffers from performance degradation due to its out-of-place
update policy that produces many additional node block writes.
If the storage performance is very dependant on the amount of data writes
instead of IO patterns, we'd better drop this out-of-place update policy.

This patch suggests 5 polcies and their triggering conditions as follows.

[sysfs entry name = ipu_policy]

0: F2FS_IPU_FORCE       all the time,
1: F2FS_IPU_SSR         if SSR mode is activated,
2: F2FS_IPU_UTIL        if FS utilization is over threashold,
3: F2FS_IPU_SSR_UTIL    if SSR mode is activated and FS utilization is over
                        threashold,
4: F2FS_IPU_DISABLE    disable IPU. (=default option)

[sysfs entry name = min_ipu_util]

This parameter controls the threshold to trigger in-place-updates.
The number indicates percentage of the filesystem utilization, and used by
F2FS_IPU_UTIL and F2FS_IPU_SSR_UTIL policies.

For more details, see need_inplace_update() in segment.h.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-12-23 10:18:07 +09:00
Stefan Weil
507da6a1f3 doc: Fix typo (acces_process_vm -> access_process_vm)
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-12-19 15:12:21 +01:00
J. Bruce Fields
4bd8eabc29 nfsd4: update 4.1 nfsd status documentation
This has gone a little stale.

Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-12-10 20:35:57 -05:00
Linus Torvalds
fb0d1eb892 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Almost all of these are bug fixes.  Dave Sterba's documentation update
  is the big exception because he removed our promises to set any
  machine running Btrfs on fire"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Documentation: filesystems: update btrfs tools section
  Documentation: filesystems: add new btrfs mount options
  btrfs: update kconfig help text
  btrfs: fix bio_size_ok() for max_sectors > 0xffff
  btrfs: Use trace condition for get_extent tracepoint
  btrfs: fix typo in the log message
  Btrfs: fix list delete warning when removing ordered root from the list
  Btrfs: print bytenr instead of page pointer in check-int
  Btrfs: remove dead codes from ctree.h
  Btrfs: don't wait for ordered data outside desired range
  Btrfs: fix lockdep error in async commit
  Btrfs: avoid heavy operations in btrfs_commit_super
  Btrfs: fix __btrfs_start_workers retval
  Btrfs: disable online raid-repair on ro mounts
  Btrfs: do not inc uncorrectable_errors counter on ro scrubs
  Btrfs: only drop modified extents if we logged the whole inode
  Btrfs: make sure to copy everything if we rename
  Btrfs: don't BUG_ON() if we get an error walking backrefs
2013-11-22 08:38:55 -08:00
David Sterba
c75017961b Documentation: filesystems: update btrfs tools section
The tools mentioned have been obsoleted long ago, replace
with the current ones.

CC: linux-doc@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21 11:51:50 -05:00
David Sterba
906c176e54 Documentation: filesystems: add new btrfs mount options
Two new options were added in 3.12: commit and rescan_uuid_tree

CC: linux-doc@vger.kernel.org
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-21 11:51:49 -05:00
Linus Torvalds
5cbb3d216e Merge branch 'akpm' (patches from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 "Quite a lot of other stuff is banked up awaiting further
  next->mainline merging, but this batch contains:

   - Lots of random misc patches
   - OCFS2
   - Most of MM
   - backlight updates
   - lib/ updates
   - printk updates
   - checkpatch updates
   - epoll tweaking
   - rtc updates
   - hfs
   - hfsplus
   - documentation
   - procfs
   - update gcov to gcc-4.7 format
   - IPC"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits)
  ipc, msg: fix message length check for negative values
  ipc/util.c: remove unnecessary work pending test
  devpts: plug the memory leak in kill_sb
  ./Makefile: export initial ramdisk compression config option
  init/Kconfig: add option to disable kernel compression
  drivers: w1: make w1_slave::flags long to avoid memory corruption
  drivers/w1/masters/ds1wm.cuse dev_get_platdata()
  drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
  drivers/memstick/core/mspro_block.c: fix attributes array allocation
  drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
  kernel/panic.c: reduce 1 byte usage for print tainted buffer
  gcov: reuse kbasename helper
  kernel/gcov/fs.c: use pr_warn()
  kernel/module.c: use pr_foo()
  gcov: compile specific gcov implementation based on gcc version
  gcov: add support for gcc 4.7 gcov format
  gcov: move gcov structs definitions to a gcc version specific file
  kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
  kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
  kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
  ...
2013-11-13 15:45:43 +09:00
Linus Torvalds
9bc9ccd7db Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "All kinds of stuff this time around; some more notable parts:

   - RCU'd vfsmounts handling
   - new primitives for coredump handling
   - files_lock is gone
   - Bruce's delegations handling series
   - exportfs fixes

  plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
  ecryptfs: ->f_op is never NULL
  locks: break delegations on any attribute modification
  locks: break delegations on link
  locks: break delegations on rename
  locks: helper functions for delegation breaking
  locks: break delegations on unlink
  namei: minor vfs_unlink cleanup
  locks: implement delegations
  locks: introduce new FL_DELEG lock flag
  vfs: take i_mutex on renamed file
  vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  vfs: don't use PARENT/CHILD lock classes for non-directories
  vfs: pull ext4's double-i_mutex-locking into common code
  exportfs: fix quadratic behavior in filehandle lookup
  exportfs: better variable name
  exportfs: move most of reconnect_path to helper function
  exportfs: eliminate unused "noprogress" counter
  exportfs: stop retrying once we race with rename/remove
  exportfs: clear DISCONNECTED on all parents sooner
  exportfs: more detailed comment for path_reconnect
  ...
2013-11-13 15:34:18 +09:00
Linus Torvalds
dd1d1399f2 f2fs updates for v3.13
This patch-set includes the following major enhancement patches.
 o add a sysfs to control reclaiming free segments
 o enhance the f2fs global lock procedures
 o enhance the victim selection flow
 o wait for selected node blocks during fsync
 o add some tracepoints
 o add a config to remove abundant BUG_ONs
 
 The other bug fixes are as follows.
 o fix deadlock on acl operations
 o fix some bugs with respect to orphan inodes
 
 And, there are a bunch of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSgMZ/AAoJEEAUqH6CSFDSsL4P/Ri6GZyy5F0DGjAJElX825gO
 gthRZ1uq1OAXUYDOEy150CsFgIiWeu2MxiOV15UnmX893a5cXrf32afoa/Cqx8GG
 FVEYc5+dDdogezQCW6XSatQ4s7cDQDymyT2Mky4MJyAxhpYtvbpcyVI/OWdVcTwh
 pqdJfsfuOikbOOL6VU2B5dDKwjc+6lgntdv/eICzNCH9NqHv8kxmm+h3NfaqUVrW
 pK1irqsXrktcwLIrOH0c5ZpPcKPghJuw37oFpEw8MxYbTnpdrbLq4BKE/fRh8Fhf
 R+sQgEoWZriE1SISHmYjWdt87hnFCk3wysl61Z/zkOxnYKebRBrjEiudzxAHDIGY
 +I71ovpVCWe0uljdiTBpLQ/iN4p2fRMLjn7j1IsMzoG9yfVFduMaY70m1AOZI/7z
 03QRpkmiRi7F8GYTSlPefsUUAnMYVDO6DzsyfHdxa8v+4UvWhSE4380L9DttNbCr
 2/+NGRZ4kga6GSsMhdn2Bnm6i3TkMDJosu4USkv4qGR1SH1+S5dodwxfQdonPUZg
 380kPkV7/gBYaMBSdrQFds3lh7g431gfYEfGSWt3vA14fFIWP7nIFpVIPGMM6/Sd
 GFe6gqZ2JLatqJnQNwEjPsBPPsiCAt6exbg86fTCvrS+oyQTiv44FNOWbz7iTrxw
 5nZQfQHSMhKtux7rpM/N
 =Grs1
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - add a sysfs to control reclaiming free segments
   - enhance the f2fs global lock procedures
   - enhance the victim selection flow
   - wait for selected node blocks during fsync
   - add some tracepoints
   - add a config to remove abundant BUG_ONs

  The other bug fixes are as follows.
   - fix deadlock on acl operations
   - fix some bugs with respect to orphan inodes

  And, there are a bunch of cleanups"

* tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits)
  f2fs: issue more large discard command
  f2fs: fix memory leak after kobject init failed in fill_super
  f2fs: cleanup waiting routine for writeback pages in cp
  f2fs: avoid to use a NULL point in destroy_segment_manager
  f2fs: remove unnecessary TestClearPageError when wait pages writeback
  f2fs: update f2fs document
  f2fs: avoid to wait all the node blocks during fsync
  f2fs: check all ones or zeros bitmap with bitops for better mount performance
  f2fs: change the method of calculating the number summary blocks
  f2fs: fix calculating incorrect free size when update xattr in __f2fs_setxattr
  f2fs: add an option to avoid unnecessary BUG_ONs
  f2fs: introduce CONFIG_F2FS_CHECK_FS for BUG_ON control
  f2fs: fix a deadlock during init_acl procedure
  f2fs: clean up acl flow for better readability
  f2fs: remove unnecessary segment bitmap updates
  f2fs: add tracepoint for vm_page_mkwrite
  f2fs: add tracepoint for set_page_dirty
  f2fs: remove redundant set_page_dirty from write_compacted_summaries
  f2fs: add reclaiming control by sysfs
  f2fs: introduce f2fs_balance_fs_bg for some background jobs
  ...
2013-11-13 15:24:40 +09:00
Luis Ortega Perez de Villar
965fd2968a Documentation/filesystems/vfat.txt: fix directory entry example
Slots can store up to 13 characters for the file name but one of the
examples has one character too many.

Signed-off-by: Luis Ortega Perez de Villar <luiorpe1@upv.es>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:32 +09:00
Naoya Horiguchi
ec8e41aec1 /proc/pid/smaps: show VM_SOFTDIRTY flag in VmFlags line
This flag shows that the VMA is "newly created" and thus represents
"dirty" in the task's VM.

You can clear it by "echo 4 > /proc/pid/clear_refs."

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:07 +09:00
J. Bruce Fields
6cedba8962 vfs: take i_mutex on renamed file
A read delegation is used by NFSv4 as a guarantee that a client can
perform local read opens without informing the server.

The open operation takes the last component of the pathname as an
argument, thus is also a lookup operation, and giving the client the
above guarantee means informing the client before we allow anything that
would change the set of names pointing to the inode.

Therefore, we need to break delegations on rename, link, and unlink.

We also need to prevent new delegations from being acquired while one of
these operations is in progress.

We could add some completely new locking for that purpose, but it's
simpler to use the i_mutex, since that's already taken by all the
operations we care about.

The single exception is rename.  So, modify rename to take the i_mutex
on the file that is being renamed.

Also fix up lockdep and Documentation/filesystems/directory-locking to
reflect the change.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:40 -05:00
Al Viro
5a3cd99285 iget/iget5: don't bother with ->i_lock until we find a match
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:31 -05:00
Jaegeuk Kim
66e960c692 f2fs: update f2fs document
This patch describes the inline_xattr support in f2fs document.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-11-01 11:20:05 +09:00
Jaegeuk Kim
ea91e9b043 f2fs: add reclaiming control by sysfs
This patch adds a control method in sysfs to reclaim prefree segments.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-10-25 16:54:39 +09:00
David Howells
94d30ae90a FS-Cache: Provide the ability to enable/disable cookies
Provide the ability to enable and disable fscache cookies.  A disabled cookie
will reject or ignore further requests to:

	Acquire a child cookie
	Invalidate and update backing objects
	Check the consistency of a backing object
	Allocate storage for backing page
	Read backing pages
	Write to backing pages

but still allows:

	Checks/waits on the completion of already in-progress objects
	Uncaching of pages
	Relinquishment of cookies

Two new operations are provided:

 (1) Disable a cookie:

	void fscache_disable_cookie(struct fscache_cookie *cookie,
				    bool invalidate);

     If the cookie is not already disabled, this locks the cookie against other
     dis/enablement ops, marks the cookie as being disabled, discards or
     invalidates any backing objects and waits for cessation of activity on any
     associated object.

     This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
     but it reinitialises the cookie such that it can be reenabled.

     All possible failures are handled internally.  The caller should consider
     calling fscache_uncache_all_inode_pages() afterwards to make sure all page
     markings are cleared up.

 (2) Enable a cookie:

	void fscache_enable_cookie(struct fscache_cookie *cookie,
				   bool (*can_enable)(void *data),
				   void *data)

     If the cookie is not already enabled, this locks the cookie against other
     dis/enablement ops, invokes can_enable() and, if the cookie is not an
     index cookie, will begin the procedure of acquiring backing objects.

     The optional can_enable() function is passed the data argument and returns
     a ruling as to whether or not enablement should actually be permitted to
     begin.

     All possible failures are handled internally.  The cookie will only be
     marked as enabled if provisional backing objects are allocated.

A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
is then contingent on i_writecount <= 0.  can_enable() checks for a race
between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
handling and allows us to get rid of open(O_RDONLY) accidentally introducing
caching to an inode that's open for writing already.

One operation has its API modified:

 (3) Acquire a cookie.

	struct fscache_cookie *fscache_acquire_cookie(
		struct fscache_cookie *parent,
		const struct fscache_cookie_def *def,
		void *netfs_data,
		bool enable);

     This now has an additional argument that indicates whether the requested
     cookie should be enabled by default.  It doesn't need the can_enable()
     function because the caller must prevent multiple calls for the same netfs
     object and it doesn't need to take the enablement lock because no one else
     can get at the cookie before this returns.

Signed-off-by: David Howells <dhowells@redhat.com
2013-09-27 18:40:25 +01:00
Linus Torvalds
3fe03debfc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "atomic_open-related fixes (Miklos' series, with EEXIST-related parts
  replaced with fix in fs/namei.c:atomic_open() instead of messing with
  the instances) + race fix in autofs + leak on failure exit in 9p"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  9p: don't forget to destroy inode cache if fscache registration fails
  atomic_open: take care of EEXIST in no-open case with O_CREAT|O_EXCL in fs/namei.c
  vfs: don't set FILE_CREATED before calling ->atomic_open()
  nfs: set FILE_CREATED
  gfs2: set FILE_CREATED
  cifs: fix filp leak in cifs_atomic_open()
  vfs: improve i_op->atomic_open() documentation
  autofs4: close the races around autofs4_notify_daemon()
2013-09-18 19:22:22 -05:00
Miklos Szeredi
0854d450e2 vfs: improve i_op->atomic_open() documentation
Fix documentation of ->atomic_open() and related functions: finish_open()
and finish_no_open().  Also add details that seem to be unclear and a
source of bugs (some of which are fixed in the following series).

Cc-ing maintainers of all filesystems implementing ->atomic_open().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Steve French <sfrench@samba.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-16 19:17:24 -04:00
Björn Jacke
81b66220a9 cifs: update cifs.txt and remove some outdated infos
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Björn JACKE <bj@sernet.de>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-13 16:29:58 -05:00
Linus Torvalds
26935fb06e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 4 from Al Viro:
 "list_lru pile, mostly"

This came out of Andrew's pile, Al ended up doing the merge work so that
Andrew didn't have to.

Additionally, a few fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
  super: fix for destroy lrus
  list_lru: dynamically adjust node arrays
  shrinker: Kill old ->shrink API.
  shrinker: convert remaining shrinkers to count/scan API
  staging/lustre/libcfs: cleanup linux-mem.h
  staging/lustre/ptlrpc: convert to new shrinker API
  staging/lustre/obdclass: convert lu_object shrinker to count/scan API
  staging/lustre/ldlm: convert to shrinkers to count/scan API
  hugepage: convert huge zero page shrinker to new shrinker API
  i915: bail out earlier when shrinker cannot acquire mutex
  drivers: convert shrinkers to new count/scan API
  fs: convert fs shrinkers to new scan/count API
  xfs: fix dquot isolation hang
  xfs-convert-dquot-cache-lru-to-list_lru-fix
  xfs: convert dquot cache lru to list_lru
  xfs: rework buffer dispose list tracking
  xfs-convert-buftarg-lru-to-generic-code-fix
  xfs: convert buftarg LRU to generic code
  fs: convert inode and dentry shrinking to be node aware
  vmscan: per-node deferred work
  ...
2013-09-12 15:01:38 -07:00
Linus Torvalds
7b7a2f0a31 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "CIFS update including case insensitive file name matching improvements
  for UTF-8 to Unicode, various small cifs fixes, SMB2/SMB3 leasing
  improvements, support for following SMB2 symlinks, SMB3 packet signing
  improvements"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (25 commits)
  CIFS: Respect epoch value from create lease context v2
  CIFS: Add create lease v2 context for SMB3
  CIFS: Move parsing lease buffer to ops struct
  CIFS: Move creating lease buffer to ops struct
  CIFS: Store lease state itself rather than a mapped oplock value
  CIFS: Replace clientCanCache* bools with an integer
  [CIFS] quiet sparse compile warning
  cifs: Start using per session key for smb2/3 for signature generation
  cifs: Add a variable specific to NTLMSSP for key exchange.
  cifs: Process post session setup code in respective dialect functions.
  CIFS: convert to use le32_add_cpu()
  CIFS: Fix missing lease break
  CIFS: Fix a memory leak when a lease break comes
  cifs: add winucase_convert.pl to Documentation/ directory
  cifs: convert case-insensitive dentry ops to use new case conversion routines
  cifs: add new case-insensitive conversion routines that are based on wchar_t's
  [CIFS] Add Scott to list of cifs contributors
  cifs: Move and expand MAX_SERVER_SIZE definition
  cifs: Expand max share name length to 256
  cifs: Move string length definitions to uapi
  ...
2013-09-12 07:41:12 -07:00
Rob Landley
6e19eded36 initmpfs: use initramfs if rootfstype= or root= specified
Command line option rootfstype=ramfs to obtain old initramfs behavior, and
use ramfs instead of tmpfs for stub when root= defined (for cosmetic
reasons).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Rob Landley <rob@landley.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jim Cromie <jim.cromie@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:38 -07:00
Minto Joseph
4649602265 Documentation/filesystems/proc.txt: fix mistake in the description of Committed_AS
Fix mistake in the description of Committed_AS in kernel documentation.

Signed-off-by: Minto Joseph <mvaliyav@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:03 -07:00
Christoph Hellwig
4aa32895c3 fs: remove vfs_follow_link
For a long time no filesystem has been using vfs_follow_link, and as seen
by recent filesystem submissions any new use is accidental as well.

Remove vfs_follow_link, document the replacement in
Documentation/filesystems/porting and also rename __vfs_follow_link
to match its only caller better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-10 17:06:45 -04:00
Linus Torvalds
6cccc7d301 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "This includes both the first pile of Ceph patches (which I sent to
  torvalds@vger, sigh) and a few new patches that add support for
  fscache for Ceph.  That includes a few fscache core fixes that David
  Howells asked go through the Ceph tree.  (Thanks go to Milosz Tanski
  for putting this feature together)

  This first batch of patches (included here) had (has) several
  important RBD bug fixes, hole punch support, several different
  cleanups in the page cache interactions, improvements in the truncate
  code (new truncate mutex to avoid shenanigans with i_mutex), and a
  series of fixes in the synchronous striping read/write code.

  On top of that is a random collection of small fixes all across the
  tree (error code checks and error path cleanup, obsolete wq flags,
  etc)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits)
  ceph: use d_invalidate() to invalidate aliases
  ceph: remove ceph_lookup_inode()
  ceph: trivial buildbot warnings fix
  ceph: Do not do invalidate if the filesystem is mounted nofsc
  ceph: page still marked private_2
  ceph: ceph_readpage_to_fscache didn't check if marked
  ceph: clean PgPrivate2 on returning from readpages
  ceph: use fscache as a local presisent cache
  fscache: Netfs function for cleanup post readpages
  FS-Cache: Fix heading in documentation
  CacheFiles: Implement interface to check cache consistency
  FS-Cache: Add interface to check consistency of a cached object
  rbd: fix null dereference in dout
  rbd: fix buffer size for writes to images with snapshots
  libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc
  rbd: fix I/O error propagation for reads
  ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem
  ceph: allow sync_read/write return partial successed size of read/write.
  ceph: fix bugs about handling short-read for sync read mode.
  ceph: remove useless variable revoked_rdcache
  ...
2013-09-09 09:13:22 -07:00
Jeff Layton
d49ffb0e48 cifs: add winucase_convert.pl to Documentation/ directory
Add the script used to generate the case-conversion tables to the
Documentation/ directory, in case we ever need to update or regenerate
these tables in the future.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-08 14:38:11 -05:00
Steve French
4adf53a9e0 [CIFS] Add Scott to list of cifs contributors
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-08 14:35:22 -05:00
Jeff Layton
30706a5454 cifs: create a new Documentation/ directory and move docfiles into it
Currently, we have a number of documentation files that live under
fs/cifs/. Generally, these don't get picked up by distro packagers,
since they're in a non-standard location. Move them to a new spot
under Documentation/ instead.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-09-08 14:24:10 -05:00
Linus Torvalds
2e515bf096 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "The usual trivial updates all over the tree -- mostly typo fixes and
  documentation updates"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
  doc: Documentation/cputopology.txt fix typo
  treewide: Convert retrun typos to return
  Fix comment typo for init_cma_reserved_pageblock
  Documentation/trace: Correcting and extending tracepoint documentation
  mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
  power: Documentation: Update s2ram link
  doc: fix a typo in Documentation/00-INDEX
  Documentation/printk-formats.txt: No casts needed for u64/s64
  doc: Fix typo "is is" in Documentations
  treewide: Fix printks with 0x%#
  zram: doc fixes
  Documentation/kmemcheck: update kmemcheck documentation
  doc: documentation/hwspinlock.txt fix typo
  PM / Hibernate: add section for resume options
  doc: filesystems : Fix typo in Documentations/filesystems
  scsi/megaraid fixed several typos in comments
  ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
  treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
  page_isolation: Fix a comment typo in test_pages_isolated()
  doc: fix a typo about irq affinity
  ...
2013-09-06 09:36:28 -07:00
Linus Torvalds
ec0ad73080 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3, reiserfs, udf & isofs fixes from Jan Kara:
 "The contains a bunch of ext3 cleanups and minor improvements, major
  reiserfs locking changes which should hopefully fix deadlocks
  introduced by BKL removal, and udf/isofs changes to refuse mounting fs
  rw instead of mounting it ro automatically which makes eject button
  work as expected for all media (see the changelog for why userspace
  should be ok with this change)"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: use a single printk for jbd_debug()
  reiserfs: locking, release lock around quota operations
  reiserfs: locking, handle nested locks properly
  reiserfs: locking, push write lock out of xattr code
  jbd: relocate assert after state lock in journal_commit_transaction()
  udf: Refuse RW mount of the filesystem instead of making it RO
  udf: Standardize return values in mount sequence
  isofs: Refuse RW mount of the filesystem instead of making it RO
  ext3: allow specifying external journal by pathname mount option
  jbd: remove unneeded semicolon
2013-09-06 09:06:02 -07:00
Linus Torvalds
eb97a784f0 f2fs updates for v3.12
This patch-set includes the following major enhancement patches.
 o support inline xattrs
 o add sysfs support to control GCs explicitly
 o add proc entry to show the current segment usage information
 o improve the GC/SSR performance
 
 The other bug fixes are as follows.
 o avoid the overflow on status calculation
 o fix some error handling routines
 o fix inconsistent xattr states after power-off-recovery
 o fix incorrect xattr node offset definition
 o fix deadlock condition in fsync
 o fix the fdatasync routine for power-off-recovery
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSKDoaAAoJEEAUqH6CSFDSovoQAJSWnvRfeu4olkKe7LblVXA5
 NFYsjtdtnWsmSY1kq2j541SLo8Kw2UibozbrN6BaJ9MOKnTz1+x0R9U0vpewmCO4
 FkxlGX/3i3k/4tR0AvD4U56xgqh+IhYi18nBN8kOTwhLqjFtx5JFKAHBnGwjbB4T
 YpEaitNY6dL8l+DUxs11KnPmNazbck6iNGOYXpvfhTS4DNSJTT0L/fLqugDhFJNI
 7e3f6vVORRwC5UdtJk6B6HXxv1pHv4uGeLki0W4jgGp7AdxpawbfeDrDcrECjoc+
 0s/QQTsjoeIKeCfojSEgLGSSl8PZpx2VVCxri+nMPjLzY81QUXbpsAlhB2RW9Uz/
 E9ESAPpzL9ykh35THALic7N0ATXGlepnu0EGU6+fjWGUIyHeV+2yoswz599VliRO
 GunHgwrfNMyXWHw9zw6SPIJvN3caPn3wlDhffei9wOl92YkleBuHA7ojIzfRc2vz
 YQ7jKmZNZ/CM2qiw350XIfSaa+3iszlxwoWK7DLWQZm3um0MpYme9RmadnPvxsRM
 gnUYiovPwR+om3zAnURMvq/LNKi6NjflRgu2OAU/0CpJMEX9vaVe/xOKdjCs19je
 dxinQGuOS5P+J141SkM3jJ1eyZLC4zCyp42xQSZ1+Zg+6BU4PVY3+i4hCQFopHDt
 fyeav8SM/fk9HrKU3npq
 =YMk3
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches:
   - support inline xattrs
   - add sysfs support to control GCs explicitly
   - add proc entry to show the current segment usage information
   - improve the GC/SSR performance

  The other bug fixes are as follows:
   - avoid the overflow on status calculation
   - fix some error handling routines
   - fix inconsistent xattr states after power-off-recovery
   - fix incorrect xattr node offset definition
   - fix deadlock condition in fsync
   - fix the fdatasync routine for power-off-recovery"

* tag 'for-f2fs-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: optimize gc for better performance
  f2fs: merge more bios of node block writes
  f2fs: avoid an overflow during utilization calculation
  f2fs: trigger GC when there are prefree segments
  f2fs: use strncasecmp() simplify the string comparison
  f2fs: fix omitting to update inode page
  f2fs: support the inline xattrs
  f2fs: add the truncate_xattr_node function
  f2fs: introduce __find_xattr for readability
  f2fs: reserve the xattr space dynamically
  f2fs: add flags for inline xattrs
  f2fs: fix error return code in init_f2fs_fs()
  f2fs: fix wrong BUG_ON condition
  f2fs: fix memory leak when init f2fs filesystem fail
  f2fs: fix a compound statement label error
  f2fs: avoid writing inode redundantly when creating a file
  f2fs: alloc_page() doesn't return an ERR_PTR
  f2fs: should cover i_xattr_nid with its xattr node page lock
  f2fs: check the free space first in new_node_page
  f2fs: clean up the needless end 'return' of void function
  ...
2013-09-06 09:04:34 -07:00
Milosz Tanski
5a6f282a20 fscache: Netfs function for cleanup post readpages
Currently the fscache code expect the netfs to call fscache_readpages_or_alloc
inside the aops readpages callback.  It marks all the pages in the list
provided by readahead with PG_private_2.  In the cases that the netfs fails to
read all the pages (which is legal) it ends up returning to the readahead and
triggering a BUG.  This happens because the page list still contains marked
pages.

This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages.  It will revoke the pages from the
underlying cache backend and unmark them.

The problem was originally worked out in the Ceph devel tree, but it also
occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
will clean up the unprocessed pages in the case of an error.

This can be used to address the following oops:

[12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
	(null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)

...

[12410647.597334] Call Trace:
[12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
[12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
[12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
[12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
[12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
[12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
[12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
[12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
[12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
[12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
[12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30

Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-06 09:17:30 +01:00
David Howells
696f69b6b0 FS-Cache: Fix heading in documentation
Fix a heading in the documentation to make it consistent with the contents
list.

Signed-off-by: David Howells <dhowells@redhat.com>
2013-09-06 09:17:30 +01:00
David Howells
da9803bc88 FS-Cache: Add interface to check consistency of a cached object
Extend the fscache netfs API so that the netfs can ask as to whether a cache
object is up to date with respect to its corresponding netfs object:

	int fscache_check_consistency(struct fscache_cookie *cookie)

This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
may also return -ENOMEM and -ERESTARTSYS.

The backends now have to implement a mandatory operation pointer:

	int (*check_consistency)(struct fscache_object *object)

that corresponds to the above API call.  FS-Cache takes care of pinning the
object and the cookie in memory and managing this call with respect to the
object state.

Original-author: Hongyi Jia <jiayisuse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hongyi Jia <jiayisuse@gmail.com>
cc: Milosz Tanski <milosz@adfin.com>
2013-09-06 09:17:30 +01:00
Eric Sandeen
ad4eec6135 ext4: allow specifying external journal by pathname mount option
It's always been a hassle that if an external journal's
device number changes, the filesystem won't mount.
And since boot-time enumeration can change, device number
changes aren't unusual.

The current mechanism to update the journal location is by
passing in a mount option w/ a new devnum, but that's a hassle;
it's a manual approach, fixing things after the fact.

Adding a mount option, "-o journal_path=/dev/$DEVICE" would
help, since then we can do i.e.

# mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...

and it'll mount even if the devnum has changed, as shown here:

# losetup /dev/loop0 journalfile
# mke2fs -L mylabel-journal -O journal_dev /dev/loop0 
# mkfs.ext4 -L mylabel -J device=/dev/loop0 /dev/sdb1

Change the journal device number:

# losetup -d /dev/loop0
# losetup /dev/loop1 journalfile 

And today it will fail:

# mount /dev/sdb1 /mnt/test
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

# dmesg | tail -n 1
[17343.240702] EXT4-fs (sdb1): error: couldn't read superblock of external journal

But with this new mount option, we can specify the new path:

# mount -o journal_path=/dev/loop1 /dev/sdb1 /mnt/test
#

(which does update the encoded device number, incidentally):

# umount /dev/sdb1
# dumpe2fs -h /dev/sdb1 | grep "Journal device"
dumpe2fs 1.41.12 (17-May-2010)
Journal device:	          0x0701

But best of all we can just always mount by journal-path, and
it'll always work:

# mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
#

So the journal_path option can be specified in fstab, and as long as
the disk is available somewhere, and findable by label (or by UUID),
we can mount.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2013-08-28 19:05:07 -04:00
Masanari Iida
9ed354b732 doc: filesystems : Fix typo in Documentations/filesystems
Correct spelling typo in Documentations/filesystems.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-08-20 13:45:56 +02:00
Namjae Jeon
d2dc095f42 f2fs: add sysfs entries to select the gc policy
Add sysfs entry gc_idle to control the gc policy. Where
gc_idle = 1 corresponds to selecting a cost benefit approach,
while gc_idle = 2 corresponds to selecting a greedy approach
to garbage collection. The selection is mutually exclusive one
approach will work at any point. If gc_idle = 0, then this
option is disabled.

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: change the select_gc_type() flow slightly]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-06 22:00:18 +09:00
Namjae Jeon
b59d0bae6c f2fs: add sysfs support for controlling the gc_thread
Add sysfs entries to control the timing parameters for
f2fs gc thread.

Various Sysfs options introduced are:
gc_min_sleep_time: Min Sleep time for GC in ms
gc_max_sleep_time: Max Sleep time for GC in ms
gc_no_gc_sleep_time: Default Sleep time for GC in ms

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: fix an umount bug and some minor changes]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-06 21:53:34 +09:00
Eric Sandeen
cf7eff4666 ext3: allow specifying external journal by pathname mount option
It's always been a hassle that if an external journal's
device number changes, the filesystem won't mount.
And since boot-time enumeration can change, device number
changes aren't unusual.

The current mechanism to update the journal location is by
passing in a mount option w/ a new devnum, but that's a hassle;
it's a manual approach, fixing things after the fact.

Adding a mount option, "-o journal_path=/dev/$DEVICE" would
help, since then we can do i.e.

# mount -o journal_path=/dev/disk/by-label/$JOURNAL_LABEL ...

and it'll mount even if the devnum has changed, as shown here:

# losetup /dev/loop0 journalfile
# mke2fs -L mylabel-journal -O journal_dev /dev/loop0
# mkfs.ext3 -L mylabel -J device=/dev/loop0 /dev/sdb1

Change the journal device number:

# losetup -d /dev/loop0
# losetup /dev/loop1 journalfile

And today it will fail:

# mount /dev/sdb1 /mnt/test
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

# dmesg | tail -n 1
[17343.240702] EXT3-fs (sdb1): error: couldn't read superblock of external journal

But with this new mount option, we can specify the new path:

# mount -o journal_path=/dev/loop1 /dev/sdb1 /mnt/test
#

(which does update the encoded device number, incidentally):

# umount /dev/sdb1
# dumpe2fs -h /dev/sdb1 | grep "Journal device"
dumpe2fs 1.41.12 (17-May-2010)
Journal device:	          0x0701

But best of all we can just always mount by journal-path, and
it'll always work:

# mount -o journal_path=/dev/disk/by-label/mylabel-journal /dev/sdb1 /mnt/test
#

So the journal_path option can be specified in fstab, and as long as
the disk is available somewhere, and findable by label (or by UUID),
we can mount.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-07-31 22:11:15 +02:00
Changman Lee
d51a7fba25 f2fs: add description for fsck.f2fs and dump.f2fs
This patch adds some description on fsck.f2fs and dump.f2fs which is
recently merged into f2fs-tools.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30 15:17:02 +09:00
Masanari Iida
c9f3f2d8b3 doc: Fix typo in doucmentations
Correct typo (double words) in documentations.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-07-25 12:34:15 +02:00
Linus Torvalds
239dab4636 xfs: update (#2) for 3.11-rc1
- fix for xfs_fsr returning -EINVAL
 - cleanup in xfs_bulkstat
 - cleanup in xfs_open_by_handle
 - update mount options documentation
 - clean up local format handling in xfs_bmapi_write
 - fix dquot log reservations which were too small
 - fix sgid inheritance for subdirectories when default acls are in use
 - add project quota fields to various structures
 - fix teardown of quotainfo structures when quotas are turned off
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJR4FDjAAoJENaLyazVq6ZOhZIQAMLWC4Wz+3PpRLkIlXZ83wdG
 LYDNxdYntSVDXNLCrfIhuavgW1yhseLcQZD34g0hgOLQRsmjvUw2ikO6u7qlyoV1
 GZZVwdVhsZNicycvuEE0Sva9Jmjgxe0XUOCxJBNAWN0fm5Jzg7w0OGWyzxn2obRX
 L4LNPqnQS/phCrtNYfUnXpCuUcy0KbpX4GYrdt5tThIWHm7AcyRnOArEkFvVuwdf
 N3OSN/jSMaEF/GsAqmYFYSXhuL9P1vyBSlyFW82YbyFFd4FKZJbRiFbOsrgbFSkA
 Ssum7N+rfMc9DCbJsrztgxFaYpj42JR5eCm+jvTejx8nJWKiGjMVtzjq4QtwQQ6e
 vby7MzdjZ+l2oJclA0y8hOjeg0R7sPVP7xZziZRuK4PHsjtBH3N2FtCOeBtlGhyW
 14LK+z+5YXU/gEwmxV5LaknODb2mxvWycf70jaQ6bvrQRUiFnPNIxYKvgAx8YJxl
 jgYSassHHKtLg0S54P/T91tRsyDVOhy5lqgeogzK5uYa+v+xlMloG+fLzb9GmIgS
 hXgUIAo+lNlHZkw1FdD4aRgh3OMiUvLQN6woBMbbXfS5XaNpF1UG30YAXeeIqV5e
 cLChzY+jQiCsmcktb3YQs9C5yfxEciFFSYmZOaKCTgQRWWnyI4/lAu1gd2KtTYUt
 ZfV0niME4wp0kBWZHOEH
 =QiZy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.11-rc1-2' of git://oss.sgi.com/xfs/xfs

Pull more xfs updates from Ben Myers:
 "Here are a fix for xfs_fsr, a cleanup in bulkstat, a cleanup in
  xfs_open_by_handle, updated mount options documentation, a cleanup in
  xfs_bmapi_write, a fix for the size of dquot log reservations, a fix
  for sgid inheritance when acls are in use, a fix for cleaning up
  quotainfo structures, and some more of the work which allows group and
  project quotas to be used together.

  We had a few more in this last quota category that we might have liked
  to get in, but it looks there are still a few items that need to be
  addressed.

   - fix for xfs_fsr returning -EINVAL
   - cleanup in xfs_bulkstat
   - cleanup in xfs_open_by_handle
   - update mount options documentation
   - clean up local format handling in xfs_bmapi_write
   - fix dquot log reservations which were too small
   - fix sgid inheritance for subdirectories when default acls are in use
   - add project quota fields to various structures
   - fix teardown of quotainfo structures when quotas are turned off"

* tag 'for-linus-v3.11-rc1-2' of git://oss.sgi.com/xfs/xfs:
  xfs: Fix the logic check for all quotas being turned off
  xfs: Add pquota fields where gquota is used.
  xfs: fix sgid inheritance for subdirectories inheriting default acls [V3]
  xfs: dquot log reservations are too small
  xfs: remove local fork format handling from xfs_bmapi_write()
  xfs: update mount options documentation
  xfs: use get_unused_fd_flags(0) instead of get_unused_fd()
  xfs: clean up unused codes at xfs_bulkstat()
  xfs: use XFS_BMAP_BMDR_SPACE vs. XFS_BROOT_SIZE_ADJ
2013-07-13 11:40:24 -07:00
Dave Chinner
3e5b7d8b49 xfs: update mount options documentation
Because it's horribly out of date.

And mark various deprecated options as deprecated and give them a
removal date.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-07-09 16:26:18 -05:00
Linus Torvalds
80cc38b163 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  treewide: relase -> release
  Documentation/cgroups/memory.txt: fix stat file documentation
  sysctl/net.txt: delete reference to obsolete 2.4.x kernel
  spinlock_api_smp.h: fix preprocessor comments
  treewide: Fix typo in printk
  doc: device tree: clarify stuff in usage-model.txt.
  open firmware: "/aliasas" -> "/aliases"
  md: bcache: Fixed a typo with the word 'arithmetic'
  irq/generic-chip: fix a few kernel-doc entries
  frv: Convert use of typedef ctl_table to struct ctl_table
  sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
  doc: clk: Fix incorrect wording
  Documentation/arm/IXP4xx fix a typo
  Documentation/networking/ieee802154 fix a typo
  Documentation/DocBook/media/v4l fix a typo
  Documentation/video4linux/si476x.txt fix a typo
  Documentation/virtual/kvm/api.txt fix a typo
  Documentation/early-userspace/README fix a typo
  Documentation/video4linux/soc-camera.txt fix a typo
  lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
  ...
2013-07-04 11:40:58 -07:00
Mel Gorman
543cc11533 documentation: document the is_dirty_writeback aops callback
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:40 -07:00
Mel Gorman
26c0c5bf38 documentation: update address_space_operations
The documentation for address_space_operations is partially out of date.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:40 -07:00
Pavel Emelyanov
0f8975ec4d mm: soft-dirty bits for user memory changes tracking
The soft-dirty is a bit on a PTE which helps to track which pages a task
writes to.  In order to do this tracking one should

  1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs)
  2. Wait some time.
  3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries)

To do this tracking, the writable bit is cleared from PTEs when the
soft-dirty bit is.  Thus, after this, when the task tries to modify a
page at some virtual address the #PF occurs and the kernel sets the
soft-dirty bit on the respective PTE.

Note, that although all the task's address space is marked as r/o after
the soft-dirty bits clear, the #PF-s that occur after that are processed
fast.  This is so, since the pages are still mapped to physical memory,
and thus all the kernel does is finds this fact out and puts back
writable, dirty and soft-dirty bits on the PTE.

Another thing to note, is that when mremap moves PTEs they are marked
with soft-dirty as well, since from the user perspective mremap modifies
the virtual memory at mremap's new address.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:26 -07:00
Linus Torvalds
790eac5640 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second set of VFS changes from Al Viro:
 "Assorted f_pos race fixes, making do_splice_direct() safe to call with
  i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
  ->d_hash/->d_compare calling conventions changes from Linus, misc
  stuff all over the place."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  Document ->tmpfile()
  ext4: ->tmpfile() support
  vfs: export lseek_execute() to modules
  lseek_execute() doesn't need an inode passed to it
  block_dev: switch to fixed_size_llseek()
  cpqphp_sysfs: switch to fixed_size_llseek()
  tile-srom: switch to fixed_size_llseek()
  proc_powerpc: switch to fixed_size_llseek()
  ubi/cdev: switch to fixed_size_llseek()
  pci/proc: switch to fixed_size_llseek()
  isapnp: switch to fixed_size_llseek()
  lpfc: switch to fixed_size_llseek()
  locks: give the blocked_hash its own spinlock
  locks: add a new "lm_owner_key" lock operation
  locks: turn the blocked_list into a hashtable
  locks: convert fl_link to a hlist_node
  locks: avoid taking global lock if possible when waking up blocked waiters
  locks: protect most of the file_lock handling with i_lock
  locks: encapsulate the fl_link list handling
  locks: make "added" in __posix_lock_file a bool
  ...
2013-07-03 09:10:19 -07:00
Al Viro
48bde8d362 Document ->tmpfile()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-03 16:23:29 +04:00
Linus Torvalds
3f490f7f99 This patch-set includes the following major enhancement patches.
o remount_fs callback function
  o restore parent inode number to enhance the fsync performance
  o xattr security labels
  o reduce the number of redundant lock/unlock data pages
  o avoid frequent write_inode calls
 
 The other minor bug fixes are as follows.
  o endian conversion bugs
  o various bugs in the roll-forward recovery routine
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0h8YAAoJEEAUqH6CSFDSJNMP/1V+e/PaLa8YsHuw5eWPT3Fe
 RYFXJv7SXadHqgQInjwD7JOF8BC9DlUyknYUiXFUZgvKsgMHz+OTCNl+1hLTbUXt
 e6Rhn7vU/fMu3TEtj+v6g8JbpXdXH8TSbtvh9LpoczRS98GYpZuckP0BtQsVkTL3
 jIq6WD8JRkb2DvpDl7viTT0Eq0T61CSmtOwOIvfhhiVxggvDWcnR1mTM9Tymdi4J
 NKtFkjCsKP0Z/7IZZUJAczGEHjsT+O6JDwp8+KVWuZT4BSuchoX4MYAY5wX527Ne
 rZvkolbbfnAFCC3BtETr0DPOkpxnHmJ6dEveIxjZ9cux12CAFA/Ww1QAL4ygiDkd
 Avn8BBEJnfnuzeOUkE0by+9hdF9LOU3CVSxiDhWJegYB16z+c9pSBD9xvlKhKk5B
 QNsjptB6m0CftAq7vIDsryL60uJ2cSHFxPqfwAHEpNngiU/NohTFSZE0sUMbLUNh
 FI6NrHoT7yld1HcB6cvL1lnUqIENFbNhDSSDcTdlN49IJJap4oqtgCnmMMIwbUCO
 vR2/26k5W7NwG+K6XN2IX13AsayzQahxTv8in/+LV0bfjHo6/1VzzGcqAmXJbDQw
 dLrNAeWaaIJi8J/zJiENMbFKXTj8Wax9jxKsW+4towQuyEt4ADvyt1c5gX3VR42T
 x8+YEargsdBf6FAhtN+H
 =qFcy
 -----END PGP SIGNATURE-----

Merge tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches:
   - remount_fs callback function
   - restore parent inode number to enhance the fsync performance
   - xattr security labels
   - reduce the number of redundant lock/unlock data pages
   - avoid frequent write_inode calls

  The other minor bug fixes are as follows.
   - endian conversion bugs
   - various bugs in the roll-forward recovery routine"

* tag 'for-f2fs-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (56 commits)
  f2fs: fix to recover i_size from roll-forward
  f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes()
  f2fs: remove reusing any prefree segments
  f2fs: code cleanup and simplify in func {find/add}_gc_inode
  f2fs: optimize the init_dirty_segmap function
  f2fs: fix an endian conversion bug detected by sparse
  f2fs: fix crc endian conversion
  f2fs: add remount_fs callback support
  f2fs: recover wrong pino after checkpoint during fsync
  f2fs: optimize do_write_data_page()
  f2fs: make locate_dirty_segment() as static
  f2fs: remove unnecessary parameter "offset" from __add_sum_entry()
  f2fs: avoid freqeunt write_inode calls
  f2fs: optimise the truncate_data_blocks_range() range
  f2fs: use the F2FS specific flags in f2fs_ioctl()
  f2fs: sync dir->i_size with its block allocation
  f2fs: fix i_blocks translation on various types of files
  f2fs: set sb->s_fs_info before calling parse_options()
  f2fs: support xattr security labels
  f2fs: fix iget/iput of dir during recovery
  ...
2013-07-02 09:42:38 -07:00
Linus Torvalds
9e239bb939 Lots of bug fixes, cleanups and optimizations. In the bug fixes
category, of note is a fix for on-line resizing file systems where the
 block size is smaller than the page size (i.e., file systems 1k blocks
 on x86, or more interestingly file systems with 4k blocks on Power or
 ia64 systems.)
 
 In the cleanup category, the ext4's punch hole implementation was
 significantly improved by Lukas Czerner, and now supports bigalloc
 file systems.  In addition, Jan Kara significantly cleaned up the
 write submission code path.  We also improved error checking and added
 a few sanity checks.
 
 In the optimizations category, two major optimizations deserve
 mention.  The first is that ext4_writepages() is now used for
 nodelalloc and ext3 compatibility mode.  This allows writes to be
 submitted much more efficiently as a single bio request, instead of
 being sent as individual 4k writes into the block layer (which then
 relied on the elevator code to coalesce the requests in the block
 queue).  Secondly, the extent cache shrink mechanism, which was
 introduce in 3.9, no longer has a scalability bottleneck caused by the
 i_es_lru spinlock.  Other optimizations include some changes to reduce
 CPU usage and to avoid issuing empty commits unnecessarily.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJR0XhgAAoJENNvdpvBGATwMXkQAJwTPk5XYLqtAwLziFLvM6wG
 0tWa1QAzTNo80tLyM9iGqI6x74X5nddLw5NMICUmPooOa9agMuA4tlYVSss5jWzV
 yyB7vLzsc/2eZJusuVqfTKrdGybE+M766OI6VO9WodOoIF1l51JXKjktKeaWegfv
 NkcLKlakD4V+ZASEDB/cOcR/lTwAs9dQ89AZzgPiW+G8Do922QbqkENJB8mhalbg
 rFGX+lu9W0f3fqdmT3Xi8KGn3EglETdVd6jU7kOZN4vb5LcF5BKHQnnUmMlpeWMT
 ksOVasb3RZgcsyf5ZOV5feXV601EsNtPBrHAmH22pWQy3rdTIvMv/il63XlVUXZ2
 AXT3cHEvNQP0/yVaOTCZ9xQVxT8sL4mI6kENP9PtNuntx7E90JBshiP5m24kzTZ/
 zkIeDa+FPhsDx1D5EKErinFLqPV8cPWONbIt/qAgo6663zeeIyMVhzxO4resTS9k
 U2QEztQH+hDDbjgABtz9M/GjSrohkTYNSkKXzhTjqr/m5huBrVMngjy/F4/7G7RD
 vSEx5aXqyagnrUcjsupx+biJ1QvbvZWOVxAE/6hNQNRGDt9gQtHAmKw1eG2mugHX
 +TFDxodNE4iWEURenkUxXW3mDx7hFbGZR0poHG3M/LVhKMAAAw0zoKrrUG5c70G7
 XrddRLGlk4Hf+2o7/D7B
 =SwaI
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 update from Ted Ts'o:
 "Lots of bug fixes, cleanups and optimizations.  In the bug fixes
  category, of note is a fix for on-line resizing file systems where the
  block size is smaller than the page size (i.e., file systems 1k blocks
  on x86, or more interestingly file systems with 4k blocks on Power or
  ia64 systems.)

  In the cleanup category, the ext4's punch hole implementation was
  significantly improved by Lukas Czerner, and now supports bigalloc
  file systems.  In addition, Jan Kara significantly cleaned up the
  write submission code path.  We also improved error checking and added
  a few sanity checks.

  In the optimizations category, two major optimizations deserve
  mention.  The first is that ext4_writepages() is now used for
  nodelalloc and ext3 compatibility mode.  This allows writes to be
  submitted much more efficiently as a single bio request, instead of
  being sent as individual 4k writes into the block layer (which then
  relied on the elevator code to coalesce the requests in the block
  queue).  Secondly, the extent cache shrink mechanism, which was
  introduce in 3.9, no longer has a scalability bottleneck caused by the
  i_es_lru spinlock.  Other optimizations include some changes to reduce
  CPU usage and to avoid issuing empty commits unnecessarily."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
  ext4: optimize starting extent in ext4_ext_rm_leaf()
  jbd2: invalidate handle if jbd2_journal_restart() fails
  ext4: translate flag bits to strings in tracepoints
  ext4: fix up error handling for mpage_map_and_submit_extent()
  jbd2: fix theoretical race in jbd2__journal_restart
  ext4: only zero partial blocks in ext4_zero_partial_blocks()
  ext4: check error return from ext4_write_inline_data_end()
  ext4: delete unnecessary C statements
  ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
  jbd2: move superblock checksum calculation to jbd2_write_superblock()
  ext4: pass inode pointer instead of file pointer to punch hole
  ext4: improve free space calculation for inline_data
  ext4: reduce object size when !CONFIG_PRINTK
  ext4: improve extent cache shrink mechanism to avoid to burn CPU time
  ext4: implement error handling of ext4_mb_new_preallocation()
  ext4: fix corruption when online resizing a fs with 1K block size
  ext4: delete unused variables
  ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
  jbd2: remove debug dependency on debug_fs and update Kconfig help text
  jbd2: use a single printk for jbd_debug()
  ...
2013-07-02 09:39:34 -07:00
Jeff Layton
7b2296afb3 locks: give the blocked_hash its own spinlock
There's no reason we have to protect the blocked_hash and file_lock_list
with the same spinlock. With the tests I have, breaking it in two gives
a barely measurable performance benefit, but it seems reasonable to make
this locking as granular as possible.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:46 +04:00
Jeff Layton
3999e49364 locks: add a new "lm_owner_key" lock operation
Currently, the hashing that the locking code uses to add these values
to the blocked_hash is simply calculated using fl_owner field. That's
valid in most cases except for server-side lockd, which validates the
owner of a lock based on fl_owner and fl_pid.

In the case where you have a small number of NFS clients doing a lot
of locking between different processes, you could end up with all
the blocked requests sitting in a very small number of hash buckets.

Add a new lm_owner_key operation to the lock_manager_operations that
will generate an unsigned long to use as the key in the hashtable.
That function is only implemented for server-side lockd, and simply
XORs the fl_owner and fl_pid.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:45 +04:00
Jeff Layton
1c8c601a8c locks: protect most of the file_lock handling with i_lock
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the  blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:42 +04:00
Linus Torvalds
da53be12bb Don't pass inode to ->d_hash() and ->d_compare()
Instances either don't look at it at all (the majority of cases) or
only want it to find the superblock (which can be had as dentry->d_sb).
A few cases that want more are actually safe with dentry->d_inode -
the only precaution needed is the check that it hadn't been replaced with
NULL by rmdir() or by overwriting rename(), which case should be simply
treated as cache miss.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:36 +04:00
Al Viro
2233f31aad [readdir] ->readdir() is gone
everything's converted to ->iterate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:04 +04:00
Al Viro
5c0ba4e076 [readdir] introduce iterate_dir() and dir_context
iterate_dir(): new helper, replacing vfs_readdir().

struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:46:46 +04:00
Namjae Jeon
696c018c77 f2fs: add remount_fs callback support
Add the f2fs_remount function call which will be used
during the filesystem remounting. This function
will help us to change the mount options specific to
f2fs.

Also modify the f2fs background_gc mount option, which
will allow the user to dynamically trun on/off the
garbage collection in f2fs based on the background_gc
value. If background_gc=on, Garbage collection will
be turned off & if background_gc=off, Garbage collection
will be truned on.

By default the garbage collection is on in f2fs.

Change Log:
v2: Incorporated the review comments by Gu Zheng.
    Removing the restore part for VFS flags
    Updating comments with proper flag conditions
    Display GC background option as ON/OFF
    Revised conditions to stop GC in case of remount

v1: Initial changes for adding remount_fs callback
support.

Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: change /** with /* for the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-17 21:17:55 +09:00
Dave Chinner
f763fd440e xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to be cause mount errors.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit d3eaace84e)
2013-06-06 10:51:34 -05:00
Dave Chinner
d3eaace84e xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
attr2 format is always enabled for v5 superblock filesystems, so the
mount options to enable or disable it need to be cause mount errors.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-06-05 11:21:06 -05:00
Anatol Pomozov
f884ab15af doc: fix misspellings with 'codespell' tool
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:12 +02:00
Lukas Czerner
d47992f86b mm: change invalidatepage prototype to accept length
Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.

We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.

Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
2013-05-21 23:17:23 -04:00
Linus Torvalds
983a5f84a4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
 "These are mostly fixes.  The biggest exceptions are Josef's skinny
  extents and Jan Schmidt's code to rebuild our quota indexes if they
  get out of sync (or you enable quotas on an existing filesystem).

  The skinny extents are off by default because they are a new variation
  on the extent allocation tree format.  btrfstune -x enables them, and
  the new format makes the extent allocation tree about 30% smaller.

  I rebased this a few days ago to rework Dave Sterba's crc checks on
  the super block, but almost all of these go back to rc6, since I
  though 3.9 was due any minute.

  The biggest missing fix is the tracepoint bug that was hit late in
  3.9.  I ran into problems with that in overnight testing and I'm still
  tracking it down.  I'll definitely have that fixed for rc2."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (101 commits)
  Btrfs: allow superblock mismatch from older mkfs
  btrfs: enhance superblock checks
  btrfs: fix misleading variable name for flags
  btrfs: use unsigned long type for extent state bits
  Btrfs: improve the loop of scrub_stripe
  btrfs: read entire device info under lock
  btrfs: remove unused gfp mask parameter from release_extent_buffer callchain
  btrfs: handle errors returned from get_tree_block_key
  btrfs: make static code static & remove dead code
  Btrfs: deal with errors in write_dev_supers
  Btrfs: remove almost all of the BUG()'s from tree-log.c
  Btrfs: deal with free space cache errors while replaying log
  Btrfs: automatic rescan after "quota enable" command
  Btrfs: rescan for qgroups
  Btrfs: split btrfs_qgroup_account_ref into four functions
  Btrfs: allocate new chunks if the space is not enough for global rsv
  Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
  btrfs: move leak debug code to functions
  Btrfs: return free space in cow error path
  Btrfs: set UUID in root_item for created trees
  ...
2013-05-09 13:07:40 -07:00
Linus Torvalds
942d33da99 f2fs updates for v3.10
This patch-set includes the following major enhancement patches.
 o introduce a new gloabl lock scheme
 o add tracepoints on several major functions
 o fix the overall cleaning process focused on victim selection
 o apply the block plugging to merge IOs as much as possible
 o enhance management of free nids and its list
 o enhance the readahead mode for node pages
 o address several cretical deadlock conditions
 o reduce lock_page calls
 
 The other minor bug fixes and enhancements are as follows.
 o calculation mistakes: overflow
 o bio types: READ, READA, and READ_SYNC
 o fix the recovery flow, data races, and null pointer errors
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRijCLAAoJEEAUqH6CSFDSg9kQAIqxmQzCUvCN3HcyVe8bGhKz
 8xhKrAY6ySRCKMuBbFRQsNrXUhckE3A44DgzYm5/gQikr/c8zhbqPVrtZ968eCKb
 wm3J+Re/uwZr5eOXlJEaHIiSkMDtERN7Cu2oYJWZi2B9wCSZcgvoWQ3c3LUVk6yF
 GFdi1Y00ll5tFKbEGbXSsfdul9P8jp0MmuMnWBBQZF3TrjETXMdThA5FXN0yTf9s
 XkcGE9vTCCPk8p7P3YmGGw6CwlaL8oallm0//iL4nMNpJzveq2C09IlY2BNrxU3L
 iTNXeIBdbhwXpnh2zq26Cy+cIEDIp0oXYui5BYdr/LWyWU3T/INa+hjUUszsESxF
 51LIUA1rA9nX/BSmj2QomswZ3lt4u5jl6rSBFKv3NG1KsFrAdb8S4tHukRSTSxAJ
 gzpY6kLT1+bgciA16F5W4yhzMYPN5hPa8s6hx4LHlpoqQICQsurjtS9KW7vncLFt
 ttmCMn8ehHcTzKRNNqYaBerCtSB3Z3G/uAy1y+DB7Zx2h2mqhCBXRalyRvs7RKvK
 d5OyYCpHntxuzDwVuivnr9Ddp30LUP1WqexxK+ykn99Ji3leMmffHP8Oari8w96b
 RxSbjoo8hOgoS5xZ4v3AaqtLDlBpxC6oWJzDaq/fJeKxOx22Z5BDFUM9mBGxrouJ
 AATl8b+cW/aTZ4l7WOPU
 =Hqii
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - introduce a new gloabl lock scheme
   - add tracepoints on several major functions
   - fix the overall cleaning process focused on victim selection
   - apply the block plugging to merge IOs as much as possible
   - enhance management of free nids and its list
   - enhance the readahead mode for node pages
   - address several cretical deadlock conditions
   - reduce lock_page calls

  The other minor bug fixes and enhancements are as follows.
   - calculation mistakes: overflow
   - bio types: READ, READA, and READ_SYNC
   - fix the recovery flow, data races, and null pointer errors"

* tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: cover free_nid management with spin_lock
  f2fs: optimize scan_nat_page()
  f2fs: code cleanup for scan_nat_page() and build_free_nids()
  f2fs: bugfix for alloc_nid_failed()
  f2fs: recover when journal contains deleted files
  f2fs: continue to mount after failing recovery
  f2fs: avoid deadlock during evict after f2fs_gc
  f2fs: modify the number of issued pages to merge IOs
  f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.
  f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
  f2fs: check truncation of mapping after lock_page
  f2fs: enhance alloc_nid and build_free_nids flows
  f2fs: add a tracepoint on f2fs_new_inode
  f2fs: check nid == 0 in add_free_nid
  f2fs: add REQ_META about metadata requests for submit
  f2fs: give a chance to merge IOs by IO scheduler
  f2fs: avoid frequent background GC
  f2fs: add tracepoints to debug checkpoint request
  f2fs: add tracepoints for write page operations
  f2fs: add tracepoints to debug the block allocation
  ...
2013-05-08 15:11:48 -07:00
Eric Sandeen
c854a9909a btrfs: document mount options in Documentation/fs/btrfs.txt
Document all current btrfs mount options.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:26 -04:00
Linus Torvalds
1db772216f Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J Bruce Fields:
 "Highlights include:

   - Some more DRC cleanup and performance work from Jeff Layton

   - A gss-proxy upcall from Simo Sorce: currently krb5 mounts to the
     server using credentials from Active Directory often fail due to
     limitations of the svcgssd upcall interface.  This replacement
     lifts those limitations.  The existing upcall is still supported
     for backwards compatibility.

   - More NFSv4.1 support: at this point, if a user with a current
     client who upgrades from 4.0 to 4.1 should see no regressions.  In
     theory we do everything a 4.1 server is required to do.  Patches
     for a couple minor exceptions are ready for 3.11, and with those
     and some more testing I'd like to turn 4.1 on by default in 3.11."

Fix up semantic conflict as per Stephen Rothwell and linux-next:

Commit 030d794bf4 ("SUNRPC: Use gssproxy upcall for server RPCGSS
authentication") adds two new users of "PDE(inode)->data", but we're
supposed to use "PDE_DATA(inode)" instead since commit d9dda78bad
("procfs: new helper - PDE_DATA(inode)").

The old PDE() macro is no longer available since commit c30480b92c
("proc: Make the PROC_I() and PDE() macros internal to procfs")

* 'for-3.10' of git://linux-nfs.org/~bfields/linux: (60 commits)
  NFSD: SECINFO doesn't handle unsupported pseudoflavors correctly
  NFSD: Simplify GSS flavor encoding in nfsd4_do_encode_secinfo()
  nfsd: make symbol nfsd_reply_cache_shrinker static
  svcauth_gss: fix error return code in rsc_parse()
  nfsd4: don't remap EISDIR errors in rename
  svcrpc: fix gss-proxy to respect user namespaces
  SUNRPC: gssp_procedures[] can be static
  SUNRPC: define {create,destroy}_use_gss_proxy_proc_entry in !PROC case
  nfsd4: better error return to indicate SSV non-support
  nfsd: fix EXDEV checking in rename
  SUNRPC: Use gssproxy upcall for server RPCGSS authentication.
  SUNRPC: Add RPC based upcall mechanism for RPCGSS auth
  SUNRPC: conditionally return endtime from import_sec_context
  SUNRPC: allow disabling idle timeout
  SUNRPC: attempt AF_LOCAL connect on setup
  nfsd: Decode and send 64bit time values
  nfsd4: put_client_renew_locked can be static
  nfsd4: remove unused macro
  nfsd4: remove some useless code
  nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED
  ...
2013-05-03 10:59:39 -07:00
Linus Torvalds
c8d8566952 xfs: update for v3.10-rc1
For 3.10-rc1 we have a number of bug fixes and cleanups and a currently
 experimental feature from David Chinner, CRCs protection for metadata.
 CRCs are enabled by using mkfs.xfs to create a filesystem with the
 feature bits set.
 
 * numerous fixes for speculative preallocation
 * don't verify buffers on IO errors
 * rename of random32 to prandom32
 * refactoring/rearrangement in xfs_bmap.c
 * removal of unused m_inode_shrink in struct xfs_mount
 * fix error handling of xfs_bufs and readahead
 * quota driven preallocation throttling
 * fix WARN_ON in xfs_vm_releasepage
 * add ratelimited printk for different alert levels
 * fix spurious forced shutdowns due to freed Extent Free Intents
 * remove some obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
 * remove some obsoleted comments
 * (experimental) CRC support for metadata
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJRgocuAAoJENaLyazVq6ZOXHUP+wbTG7P1cX33AeG9PErEJduU
 dpwzDmDn1n41pA5AY3/i5sm67qYSpF793LgI95n+wPxOh0LTDdKPqrLSEh5GAQ2c
 bAaax4UTmwFI8bnnHC2zcexyZX0tKDgfW8pxQe8i8xEh/bJalLFOLq7wTFfhAcQX
 8NqI1BXp6bN7arm37rsUXpOS+mNIXxc5UtpKhREQ1zDQ/J+tQ3dGjnUmmMj4CX7F
 iRKzezT/5YpuWPX0MGgfuUAbSnNDQ9d4tPumTHEuTYuDtVWrea8ZRGMnXs+dEd4l
 NoFpeo1R0XaGtWx/4jOnWnmt3D+O3/k03jrFLmoZQKSuBW27jDkE7RRIq7OPmEo2
 WVhDOO3I3CzoTGWfQ3BZ78dWF6rU/a5baxPmnkla4o4GIxyycgtARvsQWF97aeKO
 ImISIIBrBoifaElKOA+bDyP57EMe5DHSHAiMXGxuo/+djhTAxn5GugLwbes0u/sS
 95DAsGy4PPOKcFJfHJvS0i64+lw0yFmeGqfcQ9GwsXALvl2QmA79O9wB9qN+AaXY
 AwC7eeC3xWnG86aPtxmnK8vduEFXWdBZ2ZPZjtr2wVo+FC/46pRhUqK1cUyDQxXH
 jx5CIyxe+8snRs8eGYu4k6lwVbCH6ICzRhNMtOCB6e4c+bXBun5eAoGP6jln2g3F
 z+CvMq0/WBFJ+86wzJqz
 =hHSs
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.10-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "For 3.10-rc1 we have a number of bug fixes and cleanups and a
  currently experimental feature from David Chinner, CRCs protection for
  metadata.  CRCs are enabled by using mkfs.xfs to create a filesystem
  with the feature bits set.

   - numerous fixes for speculative preallocation
   - don't verify buffers on IO errors
   - rename of random32 to prandom32
   - refactoring/rearrangement in xfs_bmap.c
   - removal of unused m_inode_shrink in struct xfs_mount
   - fix error handling of xfs_bufs and readahead
   - quota driven preallocation throttling
   - fix WARN_ON in xfs_vm_releasepage
   - add ratelimited printk for different alert levels
   - fix spurious forced shutdowns due to freed Extent Free Intents
   - remove some obsolete XLOG_CIL_HARD_SPACE_LIMIT() macros
   - remove some obsoleted comments
   - (experimental) CRC support for metadata"

* tag 'for-linus-v3.10-rc1' of git://oss.sgi.com/xfs/xfs: (46 commits)
  xfs: fix da node magic number mismatches
  xfs: Remote attr validation fixes and optimisations
  xfs: Teach dquot recovery about CONFIG_XFS_QUOTA
  xfs: add metadata CRC documentation
  xfs: implement extended feature masks
  xfs: add CRC checks to the superblock
  xfs: buffer type overruns blf_flags field
  xfs: add buffer types to directory and attribute buffers
  xfs: add CRC protection to remote attributes
  xfs: split remote attribute code out
  xfs: add CRCs to attr leaf blocks
  xfs: add CRCs to dir2/da node blocks
  xfs: shortform directory offsets change for dir3 format
  xfs: add CRC checking to dir2 leaf blocks
  xfs: add CRC checking to dir2 data blocks
  xfs: add CRC checking to dir2 free blocks
  xfs: add CRC checks to block format directory blocks
  xfs: add CRC checks to remote symlinks
  xfs: split out symlink code into it's own file.
  xfs: add version 3 inode format with CRCs
  ...
2013-05-02 14:49:33 -07:00
Linus Torvalds
149b306089 Mostly performance and bug fixes, plus some cleanups. The one new
feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
 allows installation of a hidden inode designed for boot loaders.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJRfpDwAAoJENNvdpvBGATwsjwP/17V3AN6XTEhZK80p3/qN5YD
 N2QIHeyYIqCGpczLs2TQEkxWX6nqpDggAPXY956wvgeEMQV+pQ+DLO4Ol9+p5WD2
 hrklleYhtOjFQ3Xh4lqrEi5FzKVzWagVDLqgUjALJ+D+hkDB7ZQT/fm2sH45rzot
 xBp3aVqANU8GqAAbEW4/Ng9ZGMx0dpANiU2svbjM71sv2dCLFmWAkz+GgZsMbuJZ
 vnKIZP6I6plwP3LuZzEbVCA7F2PzC4ywEOJKjIEvgHpX6uMDR3FX8pD5Dlo/o6e2
 eP+KLnD43mJMxBmTn22x5Sm0N6DUzJCEELRJWB9wCZoLdEvbEWRxT3qsPXfLWelG
 2jj4bImXF2CqYEsJww5FV2WdXXdnuM57pZym5vMZGAFyKPSCJobA4Y3XRdXkBfXf
 Gq/cFoPYv2EcBIhz3zrRj+tbY8esbO9wOnF6+x+AF10BspD2V7nuoVdWVhOf0A3v
 i9ifGPwLk3e3xHr9oXheo7IWn52oviZeyD77d7D7MLhgn+xU4LaVhW3R63Q+mI4D
 0TXG25R1CVcE7wyFy3gqSVXSCDO0JcQBL5LgcL+wAGXcHPAXqBpN2DFTPo+9fJH2
 g3YMwr+wMbci1XRVQ2vdTt/nBZYjOCh6PgRmg3KjTz11Ra5EsjQvYjKWYwqf2RGn
 QhCgbzd/qtZfNJztLvr7
 =GCT2
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Mostly performance and bug fixes, plus some cleanups.  The one new
  feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
  allows installation of a hidden inode designed for boot loaders."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
  ext4: fix type-widening bug in inode table readahead code
  ext4: add check for inodes_count overflow in new resize ioctl
  ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
  ext4: fix online resizing for ext3-compat file systems
  jbd2: trace when lock_buffer in do_get_write_access takes a long time
  ext4: mark metadata blocks using bh flags
  buffer: add BH_Prio and BH_Meta flags
  ext4: mark all metadata I/O with REQ_META
  ext4: fix readdir error in case inline_data+^dir_index.
  ext4: fix readdir error in the case of inline_data+dir_index
  jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
  ext4: mext_insert_extents should update extent block checksum
  ext4: move quota initialization out of inode allocation transaction
  ext4: reserve xattr index for Rich ACL support
  jbd2: reduce journal_head size
  ext4: clear buffer_uninit flag when submitting IO
  ext4: use io_end for multiple bios
  ext4: make ext4_bio_write_page() use BH_Async_Write flags
  ext4: Use kstrtoul() instead of parse_strtoul()
  ext4: defragmentation code cleanup
  ...
2013-05-01 08:04:12 -07:00
Namjae Jeon
27cf10e133 Documentation: update nfs option in filesystem/vfat.txt
Add descriptions about 'stale_rw' and 'nostale_ro' nfs options in
filesystem/vfat.txt

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:41 -07:00
Dave Chinner
dccc3f447a xfs: add metadata CRC documentation
Add some documentation about the self describing metadata and the
code templates used to implement it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-04-27 13:27:43 -05:00
Simo Sorce
030d794bf4 SUNRPC: Use gssproxy upcall for server RPCGSS authentication.
The main advantge of this new upcall mechanism is that it can handle
big tickets as seen in Kerberos implementations where tickets carry
authorization data like the MS-PAC buffer with AD or the Posix Authorization
Data being discussed in IETF on the krbwg working group.

The Gssproxy program is used to perform the accept_sec_context call on the
kernel's behalf. The code is changed to also pass the input buffer straight
to upcall mechanism to avoid allocating and copying many pages as tokens can
be as big (potentially more in future) as 64KiB.

Signed-off-by: Simo Sorce <simo@redhat.com>
[bfields: containerization, negotiation api]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-26 11:41:28 -04:00
Lukas Czerner
27dd438542 ext4: introduce reserved space
Currently in ENOSPC condition when writing into unwritten space, or
punching a hole, we might need to split the extent and grow extent tree.
However since we can not allocate any new metadata blocks we'll have to
zero out unwritten part of extent or punched out part of extent, or in
the worst case return ENOSPC even though use actually does not allocate
any space.

Also in delalloc path we do reserve metadata and data blocks for the
time we're going to write out, however metadata block reservation is
very tricky especially since we expect that logical connectivity implies
physical connectivity, however that might not be the case and hence we
might end up allocating more metadata blocks than previously reserved.
So in future, metadata reservation checks should be removed since we can
not assure that we do not under reserve.

And this is where reserved space comes into the picture. When mounting
the file system we slice off a little bit of the file system space (2%
or 4096 clusters, whichever is smaller) which can be then used for the
cases mentioned above to prevent costly zeroout, or unexpected ENOSPC.

The number of reserved clusters can be set via sysfs, however it can
never be bigger than number of free clusters in the file system.

Note that this patch fixes the failure of xfstest 274 as expected.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2013-04-09 22:11:22 -04:00
Dr. Tilmann Bubeck
393d1d1d76 ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT
Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
associated attributes (like i_blocks, i_size, i_flags, ...) from the
specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
typically used to store a boot loader in a secure part of the
filesystem, where it can't be changed by a normal user by accident.
The data blocks of the previous boot loader will be associated with
the given inode.

This usercode program is a simple example of the usage:

int main(int argc, char *argv[])
{
  int fd;
  int err;

  if ( argc != 2 ) {
    printf("usage: ext4-swap-boot-inode FILE-TO-SWAP\n");
    exit(1);
  }

  fd = open(argv[1], O_WRONLY);
  if ( fd < 0 ) {
    perror("open");
    exit(1);
  }

  err = ioctl(fd, EXT4_IOC_SWAP_BOOT);
  if ( err < 0 ) {
    perror("ioctl");
    exit(1);
  }

  close(fd);
  exit(0);
}

[ Modified by Theodore Ts'o to fix a number of bugs in the original code.]

Signed-off-by: Dr. Tilmann Bubeck <t.bubeck@reinform.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-04-08 12:54:05 -04:00
Changman Lee
1571f84a1f f2fs: update f2fs.txt related with discard at mkfs
o mkfs.f2fs supports no discard option.
 o fixed volume label size in 512 bytes.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-03 17:27:52 +09:00
Linus Torvalds
d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Jeff Layton
ecf3d1f1aa vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
The following set of operations on a NFS client and server will cause

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30  # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat `.': Stale NFS file handle

Obviously, we should not be getting an ESTALE error back there since the
inode still exists on the server. The problem is that the lookup code
will call d_revalidate on the dentry that "." refers to, because NFS has
FS_REVAL_DOT set.

nfs_lookup_revalidate will see that the parent directory has changed and
will try to reverify the dentry by redoing a LOOKUP. That of course
fails, so the lookup code returns ESTALE.

The problem here is that d_revalidate is really a bad fit for this case.
What we really want to know at this point is whether the inode is still
good or not, but we don't really care what name it goes by or whether
the dcache is still valid.

Add a new d_op->d_weak_revalidate operation and have complete_walk call
that instead of d_revalidate. The intent there is to allow for a
"weaker" d_revalidate that just checks to see whether the inode is still
good. This is also gives us an opportunity to kill off the FS_REVAL_DOT
special casing.

[AV: changed method name, added note in porting, fixed confusion re
having it possibly called from RCU mode (it won't be)]

Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:09 -05:00
Huajun Li
9268cc3523 f2fs: update f2fs document to reflect SIT/NAT layout correctly
document to reflect the layout generated by mkfs.f2fs .

Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-04 09:42:59 +09:00
Linus Torvalds
1f0377ff08 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS update from Al Viro:
 "fscache fixes, ESTALE patchset, vmtruncate removal series, assorted
  misc stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (79 commits)
  vfs: make lremovexattr retry once on ESTALE error
  vfs: make removexattr retry once on ESTALE
  vfs: make llistxattr retry once on ESTALE error
  vfs: make listxattr retry once on ESTALE error
  vfs: make lgetxattr retry once on ESTALE
  vfs: make getxattr retry once on an ESTALE error
  vfs: allow lsetxattr() to retry once on ESTALE errors
  vfs: allow setxattr to retry once on ESTALE errors
  vfs: allow utimensat() calls to retry once on an ESTALE error
  vfs: fix user_statfs to retry once on ESTALE errors
  vfs: make fchownat retry once on ESTALE errors
  vfs: make fchmodat retry once on ESTALE errors
  vfs: have chroot retry once on ESTALE error
  vfs: have chdir retry lookup and call once on ESTALE error
  vfs: have faccessat retry once on an ESTALE error
  vfs: have do_sys_truncate retry once on an ESTALE error
  vfs: fix renameat to retry on ESTALE errors
  vfs: make do_unlinkat retry once on ESTALE errors
  vfs: make do_rmdir retry once on ESTALE errors
  vfs: add a flags argument to user_path_parent
  ...
2012-12-20 18:14:31 -08:00
Al Viro
21e89c0c48 Merge branch 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into for-linus 2012-12-20 18:49:14 -05:00
Marco Stornelli
b9f61c3c0c documentation: drop vmtruncate
Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20 18:47:08 -05:00
Linus Torvalds
982197277c Merge branch 'for-3.8' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields:
 "Included this time:

   - more nfsd containerization work from Stanislav Kinsbursky: we're
     not quite there yet, but should be by 3.9.

   - NFSv4.1 progress: implementation of basic backchannel security
     negotiation and the mandatory BACKCHANNEL_CTL operation.  See

       http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

     for remaining TODO's

   - Fixes for some bugs that could be triggered by unusual compounds.
     Our xdr code wasn't designed with v4 compounds in mind, and it
     shows.  A more thorough rewrite is still a todo.

   - If you've ever seen "RPC: multiple fragments per record not
     supported" logged while using some sort of odd userland NFS client,
     that should now be fixed.

   - Further work from Jeff Layton on our mechanism for storing
     information about NFSv4 clients across reboots.

   - Further work from Bryan Schumaker on his fault-injection mechanism
     (which allows us to discard selective NFSv4 state, to excercise
     rarely-taken recovery code paths in the client.)

   - The usual mix of miscellaneous bugs and cleanup.

  Thanks to everyone who tested or contributed this cycle."

* 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
  nfsd4: don't leave freed stateid hashed
  nfsd4: free_stateid can use the current stateid
  nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
  nfsd: warn on odd reply state in nfsd_vfs_read
  nfsd4: fix oops on unusual readlike compound
  nfsd4: disable zero-copy on non-final read ops
  svcrpc: fix some printks
  NFSD: Correct the size calculation in fault_inject_write
  NFSD: Pass correct buffer size to rpc_ntop
  nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
  nfsd: simplify service shutdown
  nfsd: replace boolean nfsd_up flag by users counter
  nfsd: simplify NFSv4 state init and shutdown
  nfsd: introduce helpers for generic resources init and shutdown
  nfsd: make NFSd service structure allocated per net
  nfsd: make NFSd service boot time per-net
  nfsd: per-net NFSd up flag introduced
  nfsd: move per-net startup code to separated function
  nfsd: pass net to __write_ports() and down
  nfsd: pass net to nfsd_set_nrthreads()
  ...
2012-12-20 14:04:11 -08:00
David Howells
ef778e7ae6 FS-Cache: Provide proper invalidation
Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one.  The problem with this is that isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.

Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that.  Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20 22:04:07 +00:00
David Howells
9f10523f89 FS-Cache: Fix operation state management and accounting
Fix the state management of internal fscache operations and the accounting of
what operations are in what states.

This is done by:

 (1) Give struct fscache_operation a enum variable that directly represents the
     state it's currently in, rather than spreading this knowledge over a bunch
     of flags, who's processing the operation at the moment and whether it is
     queued or not.

     This makes it easier to write assertions to check the state at various
     points and to prevent invalid state transitions.

 (2) Add an 'operation complete' state and supply a function to indicate the
     completion of an operation (fscache_op_complete()) and make things call
     it.  The final call to fscache_put_operation() can then check that an op
     in the appropriate state (complete or cancelled).

 (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
     govern the state of an object:

	(a) The ->n_ops is now the number of extant operations on the object
	    and is now decremented by fscache_put_operation() only.

	(b) The ->n_in_progress is simply the number of objects that have been
	    taken off of the object's pending queue for the purposes of being
	    run.  This is decremented by fscache_op_complete() only.

	(c) The ->n_exclusive is the number of exclusive ops that have been
	    submitted and queued or are in progress.  It is decremented by
	    fscache_op_complete() and by fscache_cancel_op().

     fscache_put_operation() and fscache_operation_gc() now no longer try to
     clean up ->n_exclusive and ->n_in_progress.  That was leading to double
     decrements against fscache_cancel_op().

     fscache_cancel_op() now no longer decrements ->n_ops.  That was leading to
     double decrements against fscache_put_operation().

     fscache_submit_exclusive_op() now decides whether it has to queue an op
     based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
     will persist in being true even after all preceding operations have been
     cancelled or completed.  Furthermore, if an object is active and there are
     runnable ops against it, there must be at least one op running.

 (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
     provide a function to record completion of the pages as they complete.

     When n_pages reaches 0, the operation is deemed to be complete and
     fscache_op_complete() is called.

     Add calls to fscache_retrieval_complete() anywhere we've finished with a
     page we've been given to read or allocate for.  This includes places where
     we just return pages to the netfs for reading from the server and where
     accessing the cache fails and we discard the proposed netfs page.

The bugs in the unfixed state management manifest themselves as oopses like the
following where the operation completion gets out of sync with return of the
cookie by the netfs.  This is possible because the cache unlocks and returns
all the netfs pages before recording its completion - which means that there's
nothing to stop the netfs discarding them and returning the cookie.


FS-Cache: Cookie 'NFS.fh' still has outstanding reads
------------[ cut here ]------------
kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090                  /DG965RY
RIP: 0010:[<ffffffffa007050a>]  [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache]
RSP: 0018:ffff8800368cfb00  EFLAGS: 00010282
RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
FS:  0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
Stack:
 ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
Call Trace:
 [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
 [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs]
 [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs]
 [<ffffffff810d8d47>] evict+0xa1/0x15c
 [<ffffffff810d8e2e>] dispose_list+0x2c/0x38
 [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b
 [<ffffffff810c56b7>] prune_super+0xd5/0x140
 [<ffffffff8109b615>] shrink_slab+0x102/0x1ab
 [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595
 [<ffffffff8103e009>] ? process_timeout+0xb/0xb
 [<ffffffff8109dba3>] kswapd+0x270/0x289
 [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46
 [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595
 [<ffffffff8104bf7a>] kthread+0x7f/0x87
 [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
 [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe
 [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53
 [<ffffffff813ad6b0>] ? gs_change+0xb/0xb

Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20 21:58:26 +00:00
Linus Torvalds
a13eea6bd9 Introduce a new file system, Flash-Friendly File System (F2FS), to Linux 3.8.
Highlights:
 - Add initial f2fs source codes
 - Fix an endian conversion bug
 - Fix build failures on random configs
 - Fix the power-off-recovery routine
 - Minor cleanup, coding style, and typos patches
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQxuJcAAoJEEAUqH6CSFDSq80QAI3i7NgUkx4h225MnbJdEKRb
 YX1MfSPmgE0q/15XS2qQu/s9NGJmXLV1IR9EtRSBlCQjwWhbx9Q9URktGkWslFnx
 6mBLy8EvVKDMVdwoUS8ZY6IjfKbmSnoIHTZrGaT9+9d7k8nlOQLaj3qQF4wBuw1+
 +qhJQV642v8qw7JiVVFgxcBSLpAS9cbdOA0vxfWncMwmRLaEO45W5+rob8ZN8WaS
 BUiYIiue8vlB0VDIYfpl/sSPJC/Bn1XsLKZoS2WJl8CKioE1ptLjT3acUBbabUxp
 hNLl8Ae0PylDMFpH8hrBXhleznrVqEMOTos/Z80/UyBny2sCxJFnaQ60TayUo2l2
 hYk5Wbyj8K7IBJEke23Fepild2PnGz22zf2v+tLxxVgPH5j7/l2XHfy9gPvDbd1P
 4ENiJUC3LE49Mi4TvEIFqhbrcJfD9C+v3bxpWGe8CevrpYZaB8tv/6nQXJCC/Ixp
 tMWqLKlHyXGmk5DZpiSFaj0/GbTPT0UGqZVRzzSXQpKqxJU6eTnXDa6aLUEYH8fH
 grOCriaJrd8SgL3l7RokQSQEyRHuNjMm1tlUQWOObE+y0nJjWb9Amwn1yUtJuNzx
 Np4nnlMhxwJ48P3LeeheSCuOUbxJtOzOR8MVXm7deYiGQbYaqB1/+9TbjOZBSX4O
 fpbCXrmqe1pUBukftZsL
 =iMoX
 -----END PGP SIGNATURE-----

Merge tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull new F2FS filesystem from Jaegeuk Kim:
 "Introduce a new file system, Flash-Friendly File System (F2FS), to
  Linux 3.8.

  Highlights:
   - Add initial f2fs source codes
   - Fix an endian conversion bug
   - Fix build failures on random configs
   - Fix the power-off-recovery routine
   - Minor cleanup, coding style, and typos patches"

From the Kconfig help text:

  F2FS is based on Log-structured File System (LFS), which supports
  versatile "flash-friendly" features. The design has been focused on
  addressing the fundamental issues in LFS, which are snowball effect
  of wandering tree and high cleaning overhead.

  Since flash-based storages show different characteristics according to
  the internal geometry or flash memory management schemes aka FTL, F2FS
  and tools support various parameters not only for configuring on-disk
  layout, but also for selecting allocation and cleaning algorithms.

and there's an article by Neil Brown about it on lwn.net:

  http://lwn.net/Articles/518988/

* tag 'for-3.8-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
  f2fs: fix tracking parent inode number
  f2fs: cleanup the f2fs_bio_alloc routine
  f2fs: introduce accessor to retrieve number of dentry slots
  f2fs: remove redundant call to f2fs_put_page in delete entry
  f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
  f2fs: rewrite f2fs_bio_alloc to make it simpler
  f2fs: fix a typo in f2fs documentation
  f2fs: remove unused variable
  f2fs: move error condition for mkdir at proper place
  f2fs: remove unneeded initialization
  f2fs: check read only condition before beginning write out
  f2fs: remove unneeded memset from init_once
  f2fs: show error in case of invalid mount arguments
  f2fs: fix the compiler warning for uninitialized use of variable
  f2fs: resolve build failures
  f2fs: adjust kernel coding style
  f2fs: fix endian conversion bugs reported by sparse
  f2fs: remove unneeded version.h header file from f2fs.h
  f2fs: update the f2fs document
  f2fs: update Kconfig and Makefile
  ...
2012-12-20 13:54:52 -08:00
Cyrill Gorcunov
e71ec59320 docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:28 -08:00
Cyrill Gorcunov
f1d8c16298 docs: add documentation about /proc/<pid>/fdinfo/<fd> output
[akpm@linux-foundation.org: tweak documentation]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Helsley <matt.helsley@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:28 -08:00
Kees Cook
2f4b3bf6b2 /proc/pid/status: add "Seccomp" field
It is currently impossible to examine the state of seccomp for a given
process.  While attaching with gdb and attempting "call
prctl(PR_GET_SECCOMP,...)" will work with some situations, it is not
reliable.  If the process is in seccomp mode 1, this query will kill the
process (prctl not allowed), if the process is in mode 2 with prctl not
allowed, it will similarly be killed, and in weird cases, if prctl is
filtered to return errno 0, it can look like seccomp is disabled.

When reviewing the state of running processes, there should be a way to
externally examine the seccomp mode.  ("Did this build of Chrome end up
using seccomp?" "Did my distro ship ssh with seccomp enabled?")

This adds the "Seccomp" line to /proc/$pid/status.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Morris <jmorris@namei.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:22 -08:00
Cyrill Gorcunov
834f82e2aa procfs: add VmFlags field in smaps output
During c/r sessions we've found that there is no way at the moment to
fetch some VMA associated flags, such as mlock() and madvise().

This leads us to a problem -- we don't know if we should call for mlock()
and/or madvise() after restore on the vma area we're bringing back to
life.

This patch intorduces a new field into "smaps" output called VmFlags,
where all set flags associated with the particular VMA is shown as two
letter mnemonics.

[ Strictly speaking for c/r we only need mlock/madvise bits but it has been
  said that providing just a few flags looks somehow inconsistent.  So all
  flags are here now. ]

This feature is made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as
other applications may start to use these fields.

The data is encoded in a somewhat awkward two letters mnemonic form, to
encourage userspace to be prepared for fields being added or removed in
the future.

[a.p.zijlstra@chello.nl: props to use for_each_set_bit]
[sfr@canb.auug.org.au: props to use array instead of struct]
[akpm@linux-foundation.org: overall redesign and simplification]
[akpm@linux-foundation.org: remove unneeded braces per sfr, avoid using bloaty for_each_set_bit()]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:22 -08:00
Jan Kara
58156c8fbf fat: provide option for setting timezone offset
So far FAT either offsets time stamps by sys_tz.minuteswest or leaves them
as they are (when tz=UTC mount option is used).  However in some cases it
is useful if one can specify time stamp offset on his own (e.g.  when time
zone of the camera connected is different from time zone of the computer,
or when HW clock is in UTC and thus sys_tz.minuteswest == 0).

So provide a mount option time_offset= which allows user to specify offset
in minutes that should be applied to time stamps on the filesystem.

akpm: this code would work incorrectly when used via `mount -o remount',
because cached inodes would not be updated.  But fatfs's fat_remount() is
basically a no-op anyway.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:22 -08:00
Linus Torvalds
36cd5c19c3 There are two major features for this merge window. The first is
inline data, which allows small files or directories to be stored in
 the in-inode extended attribute area.  (This requires that the file
 system use inodes which are at least 256 bytes or larger; 128 byte
 inodes do not have any room for in-inode xattrs.)
 
 The second new feature is SEEK_HOLE/SEEK_DATA support.  This is
 enabled by the extent status tree patches, and this infrastructure
 will be used to further optimize ext4 in the future.
 
 Beyond that, we have the usual collection of code cleanups and bug
 fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQzTaLAAoJENNvdpvBGATwpqEQAM0WO9Kva3R8SoaD6NYOg4lN
 8oxRlht6yogSd6wwYZm1c4YF9UrhloS9kHyWcH3Wmr9fhM5vig1ec12eDsDGrjBc
 Wb+x+YrmczSJzK380JLxmYnVSXQVFl7/hNqaRowffTOJwgySmp8oLrI88ZcaCmVU
 +qWG2x6eVhCEQrpin9Mv3D6pHkx2hfg9w5sB0K+kpgsdjqLZsmPRmxU9nx0nEJYC
 gmbpo8Dcsfqra6DJosQGo7eFq7J3fm9v1ql+QOxOjc9/zD2XwdQE1JZImehvno5i
 Ekwr9771fsw34/QHJebYRC/OkftmOn4OPuQejd+AKNdBR4mO8G/AsLCroD17uLNi
 NrtMkE6ecJPb3SflarZruNYTUhJfj3H6V9P/8wggpyPzT3l19sqP+2F6GwZspZiV
 EJb2iTKn0Phc2OD1MqO9gFP0g+IMH0kktYdxEf0V2QOQqhQHnPwxF+2Tp6bVQcQs
 KCetN37y60qJ+zKH9xukcXmWQJvnjgmWqZqpomoA4lrwgKazTNDJJ+R+N+r5HKMj
 5cz2ntAhF8FfPhqVf+8DHgjKNUwm6C++O1+Lb9swZ0FkFi5Ob3OlwWaC75Gf4H+P
 2DslBapfM79bX14a9BKaBjly5FsAha7OzR+xo0MZN+fEcMLEk33kcRovcY8DHqxU
 aadriOatYYixvSZ5lL3m
 =aNOf
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 update from Ted Ts'o:
 "There are two major features for this merge window.  The first is
  inline data, which allows small files or directories to be stored in
  the in-inode extended attribute area.  (This requires that the file
  system use inodes which are at least 256 bytes or larger; 128 byte
  inodes do not have any room for in-inode xattrs.)

  The second new feature is SEEK_HOLE/SEEK_DATA support.  This is
  enabled by the extent status tree patches, and this infrastructure
  will be used to further optimize ext4 in the future.

  Beyond that, we have the usual collection of code cleanups and bug
  fixes."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits)
  ext4: zero out inline data using memset() instead of empty_zero_page
  ext4: ensure Inode flags consistency are checked at build time
  ext4: Remove CONFIG_EXT4_FS_XATTR
  ext4: remove unused variable from ext4_ext_in_cache()
  ext4: remove redundant initialization in ext4_fill_super()
  ext4: remove redundant code in ext4_alloc_inode()
  ext4: use sync_inode_metadata() when syncing inode metadata
  ext4: enable ext4 inline support
  ext4: let fallocate handle inline data correctly
  ext4: let ext4_truncate handle inline data correctly
  ext4: evict inline data out if we need to strore xattr in inode
  ext4: let fiemap work with inline data
  ext4: let ext4_rename handle inline dir
  ext4: let empty_dir handle inline dir
  ext4: let ext4_delete_entry() handle inline data
  ext4: make ext4_delete_entry generic
  ext4: let ext4_find_entry handle inline data
  ext4: create a new function search_dir
  ext4: let ext4_readdir handle inline data
  ext4: let add_dir_entry handle inline data properly
  ...
2012-12-16 17:33:01 -08:00
Linus Torvalds
d42b3a2906 Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI update from Peter Anvin:
 "EFI tree, from Matt Fleming.  Most of the patches are the new efivarfs
  filesystem by Matt Garrett & co.  The balance are support for EFI
  wallclock in the absence of a hardware-specific driver, and various
  fixes and cleanups."

* 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  efivarfs: Make efivarfs_fill_super() static
  x86, efi: Check table header length in efi_bgrt_init()
  efivarfs: Use query_variable_info() to limit kmalloc()
  efivarfs: Fix return value of efivarfs_file_write()
  efivarfs: Return a consistent error when efivarfs_get_inode() fails
  efivarfs: Make 'datasize' unsigned long
  efivarfs: Add unique magic number
  efivarfs: Replace magic number with sizeof(attributes)
  efivarfs: Return an error if we fail to read a variable
  efi: Clarify GUID length calculations
  efivarfs: Implement exclusive access for {get,set}_variable
  efivarfs: efivarfs_fill_super() ensure we clean up correctly on error
  efivarfs: efivarfs_fill_super() ensure we free our temporary name
  efivarfs: efivarfs_fill_super() fix inode reference counts
  efivarfs: efivarfs_create() ensure we drop our reference on inode on error
  efivarfs: efivarfs_file_read ensure we free data in error paths
  x86-64/efi: Use EFI to deal with platform wall clock (again)
  x86/kernel: remove tboot 1:1 page table creation code
  x86, efi: 1:1 pagetable mapping for virtual EFI calls
  x86, mm: Include the entire kernel memory map in trampoline_pgd
  ...
2012-12-14 10:08:40 -08:00
Linus Torvalds
3f1c64f410 xfs: update for 3.8-rc1
- remove the xfssyncd mess
 - only update the last_sync_lsn when a transaction completes
 - zero allocation_args on the kernel stack
 - fix AGF/alloc workqueue deadlock
 - silence uninitialised f.file warning
 - Update inode alloc comments
 - Update mount options documentation
 - report projid32bit feature in geometry call
 - speculative preallocation inode tracking
 - fix attr tree double split corruption
 - fix broken error handling in xfs_vm_writepage
 - drop buffer io reference when a bad bio is built
 - add more attribute tree trace points
 - growfs infrastructure changes for 3.8
 - fs/xfs/xfs_fs_subr.c die die die
 - add CRC infrastructure
 - add CRC checks to the log
 - Remove description of nodelaylog mount option from xfs.txt
 - inode allocation should use unmapped buffers
 - byte range granularity for XFS_IOC_ZERO_RANGE
 - fix direct IO nested transaction deadlock
 - fix stray dquot unlock when reclaiming dquots
 - fix sparse reported log CRC endian issue
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQx3MFAAoJENaLyazVq6ZO3MgP/Aux0lrRv8H3j7Bj0zGbx6Wc
 BUVVUTEifMDCvzyQJNqhX7SeINb1IhlYtoH6FDICENA/yWhIzAs7z/xhDbJLFKDq
 xDNwt2WMqIN+DVX/X3SFxevIKMjufKjI4nM9qzcJn7sFmsxreaZLI1Xw5jEab2ou
 kkx//YTzYzk56tUzd6xOI1QFXqU7N2wGylx+eyPcNtFrgVOMUDCIhlVE0WT8sUVv
 jfGrLMVG6g6RLVHqpExzmuXJS04kv+R6WK4J4ZkHZ/shspYRs1lFO2OKUP1gw5/W
 8acDhJPIwKJb5mnPoxvYTXTqUv0jYvBQ+UWv6OWSK4EprN0ePVOFZOCQ9YIIirrU
 jiITiK1c0t9z6O1jenYoGvqptcfb+xodgSpa7lvlMquZCqaoX3mimg/01ii+aZu5
 6b4W4dMZ6UMI8WiCS44GcLdsfC7wuUL1Kwdoh95xzPyT7nUPFiikWkMmYVF1btLQ
 5kftZhOImlhU+js/nVcRfAEiLr1eS1g/nF+3+zNLYGSO0ZaIuJ/8Zg297mYS0pQx
 yWaeblX3idgpnPZxUHDvrPNnJsbnchujNer0V2fCGRcX3nmF5rvaeclsz//eRF/b
 TrpXYxgKr/oXbKmB5KaTHpG+wKSJYKHzMV7fzUwPdLM106OEt98Ke1ptAmTZU9gp
 ugXEwjMEsB3qDwD95JVz
 =Cl7A
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfs

Pull xfs update from Ben Myers:
 "There is plenty going on, including the cleanup of xfssyncd, metadata
  verifiers, CRC infrastructure for the log, tracking of inodes with
  speculative allocation, a cleanup of xfs_fs_subr.c, fixes for
  XFS_IOC_ZERO_RANGE, and important fix related to log replay (only
  update the last_sync_lsn when a transaction completes), a fix for
  deadlock on AGF buffers, documentation and comment updates, and a few
  more cleanups and fixes.

  Details:
   - remove the xfssyncd mess
   - only update the last_sync_lsn when a transaction completes
   - zero allocation_args on the kernel stack
   - fix AGF/alloc workqueue deadlock
   - silence uninitialised f.file warning
   - Update inode alloc comments
   - Update mount options documentation
   - report projid32bit feature in geometry call
   - speculative preallocation inode tracking
   - fix attr tree double split corruption
   - fix broken error handling in xfs_vm_writepage
   - drop buffer io reference when a bad bio is built
   - add more attribute tree trace points
   - growfs infrastructure changes for 3.8
   - fs/xfs/xfs_fs_subr.c die die die
   - add CRC infrastructure
   - add CRC checks to the log
   - Remove description of nodelaylog mount option from xfs.txt
   - inode allocation should use unmapped buffers
   - byte range granularity for XFS_IOC_ZERO_RANGE
   - fix direct IO nested transaction deadlock
   - fix stray dquot unlock when reclaiming dquots
   - fix sparse reported log CRC endian issue"

Fix up trivial conflict in fs/xfs/xfs_fsops.c due to the same patch
having been applied twice (commits eaef854335 and 1375cb65e8: "xfs:
growfs: don't read garbage for new secondary superblocks") with later
updates to the affected code in the XFS tree.

* tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfs: (78 commits)
  xfs: fix sparse reported log CRC endian issue
  xfs: fix stray dquot unlock when reclaiming dquots
  xfs: fix direct IO nested transaction deadlock.
  xfs: byte range granularity for XFS_IOC_ZERO_RANGE
  xfs: inode allocation should use unmapped buffers.
  xfs: Remove the description of nodelaylog mount option from xfs.txt
  xfs: add CRC checks to the log
  xfs: add CRC infrastructure
  xfs: convert buffer verifiers to an ops structure.
  xfs: connect up write verifiers to new buffers
  xfs: add pre-write metadata buffer verifier callbacks
  xfs: add buffer pre-write callback
  xfs: Add verifiers to dir2 data readahead.
  xfs: add xfs_da_node verification
  xfs: factor and verify attr leaf reads
  xfs: factor dir2 leaf read
  xfs: factor out dir2 data block reading
  xfs: factor dir2 free block reading
  xfs: verify dir2 block format buffers
  xfs: factor dir2 block read operations
  ...
2012-12-12 09:19:45 -08:00
Huajun Li
d08ab08d14 f2fs: fix a typo in f2fs documentation
In f2fs_fs.h, one f2fs inode contains 923 data block pointers, while
f2fs documentation says it is 929. Fix this inconsistence.

Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
2012-12-11 13:43:44 +09:00
Jaegeuk Kim
5bb446a289 f2fs: update the f2fs document
I moved the f2fs-tools.git into kernel.org.
And I added a new mailing list, linux-f2fs-devel@lists.sourceforge.net.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-11 13:43:42 +09:00
Jaegeuk Kim
98e4da8ca3 f2fs: add document
This adds a document describing the mount options, proc entries, usage, and
design of Flash-Friendly File System, namely F2FS.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-11 13:43:39 +09:00
Tao Ma
939da10844 ext4: Remove CONFIG_EXT4_FS_XATTR
Ted has sent out a RFC about removing this feature. Eric and Jan
confirmed that both RedHat and SUSE enable this feature in all their
product.  David also said that "As far as I know, it's enabled in all
Android kernels that use ext4."  So it seems OK for us.

And what's more, as inline data depends its implementation on xattr,
and to be frank, I don't run any test again inline data enabled while
xattr disabled.  So I think we should add inline data and remove this
config option in the same release.

[ The savings if you disable CONFIG_EXT4_FS_XATTR is only 27k, which
  isn't much in the grand scheme of things.  Since no one seems to be
  testing this configuration except for some automated compile farms, on
  balance we are better removing this config option, and so that it is
  effectively always enabled. -- tytso ]

Cc: David Brown <davidb@codeaurora.org>
Cc: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10 16:30:43 -05:00
Satoru Takeuchi
0acba3cd01 xfs: Remove the description of nodelaylog mount option from xfs.txt
nodelaylog mount option is removed by commit 93b8a585. But there still be
the description about it in the xfs document. This patch removes it.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-26 16:00:51 -06:00
J. Bruce Fields
ffe1137ba7 nfsd4: delay filling in write iovec array till after xdr decoding
Our server rejects compounds containing more than one write operation.
It's unclear whether this is really permitted by the spec; with 4.0,
it's possibly OK, with 4.1 (which has clearer limits on compound
parameters), it's probably not OK.  No client that we're aware of has
ever done this, but in theory it could be useful.

The source of the limitation: we need an array of iovecs to pass to the
write operation.  In the worst case that array of iovecs could have
hundreds of elements (the maximum rwsize divided by the page size), so
it's too big to put on the stack, or in each compound op.  So we instead
keep a single such array in the compound argument.

We fill in that array at the time we decode the xdr operation.

But we decode every op in the compound before executing any of them.  So
once we've used that array we can't decode another write.

If we instead delay filling in that array till the time we actually
perform the write, we can reuse it.

Another option might be to switch to decoding compound ops one at a
time.  I considered doing that, but it has a number of other side
effects, and I'd rather fix just this one problem for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:15 -05:00
David Rientjes
fa0cbbf145 mm, oom: reintroduce /proc/pid/oom_adj
This is mostly a revert of 01dc52ebdf ("oom: remove deprecated oom_adj")
from Davidlohr Bueso.

It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier
kernels.  It simply scales the value linearly when /proc/pid/oom_score_adj
is written.

The major difference is that its scheduled removal is no longer included
in Documentation/feature-removal-schedule.txt.  We do warn users with a
single printk, though, to suggest the more powerful and supported
/proc/pid/oom_score_adj interface.

Reported-by: Artem S. Tashkinov <t.artem@lycos.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 10:15:35 -08:00
J. Bruce Fields
292a41716a nfsd4: update documentation on 4.1 progress
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-10 14:52:03 -05:00
J. Bruce Fields
cb73a9f464 nfsd4: implement backchannel_ctl operation
This operation is mandatory for servers to implement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:39:58 -05:00
Carlos Maiolino
c99abb8f56 xfs: Update mount options documentation
Once inode64 is the default allocation mode now, kernel documentation should be
updated to match this behaviour.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-02 17:21:13 -05:00
Matt Fleming
e913ca7d16 efivarfs: Add documentation for the EFI variable filesystem
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-10-30 10:39:18 +00:00
Linus Torvalds
bd81ccea85 Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from J Bruce Fields:
 "Another relatively quiet cycle.  There was some progress on my
  remaining 4.1 todo's, but a couple of them were just of the form
  "check that we do X correctly", so didn't have much affect on the
  code.

  Other than that, a bunch of cleanup and some bugfixes (including an
  annoying NFSv4.0 state leak and a busy-loop in the server that could
  cause it to peg the CPU without making progress)."

* 'for-3.7' of git://linux-nfs.org/~bfields/linux: (46 commits)
  UAPI: (Scripted) Disintegrate include/linux/sunrpc
  UAPI: (Scripted) Disintegrate include/linux/nfsd
  nfsd4: don't allow reclaims of expired clients
  nfsd4: remove redundant callback probe
  nfsd4: expire old client earlier
  nfsd4: separate session allocation and initialization
  nfsd4: clean up session allocation
  nfsd4: minor free_session cleanup
  nfsd4: new_conn_from_crses should only allocate
  nfsd4: separate connection allocation and initialization
  nfsd4: reject bad forechannel attrs earlier
  nfsd4: enforce per-client sessions/no-sessions distinction
  nfsd4: set cl_minorversion at create time
  nfsd4: don't pin clientids to pseudoflavors
  nfsd4: fix bind_conn_to_session xdr comment
  nfsd4: cast readlink() bug argument
  NFSD: pass null terminated buf to kstrtouint()
  nfsd: remove duplicate init in nfsd4_cb_recall
  nfsd4: eliminate redundant nfs4_free_stateid
  fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR
  ...
2012-10-13 10:53:54 +09:00
Linus Torvalds
df632d3ce7 NFS client updates for Linux 3.7
Features include:
 
 - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
   Aside from the issues discussed at the LKS, distros are shipping
   NFSv4.1 with all the trimmings.
 - Fix fdatasync()/fsync() for the corner case of a server reboot.
 - NFSv4 OPEN access fix: finally distinguish correctly between
   open-for-read and open-for-execute permissions in all situations.
 - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
 - More idmapper bugfixes
 - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
   make the code easier to read.
 - In cases where a pNFS read or write fails, allow the client to
   resume trying layoutgets after two minutes of read/write-through-mds.
 - More net namespace fixes to the NFSv4 callback code.
 - More net namespace fixes to the NFSv3 locking code.
 - More NFSv4 migration preparatory patches.
   Including patches to detect network trunking in both NFSv4 and NFSv4.1
 - pNFS block updates to optimise LAYOUTGET calls.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQdMvBAAoJEGcL54qWCgDyV84P/0XvcEXj6kdMv9EiWfRczo7r
 iAwAIhiEmG1agtZa6v+Gso2MYRQbkGyJi0LKIwzGqNUi0BLQGQCoV93kB0ITVpiN
 g7poDTnPyoItW1oJCtC48/Mx0G5C1yrHSwFAJrXmtzDF1mwd/BIQReafYp6x+/TU
 Mvwm7au3Y2ySRBEDmY4zyBERHXGt//JmsZ9Ays6jewQg5ZOyjDQKoeHVYaaeJoF0
 A0tQGcBSNdySagI5dt4SlkuO7AClhzVHlilep2dsBu/TLS0F2pEdHXvM2W0koZmM
 uazaIpzd2F7TfokTYExgsyKsqpkzpDf1kebN4Y1+Ioi7Yy30dQrX6lNaUNcOmOJQ
 xx694HDHV90KdRBVSFhOIHMTBRcls68hBcWib3MXWHTKX6HVgnFMwhwxGH0MRezf
 3rmXoqn+CO1j5WeQmA3BqdVbHSZHi913TKEwE/qoW4pmOFhv5I2flXWQS/Rwvdng
 2xDCe6TlvhMS92IpyvNEIicXLRSm+DUAmoAfSqqlifZIAEM5R29e/wCAsmVprO3B
 LPHyUoIMO6SZ1PL6Rk20+6qQfvCK7U/ChULsUL/zb7R88Pc3sFE2BeAvZVATsvH3
 +FJWTz43fwUBoMhPsn8xSBLn/fq6az5C19syz6Fpu3DZ4X0EwyVWifiFk6HgcxZD
 J8ajEl+dNZeFE8rkwykX
 =uBk7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:

   - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
     Aside from the issues discussed at the LKS, distros are shipping
     NFSv4.1 with all the trimmings.
   - Fix fdatasync()/fsync() for the corner case of a server reboot.
   - NFSv4 OPEN access fix: finally distinguish correctly between
     open-for-read and open-for-execute permissions in all situations.
   - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
   - More idmapper bugfixes
   - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
     make the code easier to read.
   - In cases where a pNFS read or write fails, allow the client to
     resume trying layoutgets after two minutes of read/write-
     through-mds.
   - More net namespace fixes to the NFSv4 callback code.
   - More net namespace fixes to the NFSv3 locking code.
   - More NFSv4 migration preparatory patches.
     Including patches to detect network trunking in both NFSv4 and
     NFSv4.1
   - pNFS block updates to optimise LAYOUTGET calls."

* tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
  pnfsblock: cleanup nfs4_blkdev_get
  NFS41: send real read size in layoutget
  NFS41: send real write size in layoutget
  NFS: track direct IO left bytes
  NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
  NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
  NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
  NFSv4 set open access operation call flag in nfs4_init_opendata_res
  NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
  NFSv4 reduce attribute requests for open reclaim
  NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
  NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
  NFSv4.1: Deal with wraparound issues when updating the layout stateid
  NFSv4.1: Always set the layout stateid if this is the first layoutget
  NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
  NFSv4: don't put ACCESS in OPEN compound if O_EXCL
  NFSv4: don't check MAY_WRITE access bit in OPEN
  NFS: Set key construction data for the legacy upcall
  NFSv4.1: don't do two EXCHANGE_IDs on mount
  NFS: nfs41_walk_client_list(): re-lock before iterating
  ...
2012-10-10 23:52:35 +09:00
J. Bruce Fields
f474af7051 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWxOxKuMESys7AQKN4w//XDwALfbf0MXIw+gwyRiUtJe9mGexvI6X
 1R4FWU9a3ImzEZP4cWnmPGT2wmC/x007DcIvx8cyvbdlSuqtR2i/DC+HbWabiLRn
 nJS7Eer1BJvLv5dn6NmXMEz7yB4Z46+frcmBs3WQeR0sqBMDm+rjQzCqECznO8Jc
 VtCbox+VR2DuWcM++YECTblYEH3Z+doDXUN2eBaD8L9x3klPbPXD7OcRyOnry8w+
 ynmUTKKyH4+hpxDakYrObPIg+vFCxb4QRck1mlgA4wbvb3eqjhM0oOCYJ8GvmILA
 vdFYztWCjkiuOl5djtXBlsClX8SAMOBYlRed+R1GvjNCSR+WCWrFJJ2F8qoQ1w87
 9ts2/8qrozS8luTB475SkT2uLdJkIUKX89Oh+dWeE8YkbPnRPj5lNAdtNY5QSyDq
 VaRpIo+YfmZygyvHJQlAXBuZ0mvzcPzArfcPgSVTD3B7xTEGVu/45V7SnQX5os/V
 v39ySPXMdGOIdvK51gw7OtZl64uqrEKu39PyYDX/GUADflp/CHD0J7PJrQePbsH9
 AQolVZDIxTfKqYQnUdL8+C8Zc24RowEzz3c2+aO89MSzwGqev3q8sXRVbW/Iqryg
 p+V3nHe+ipKcga5tOBlPr9KDtDd7j3xN2yaIwf5/QyO1OHBpjAZP1gjSVDcUcwpi
 svYy4kPn3PA=
 =etoL
 -----END PGP SIGNATURE-----

nfs: disintegrate UAPI for nfs

This is to complete part of the Userspace API (UAPI) disintegration for which
the preparatory patches were pulled recently.  After these patches, userspace
headers will be segregated into:

        include/uapi/linux/.../foo.h

for the userspace interface stuff, and:

        include/linux/.../foo.h

for the strictly kernel internal stuff.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-09 18:35:22 -04:00
Davidlohr Bueso
01dc52ebdf oom: remove deprecated oom_adj
The deprecated /proc/<pid>/oom_adj is scheduled for removal this month.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:24 +09:00
Linus Torvalds
6432f21284 The big new feature added this time is supporting online resizing
using the meta_bg feature.  This allows us to resize file systems
 which are greater than 16TB.  In addition, the speed of online
 resizing has been improved in general.
 
 We also fix a number of races, some of which could lead to deadlocks,
 in ext4's Asynchronous I/O and online defrag support, thanks to good
 work by Dmitry Monakhov.
 
 There are also a large number of more minor bug fixes and cleanups
 from a number of other ext4 contributors, quite of few of which have
 submitted fixes for the first time.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQbxMXAAoJENNvdpvBGATwlg4QAJZ4mHNSL2eaaxjRtTbL1pAz
 +FVXpJ3lhw1lSfE9hJGqPVE8EfU2fWjIqxEI7dgh95Tukc5pUnPAQ2/hBz8ZA0qq
 o0AFMk3mRnvCEh6HsZfumsV83eqpR3k/zEy4uFH+KtxBskPe2sEKy3B7qOxvgdKW
 Gh8B2WqF2BpIj9WIT1P9G6xsxZW64EMHTbWcgRhuoRD7bakDNnwQ3kElz/TJQU5q
 bM/5wE7pqKwU2J1L0Ho0mxDi0f/BbXeJdA9k1tQy2KM1pZwHtpj4Ls0qmfoi49GE
 KyZqQOXlFbAz/9tidPDceY5KoRRQm1MwZ+1MimQX1P+40cs/w3pNu3yiibcaXIru
 UZ63AQMCj5JHMcFNVi20sVCwjU/ibNtEO75cfDD4bzPgHJvfCj73EbHTLl21nbTu
 izIMffhJEHmRnmRXiiortYVuI4b19oIfnXg7eclrJoUWSuGwKKsJOc5nMjDqidG4
 B7Gq4TD89sGkIYzx+50E+ll2ispcBN0BQnGqp4k2BzgDyEHhuFYk7VuVQvJgCGTi
 eobzQJj7JUXPWxyemcAVkQTtUq4vVbkm/IwS+/GA9b9Z80X8hR8x6EVHUW5lX3qC
 YHoBSCU4XKZXXWqzx0fIVCXyKKFiBzM+OXcgHOKH90vK8k6kPmPODhNCxvV3pITU
 jfl9q+X1dY4SpybZjLt5
 =iYeV
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The big new feature added this time is supporting online resizing
  using the meta_bg feature.  This allows us to resize file systems
  which are greater than 16TB.  In addition, the speed of online
  resizing has been improved in general.

  We also fix a number of races, some of which could lead to deadlocks,
  in ext4's Asynchronous I/O and online defrag support, thanks to good
  work by Dmitry Monakhov.

  There are also a large number of more minor bug fixes and cleanups
  from a number of other ext4 contributors, quite of few of which have
  submitted fixes for the first time."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
  ext4: fix ext4_flush_completed_IO wait semantics
  ext4: fix mtime update in nodelalloc mode
  ext4: fix ext_remove_space for punch_hole case
  ext4: punch_hole should wait for DIO writers
  ext4: serialize truncate with owerwrite DIO workers
  ext4: endless truncate due to nonlocked dio readers
  ext4: serialize unlocked dio reads with truncate
  ext4: serialize dio nonlocked reads with defrag workers
  ext4: completed_io locking cleanup
  ext4: fix unwritten counter leakage
  ext4: give i_aiodio_unwritten a more appropriate name
  ext4: ext4_inode_info diet
  ext4: convert to use leXX_add_cpu()
  ext4: ext4_bread usage audit
  fs: reserve fallocate flag codepoint
  ext4: remove redundant offset check in mext_check_arguments()
  ext4: don't clear orphan list on ro mount with errors
  jbd2: fix assertion failure in commit code due to lacking transaction credits
  ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
  ext4: enable FITRIM ioctl on bigalloc file system
  ...
2012-10-08 06:36:39 +09:00
Linus Torvalds
eb0ad9c06d JFS TRIM support and some minor fixes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQbEYbAAoJEDaohF61QIxkd4QP/Akm+fi7I8vHFc7jdckxmt+X
 BS0i1SpMvxH29JhSx2ZaamNdgSW8hc3pVP2XkoICUMIDOBcx564y1mRRnOXZ1mdw
 NAwPmx4gP5OMtHOOJS0X2z7cIRnyFvhtpPtrbU2P5aOmdpEPpLMKilFoOOrWfsn0
 bhNtFQOIZnYetG9isdNxi/P3NWctCDkL2x+PvKjUp081zDkJvSWjn6RB5QGaGKVF
 BOXXSK/1LYvRSJehYXekLObhlCCHUV8jMDsZ8dzJxiHaOdSOKPbjCGG1D8CJIYV2
 7lOaZvjs7muQR4QME2CBQ6VDPW2KUPapAKRuBCglMXwP+Ym+aU/MpXYCE3nydgeQ
 kVR5jvwQcB2hRj8ALyCL79i6jM1DoN3rbaU2FHdU9enHmmEuG4424D5pwHHyUlRJ
 zb52HPGpilAHIyXHAhAl2EOvlFA60Sx1dUKiAkgc+mFKcsZaGUJw3XWSp99trSa2
 f7ZHzrmQaq0tcoYDiH00wZZfCHPlwuXxmFkRrC721xjYNOaMZKAn8n7Xi42Ap30Z
 7IzWhwNOOZuxx9CmEZDXY+6UAx28aRXsuiKwOeUVLyXcAdOx/9DxjoUjUbgNZqf0
 wu6z3kXqkD3ZnkOiKcV1lE1oBtdgZtY2S92s6SBdduxojZqC9U8RYU9ogssePo5R
 pK69xihOXuAetSGmopPO
 =9O7B
 -----END PGP SIGNATURE-----

Merge tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggy

Pull JFS update from Dave Kleikamp:
 "JFS TRIM support and some minor fixes"

* tag 'jfs-3.7' of git://github.com/kleikamp/linux-shaggy:
  jfs: Fix do_div precision in commit b40c2e66
  JFS: use list_move instead of list_del/list_add
  jfs: Remove obsolete email address
  fs/jfs: TRIM support for JFS Filesystem
2012-10-03 08:48:21 -07:00
Linus Torvalds
aecdc33e11 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

 1) GRE now works over ipv6, from Dmitry Kozlov.

 2) Make SCTP more network namespace aware, from Eric Biederman.

 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.

 4) Make openvswitch network namespace aware, from Pravin B Shelar.

 5) IPV6 NAT implementation, from Patrick McHardy.

 6) Server side support for TCP Fast Open, from Jerry Chu and others.

 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
    Borkmann.

 8) Increate the loopback default MTU to 64K, from Eric Dumazet.

 9) Use a per-task rather than per-socket page fragment allocator for
    outgoing networking traffic.  This benefits processes that have very
    many mostly idle sockets, which is quite common.

    From Eric Dumazet.

10) Use up to 32K for page fragment allocations, with fallbacks to
    smaller sizes when higher order page allocations fail.  Benefits are
    a) less segments for driver to process b) less calls to page
    allocator c) less waste of space.

    From Eric Dumazet.

11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.

12) VXLAN device driver, one way to handle VLAN issues such as the
    limitation of 4096 VLAN IDs yet still have some level of isolation.
    From Stephen Hemminger.

13) As usual there is a large boatload of driver changes, with the scale
    perhaps tilted towards the wireless side this time around.

Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
  hyperv: Add buffer for extended info after the RNDIS response message.
  hyperv: Report actual status in receive completion packet
  hyperv: Remove extra allocated space for recv_pkt_list elements
  hyperv: Fix page buffer handling in rndis_filter_send_request()
  hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
  hyperv: Fix the max_xfer_size in RNDIS initialization
  vxlan: put UDP socket in correct namespace
  vxlan: Depend on CONFIG_INET
  sfc: Fix the reported priorities of different filter types
  sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
  sfc: Fix loopback self-test with separate_tx_channels=1
  sfc: Fix MCDI structure field lookup
  sfc: Add parentheses around use of bitfield macro arguments
  sfc: Fix null function pointer in efx_sriov_channel_type
  vxlan: virtual extensible lan
  igmp: export symbol ip_mc_leave_group
  netlink: add attributes to fdb interface
  tg3: unconditionally select HWMON support when tg3 is enabled.
  Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
  gre: fix sparse warning
  ...
2012-10-02 13:38:27 -07:00
Chuck Lever
6f2ea7f2a3 NFS: Add nfs4_unique_id boot parameter
An optional boot parameter is introduced to allow client
administrators to specify a string that the Linux NFS client can
insert into its nfs_client_id4 id string, to make it both more
globally unique, and to ensure that it doesn't change even if the
client's nodename changes.

If this boot parameter is not specified, the client's nodename is
used, as before.

Client installation procedures can create a unique string (typically,
a UUID) which remains unchanged during the lifetime of that client
instance.  This works just like creating a UUID for the label of the
system's root and boot volumes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:33:33 -07:00
Christoph Fritz
5e953778a2 ipconfig: add nameserver IPs to kernel-parameter ip=
On small systems (e.g. embedded ones) IP addresses are often configured
by bootloaders and get assigned to kernel via parameter "ip=".  If set to
"ip=dhcp", even nameserver entries from DHCP daemons are handled. These
entries exported in /proc/net/pnp are commonly linked by /etc/resolv.conf.

To configure nameservers for networks without DHCP, this patch adds option
<dns0-ip> and <dns1-ip> to kernel-parameter 'ip='.

Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Tested-by: Jan Weitzel <j.weitzel@phytec.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-21 14:51:21 -04:00
Dave Kleikamp
16221d9071 jfs: Remove obsolete email address
The MAINTAINERS file suffices.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2012-09-17 12:00:01 -05:00
Tino Reichardt
b40c2e665c fs/jfs: TRIM support for JFS Filesystem
This patch adds support for the two linux interfaces of the discard/TRIM
command for SSD devices and sparse/thinly-provisioned LUNs.

JFS will support batched discard via FITRIM ioctl and online discard
with the discard mount option.

Signed-off-by: Tino Reichardt <list-jfs@mcmilk.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2012-09-17 11:58:19 -05:00
Kees Cook
82aceae4f0 debugfs: more tightly restrict default mount mode
Since the debugfs is mostly only used by root, make the default mount
mode 0700. Most system owners do not need a more permissive value,
but they can choose to weaken the restrictions via their fstab.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-27 13:42:02 -07:00
Namjae Jeon
d65226e2bf Documentation: update mount option in filesystem/vfat.txt
Update two mount options(discard, nfs) in vfat.txt.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-21 16:45:02 -07:00
J. Bruce Fields
8a4c6e19cf nfsd: document kernel interfaces for nfsd configuration
These are only needed by nfs-utils.  But I needed to remind myself how
they worked recently and thought this might be helpful.  It's short and
incomplete for now as I was only interested in startup, shutdown, and
configuration of listening sockets.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:42:03 -04:00
Theodore Ts'o
df981d03ee ext4: add max_dir_size_kb mount option
Very large directories can cause significant performance problems, or
perhaps even invoke the OOM killer, if the process is running in a
highly constrained memory environment (whether it is VM's with a small
amount of memory or in a small memory cgroup).

So it is useful, in cloud server/data center environments, to be able
to set a filesystem-wide cap on the maximum size of a directory, to
ensure that directories never get larger than a sane size.  We do this
via a new mount option, max_dir_size_kb.  If there is an attempt to
grow the directory larger than max_dir_size_kb, the system call will
return ENOSPC instead.

Google-Bug-Id: 6863013

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-08-17 09:48:17 -04:00
Artem Bityutskiy
34e5053fbe Documentation: get rid of write_super
The '->write_super' superblock method is gone, and this patch removes all the
references to 'write_super' from various pieces of the kernel documentation.

Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-04 01:25:20 +04:00
Linus Torvalds
a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00
J. Bruce Fields
068535f1fe locks: remove unused lm_release_private
In commit 3b6e2723f3 ("locks: prevent side-effects of
locks_release_private before file_lock is initialized") we removed the
last user of lm_release_private without removing the field itself.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01 09:01:46 -07:00
Mel Gorman
62c230bc17 mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages
Currently swapfiles are managed entirely by the core VM by using ->bmap to
allocate space and write to the blocks directly.  This effectively ensures
that the underlying blocks are allocated and avoids the need for the swap
subsystem to locate what physical blocks store offsets within a file.

If the swap subsystem is to use the filesystem information to locate the
blocks, it is critical that information such as block groups, block
bitmaps and the block descriptor table that map the swap file were
resident in memory.  This patch adds address_space_operations that the VM
can call when activating or deactivating swap backed by a file.

  int swap_activate(struct file *);
  int swap_deactivate(struct file *);

The ->swap_activate() method is used to communicate to the file that the
VM relies on it, and the address_space should take adequate measures such
as reserving space in the underlying device, reserving memory for mempools
and pinning information such as the block descriptor table in memory.  The
->swap_deactivate() method is called on sys_swapoff() if ->swap_activate()
returned success.

After a successful swapfile ->swap_activate, the swapfile is marked
SWP_FILE and swapper_space.a_ops will proxy to
sis->swap_file->f_mappings->a_ops using ->direct_io to write swapcache
pages and ->readpage to read.

It is perfectly possible that direct_IO be used to read the swap pages but
it is an unnecessary complication.  Similarly, it is possible that
->writepage be used instead of direct_io to write the pages but filesystem
developers have stated that calling writepage from the VM is undesirable
for a variety of reasons and using direct_IO opens up the possibility of
writing back batches of swap pages in the future.

[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:47 -07:00
Valerie Aurora
06fd516c1a Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
freeze_fs/unfreeze_fs ops are called with s_umount held for write, not read.

Signed-off-by: Valerie Aurora <val@vaaconsulting.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 09:45:55 +04:00
Al Viro
ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro
00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Al Viro
0b728e1911 stop passing nameidata * to ->d_revalidate()
Just the lookup flags.  Die, bastard, die...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:14 +04:00
Al Viro
30d9049474 kill struct opendata
Just pass struct file *.  Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int.  Next: saner prototypes for parts in
namei.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:39 +04:00
Al Viro
d95852777b make ->atomic_open() return int
Change of calling conventions:
old		new
NULL		1
file		0
ERR_PTR(-ve)	-ve

Caller *knows* that struct file *; no need to return it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:35 +04:00
Al Viro
47237687d7 ->atomic_open() prototype change - pass int * instead of bool *
... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.

[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:31 +04:00
Miklos Szeredi
d18e9008c3 vfs: add i_op->atomic_open()
Add a new inode operation which is called on the last component of an open.
Using this the filesystem can look up, possibly create and open the file in one
atomic operation.  If it cannot perform this (e.g. the file type turned out to
be wrong) it may signal this by returning NULL instead of an open struct file
pointer.

i_op->atomic_open() is only called if the last component is negative or needs
lookup.  Handling cached positive dentries here doesn't add much value: these
can be opened using f_op->open().  If the cached file turns out to be invalid,
the open can be retried, this time using ->atomic_open() with a fresh dentry.

For now leave the old way of using open intents in lookup and revalidate in
place.  This will be removed once all the users are converted.

David Howells noticed that if ->atomic_open() opens the file but does not create
it, handle_truncate() will be called on it even if it is not a regular file.
Fix this by checking the file type in this case too.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:04 +04:00
Al Viro
049b3c10ee vfs: update documentation on ->i_dentry handling
we used to need to clean it in RCU callback freeing an inode;
in 3.2 that requirement went away.  Unfortunately, it hadn't
been reflected in Documentation/filesystems/porting.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:51 +04:00
Linus Torvalds
1193755ac6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs changes from Al Viro.
 "A lot of misc stuff.  The obvious groups:
   * Miklos' atomic_open series; kills the damn abuse of
     ->d_revalidate() by NFS, which was the major stumbling block for
     all work in that area.
   * ripping security_file_mmap() and dealing with deadlocks in the
     area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
     general.
   * ->encode_fh() switched to saner API; insane fake dentry in
     mm/cleancache.c gone.
   * assorted annotations in fs (endianness, __user)
   * parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
   * ->update_time() work from Josef.
   * other bits and pieces all over the place.

  Normally it would've been in two or three pull requests, but
  signal.git stuff had eaten a lot of time during this cycle ;-/"

Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
  nfs: don't open in ->d_revalidate
  vfs: retry last component if opening stale dentry
  vfs: nameidata_to_filp(): don't throw away file on error
  vfs: nameidata_to_filp(): inline __dentry_open()
  vfs: do_dentry_open(): don't put filp
  vfs: split __dentry_open()
  vfs: do_last() common post lookup
  vfs: do_last(): add audit_inode before open
  vfs: do_last(): only return EISDIR for O_CREAT
  vfs: do_last(): check LOOKUP_DIRECTORY
  vfs: do_last(): make ENOENT exit RCU safe
  vfs: make follow_link check RCU safe
  vfs: do_last(): use inode variable
  vfs: do_last(): inline walk_component()
  vfs: do_last(): make exit RCU safe
  vfs: split do_lookup()
  Btrfs: move over to use ->update_time
  fs: introduce inode operation ->update_time
  reiserfs: get rid of resierfs_sync_super
  reiserfs: mark the superblock as dirty a bit later
  ...
2012-06-01 10:34:35 -07:00
Josef Bacik
c3b2da3148 fs: introduce inode operation ->update_time
Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail.  We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates.  So introduce
->update_time, where we will deal with i_version an a/m/c time updates and
indicate which changes need to be made.  The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-06-01 12:07:25 -04:00
Cyrill Gorcunov
5b172087f9 c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat
We would like to have an ability to restore command line arguments and
program environment pointers but first we need to obtain them somehow.
Thus we put these values into /proc/$pid/stat.  The exit_code is needed to
restore zombie tasks.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:32 -07:00
Cyrill Gorcunov
818411616b fs, proc: introduce /proc/<pid>/task/<tid>/children entry
When we do checkpoint of a task we need to know the list of children the
task, has but there is no easy and fast way to generate reverse
parent->children chain from arbitrary <pid> (while a parent pid is
provided in "PPid" field of /proc/<pid>/status).

So instead of walking over all pids in the system (creating one big
process tree in memory, just to figure out which children a task has) --
we add explicit /proc/<pid>/task/<tid>/children entry, because the kernel
already has this kind of information but it is not yet exported.

This is a first level children, not the whole process tree.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 17:49:32 -07:00
Mel Gorman
692569946f mm: document the meminfo and vmstat fields of relevance to transparent hugepages
Update Documentation/vm/transhuge.txt and
Documentation/filesystems/proc.txt with some information on monitoring
transparent huge page usage and the associated overhead.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:23 -07:00
Hugh Dickins
17cf28afea mm/fs: remove truncate_range
Remove vmtruncate_range(), and remove the truncate_range method from
struct inode_operations: only tmpfs ever supported it, and tmpfs has now
converted over to using the fallocate method of file_operations.

Update Documentation accordingly, adding (setlease and) fallocate lines.
And while we're in mm.h, remove duplicate declarations of shmem_lock() and
shmem_file_setup(): everyone is now using the ones in shmem_fs.h.

Based-on-patch-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:23 -07:00
Linus Torvalds
90324cc1b1 avoid iput() from flusher thread
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw2J/AAoJECvKgwp+S8Ja5jkP/3uMxkhf8XQpXCI3O1QVfaQr
 uZFfM8sINqIPDVm1dtFjFj7f8Bw9mhE2KAnnJ1rKT8tQwqq9yAse1QPlhCG1ZqoP
 +AnMDDXHtx7WmQZXhBvS9b+unpZ7Jr6r6pO5XrmTL2kRL3YJPUhZ2+xbTT5belTB
 KoAu4WqORZRxfXoC76S7U8K+D4NcAGhAOxCClsIjmY+oocCiCag4FZOyzYIFViqc
 ghUN/+rLQ3fqGGv2yO7Ylx1gUM7sxIwkZQ/h962jFAtxz9czImr2NmRoMliOaOkS
 tvcnIf+E3u0n/zIjzFvzhxKgHJPP8PkcPMk60d3jKmFngBkqFTzNUeVTP8md7HrV
 4DlXisWr+z7YVyWUCFaNcJLmjiWSwQ8DV/clRLobeBf9EJKan5F1PjFgl6PLJM5F
 Qr1+LHMNaetdulBwMRTyveZTzYqw9RmDnD9dWMo4mX/kTpvtC4jTPVV7hkRD+Qlv
 5vTRR+VXL3Q50yClLf0AQMSKTnH2gBuepM/b+7cShLGfsMln8DtUjmbigv+niL63
 BibcCIbIlP2uWGnl37VhsC34AT+RKt3lggrBOpn/7XJMq/wKR7IRP/7V9TfYgaUN
 NBa+wtnLDa1pZEn/X7izdcQP62PzDtmB+ObvYT0Yb40A4+2ud3qF/lB53c1A1ewF
 /9c4zxxekjHZnn2oooEa
 =oLXf
 -----END PGP SIGNATURE-----

Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback tree from Wu Fengguang:
 "Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Avoid iput() from flusher thread
  vfs: Rename end_writeback() to clear_inode()
  vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
  writeback: Refactor writeback_single_inode()
  writeback: Remove wb->list_lock from writeback_single_inode()
  writeback: Separate inode requeueing after writeback
  writeback: Move I_DIRTY_PAGES handling
  writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
  writeback: Move clearing of I_SYNC into inode_sync_complete()
  writeback: initialize global_dirty_limit
  fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
  mm: page-writeback.c: local functions should not be exposed globally
2012-05-28 09:54:45 -07:00
Linus Torvalds
ece78b7df7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3 and quota fixes from Jan Kara:
 "Interesting bits are:
   - removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
     quota code does not need i_mutex anymore in any unusual way.
   - backport (from ext4) of a fix of a checkpointing bug (missing cache
     flush) that could lead to fs corruption on power failure

  The rest are just random small fixes & cleanups."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: trivial fix to comment for ext2_free_blocks
  ext2: remove the redundant comment for ext2_export_ops
  ext3: return 32/64-bit dir name hash according to usage type
  quota: Get rid of nested I_MUTEX_QUOTA locking subclass
  quota: Use precomputed value of sb_dqopt in dquot_quota_sync
  ext2: Remove i_mutex use from ext2_quota_write()
  reiserfs: Remove i_mutex use from reiserfs_quota_write()
  ext4: Remove i_mutex use from ext4_quota_write()
  ext3: Remove i_mutex use from ext3_quota_write()
  quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
  jbd: Write journal superblock with WRITE_FUA after checkpointing
  jbd: protect all log tail updates with j_checkpoint_mutex
  jbd: Split updating of journal superblock and marking journal empty
  ext2: do not register write_super within VFS
  ext2: Remove s_dirt handling
  ext2: write superblock only once on unmount
  ext3: update documentation with barrier=1 default
  ext3: remove max_debt in find_group_orlov()
  jbd: Refine commit writeout logic
2012-05-25 08:14:59 -07:00
Linus Torvalds
e8650a0823 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "As usual, it's mostly typo fixes, redundant code elimination and some
  documentation updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits)
  edac, mips: don't change code that has been removed in edac/mips tree
  xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer
  lib: Change mail address of Oskar Schirmer
  net: Change mail address of Oskar Schirmer
  arm/m68k: Change mail address of Sebastian Hess
  i2c: Change mail address of Oskar Schirmer
  net: Fix tcp_build_and_update_options comment in struct tcp_sock
  atomic64_32.h: fix parameter naming mismatch
  Kconfig: replace "--- help ---" with "---help---"
  c2port: fix bogus Kconfig "default no"
  edac: Fix spelling errors.
  qla1280: Remove redundant NULL check before release_firmware() call
  remoteproc: remove redundant NULL check before release_firmware()
  qla2xxx: Remove redundant NULL check before release_firmware() call.
  aic94xx: Get rid of redundant NULL check before release_firmware() call
  tehuti: delete redundant NULL check before release_firmware()
  qlogic: get rid of a redundant test for NULL before call to release_firmware()
  bna: remove redundant NULL test before release_firmware()
  tg3: remove redundant NULL test before release_firmware() call
  typhoon: get rid of redundant conditional before all to release_firmware()
  ...
2012-05-22 19:22:50 -07:00
Linus Torvalds
62c8d92278 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 changes from Steven Whitehouse.

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits)
  GFS2: Fix quota adjustment return code
  GFS2: Add rgrp information to block_alloc trace point
  GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer
  GFS2: Update glock doc to add new stats info
  GFS2: Update main gfs2 doc
  GFS2: Remove redundant metadata block type check
  GFS2: Fix sgid propagation when using ACLs
  GFS2: eliminate log elements and simplify
  GFS2: Eliminate vestigial sd_log_le_rg
  GFS2: Eliminate needless parameter from function gfs2_setbit
  GFS2: Log code fixes
  GFS2: Remove unused argument from gfs2_internal_read
  GFS2: Remove bd_list_tr
  GFS2: Remove duplicate log code
  GFS2: Clean up log write code path
  GFS2: Use variable rather than qa to determine if unstuff necessary
  GFS2: Change variable blk to biblk
  GFS2: Fix function parameter comments in rgrp.c
  GFS2: Eliminate offset parameter to gfs2_setbit
  GFS2: Use slab for block reservation memory
  ...
2012-05-21 19:21:20 -07:00
Paul Gortmaker
ee446fd5e6 tokenring: delete all remaining driver support
This represents the mass deletion of the of the tokenring support.

It gets rid of:
  - the net/tr.c which the drivers depended on
  - the drivers/net component
  - the Kbuild infrastructure around it
  - any tokenring related CONFIG_ settings in any defconfigs
  - the tokenring headers in the include/linux dir
  - the firmware associated with the tokenring drivers.
  - any associated token ring documentation.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-15 20:23:16 -04:00
Steven Whitehouse
2ebc3f8b3e GFS2: Update glock doc to add new stats info
We recently added some glock statistics to GFS2, so this is
a docs update to explain what they all mean. It is based
upon the checkin comment of the patch in question.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-10 12:41:40 +01:00
Steven Whitehouse
49f30789fc GFS2: Update main gfs2 doc
Various items were a bit out of date, so this is a refresh to the
latest info.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-10 11:45:31 +01:00
Jan Kara
dbd5768f87 vfs: Rename end_writeback() to clear_inode()
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:41 +08:00
Masanari Iida
c94bed8e19 Documentation: Fix typo in multiple files in Documentation
Correct multiple spelling typo in Documentation.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Rob Landley <rob@landley.net>
Reported-by: Anders Larsen <al@alarsen.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-04-16 14:37:13 +02:00
Stefan Hajnoczi
ee65244b21 ext3: update documentation with barrier=1 default
Commit 00eacd6 ("ext3: make ext3 mount default to barrier=1") changed
the default barrier mount option for ext3.  The documentation needs to
be updated, so this patch does that.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-04-11 11:12:45 +02:00
Al Viro
b1349f2536 typo fix in Documentation/filesystems/vfs.txt
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-09 01:39:24 -04:00
Linus Torvalds
a591afc01d Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x32 support for x86-64 from Ingo Molnar:
 "This tree introduces the X32 binary format and execution mode for x86:
  32-bit data space binaries using 64-bit instructions and 64-bit kernel
  syscalls.

  This allows applications whose working set fits into a 32 bits address
  space to make use of 64-bit instructions while using a 32-bit address
  space with shorter pointers, more compressed data structures, etc."

Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}

* 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
  x32: Fix alignment fail in struct compat_siginfo
  x32: Fix stupid ia32/x32 inversion in the siginfo format
  x32: Add ptrace for x32
  x32: Switch to a 64-bit clock_t
  x32: Provide separate is_ia32_task() and is_x32_task() predicates
  x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
  x86/x32: Fix the binutils auto-detect
  x32: Warn and disable rather than error if binutils too old
  x32: Only clear TIF_X32 flag once
  x32: Make sure TS_COMPAT is cleared for x32 tasks
  fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
  fs: Fix close_on_exec pointer in alloc_fdtable
  x32: Drop non-__vdso weak symbols from the x32 VDSO
  x32: Fix coding style violations in the x32 VDSO code
  x32: Add x32 VDSO support
  x32: Allow x32 to be configured
  x32: If configured, add x32 system calls to system call tables
  x32: Handle process creation
  x32: Signal-related system calls
  x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h>
  ...
2012-03-29 18:12:23 -07:00
Linus Torvalds
69e1aaddd6 Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes
The changes to export dirty_writeback_interval are from Artem's s_dirt
 cleanup patch series.  The same is true of the change to remove the
 s_dirt helper functions which never got used by anyone in-tree.  I've
 run these changes by Al Viro, and am carrying them so that Artem can
 more easily fix up the rest of the file systems during the next merge
 window.  (Originally we had hopped to remove the use of s_dirt from
 ext4 during this merge window, but his patches had some bugs, so I
 ultimately ended dropping them from the ext4 tree.)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABCAAGBQJPb39rAAoJENNvdpvBGATwVz8P/3V1NqSsk20VJOLbmEE45GxL
 GDzQJ6OsFG0UiQk6ISSrSdwxfav/KTCGySsU9UtAoOdPcBwnnsf8S7wc6OggwwuC
 hBFGwwFzk6YSQaZ58sUxWRGeOJuP/FPem6Id6buC4DQ1KIcznP/hEEgEnh/ir4Ec
 vrsfexY93TR8BE2Mi23v2epDVLU0B6bY/w9nDqbTXif3xN/gh/ypoHHouuM6Bs2n
 TyWHOwD15NwfnvRHd8PfDDqQM/D29x3QI0FMrWj9McpwIz4d4cBfhN4LQ/G+yLDY
 izv5DM10GbinwHPrsOTGVAW3KIdSS9rP3jCJGVuOrJZ9ufGXosvHuIYVhI7J3SBK
 JhBu6QEsN1IsvlVYpz9q8mqVKaDXQLsz2eaTw+i4yfmyOk1kOX7nIEOxYFF78G+V
 Of/W1SpIpJQaXvLHRcDj9fDj0fZTciUZA8v7/HOFS+co2dzIl0iZbcfBFp0/56RY
 sWdQoeRlx1ciVDPR+w2TQO5w3VWQw1gT5aqux0NiPj0XFoiUHScxgNGAYbqENMQw
 v9chvyDMlorqj0rF/Vey5SssgEDi7MTdYuYTi4YyMqr7pcvOJaO85pf+wH9g2eKW
 XhW33PhPGuwCJDP5Pg8Y0Z2Hp/Q3DCqhLqhGfTyAs/NG9+hR4wgp3VWb8CUqhA1t
 C/yzNeOYqScAefCzQx2V
 =+9zk
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates for 3.4 from Ted Ts'o:
 "Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes

  The changes to export dirty_writeback_interval are from Artem's s_dirt
  cleanup patch series.  The same is true of the change to remove the
  s_dirt helper functions which never got used by anyone in-tree.  I've
  run these changes by Al Viro, and am carrying them so that Artem can
  more easily fix up the rest of the file systems during the next merge
  window.  (Originally we had hopped to remove the use of s_dirt from
  ext4 during this merge window, but his patches had some bugs, so I
  ultimately ended dropping them from the ext4 tree.)"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (66 commits)
  vfs: remove unused superblock helpers
  mm: export dirty_writeback_interval
  ext4: remove useless s_dirt assignment
  ext4: write superblock only once on unmount
  ext4: do not mark superblock as dirty unnecessarily
  ext4: correct ext4_punch_hole return codes
  ext4: remove restrictive checks for EOFBLOCKS_FL
  ext4: always set then trimmed blocks count into len
  ext4: fix trimmed block count accunting
  ext4: fix start and len arguments handling in ext4_trim_fs()
  ext4: update s_free_{inodes,blocks}_count during online resize
  ext4: change some printk() calls to use ext4_msg() instead
  ext4: avoid output message interleaving in ext4_error_<foo>()
  ext4: remove trailing newlines from ext4_msg() and ext4_error() messages
  ext4: add no_printk argument validation, fix fallout
  ext4: remove redundant "EXT4-fs: " from uses of ext4_msg
  ext4: give more helpful error message in ext4_ext_rm_leaf()
  ext4: remove unused code from ext4_ext_map_blocks()
  ext4: rewrite punch hole to use ext4_ext_remove_space()
  jbd2: cleanup journal tail after transaction commit
  ...
2012-03-28 10:02:55 -07:00
Linus Torvalds
f63d395d47 NFS client updates for Linux 3.4
New features include:
 - Add NFS client support for containers.
   This should enable most of the necessary functionality, including
   lockd support, and support for rpc.statd, NFSv4 idmapper and
   RPCSEC_GSS upcalls into the correct network namespace from
   which the mount system call was issued.
 - NFSv4 idmapper scalability improvements
   Base the idmapper cache on the keyring interface to allow concurrent
   access to idmapper entries. Start the process of migrating users from
   the single-threaded daemon-based approach to the multi-threaded
   request-key based approach.
 - NFSv4.1 implementation id.
   Allows the NFSv4.1 client and server to mutually identify each other
   for logging and debugging purposes.
 - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
   having to use the more counterintuitive 'vers=4,minorversion=1'.
 - SUNRPC tracepoints.
   Start the process of adding tracepoints in order to improve debugging
   of the RPC layer.
 - pNFS object layout support for autologin.
 
 Important bugfixes include:
 - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to fail
   to wake up all tasks when applied to priority waitqueues.
 - Ensure that we handle read delegations correctly, when we try to
   truncate a file.
 - A number of fixes for NFSv4 state manager loops (mostly to do with
   delegation recovery).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPalZbAAoJEGcL54qWCgDyCi4P+QHcmzQhJO7HWx3Pzjs67bFT
 xMSYaKHGWS4AJKUBVl5OKBxUExfrMHBNbElV3IKUIwBlDx8RVtnwfptKSe146iki
 dn4TrRO5es8nmI4hRDcGMlzJDZq4y0Qg//qiUFmojiNW/Avw0ljfMoVUejJJ09FV
 oeDk4EGtcxkEyH+g48ZjYbyspRnG8qtD3atf70Z3lYE0ELdG/B5Dyzw1RDrA5p73
 xJX3lqy8p/4ROzw/dmNoxdAXOrr3Q4/T58Bvp/lUglPy/EHyPmWzFoH0MU0C/PFu
 5VnAl6QDbNCTcIw9FvJlX/mIyErpNG9eKzUskUc9L9SA+B+J/i4rIap4KATRN3nH
 7QhE5qUacPuJnvxml7MPmlQTuft3fkAQ7NhKIWrbRi1QS9FmJC5NxctIb8loqlFn
 yIXdKeLfMshB+NyuFS9uzStX7SmV3eMgVd+5ZxRjYxm+PKJLw2KXeudArL6M5mHK
 3QeKZpqwaYQ3RfaTNpvAp0doiXHCO5UbWfI0Pe8xQs/QcMCNReffqV2G4IJKFAu6
 WpoN2UDQC9LCBifLw2nS7kku8+ZVXLQU8OC1NVl3TG15xD9cNLXuk3/y5llPGq4O
 odo52uLFpJohbDaHMj5RTKOfchTQCm2iyuVmxZEeAySypMSiAXmW7COSKHs/HxI1
 VBm+EI00Pvmm5+fUjIlp
 =LuHE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates for Linux 3.4 from Trond Myklebust:
 "New features include:
   - Add NFS client support for containers.

     This should enable most of the necessary functionality, including
     lockd support, and support for rpc.statd, NFSv4 idmapper and
     RPCSEC_GSS upcalls into the correct network namespace from which
     the mount system call was issued.

   - NFSv4 idmapper scalability improvements

     Base the idmapper cache on the keyring interface to allow
     concurrent access to idmapper entries.  Start the process of
     migrating users from the single-threaded daemon-based approach to
     the multi-threaded request-key based approach.

   - NFSv4.1 implementation id.

     Allows the NFSv4.1 client and server to mutually identify each
     other for logging and debugging purposes.

   - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
     having to use the more counterintuitive 'vers=4,minorversion=1'.

   - SUNRPC tracepoints.

     Start the process of adding tracepoints in order to improve
     debugging of the RPC layer.

   - pNFS object layout support for autologin.

  Important bugfixes include:

   - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to
     fail to wake up all tasks when applied to priority waitqueues.

   - Ensure that we handle read delegations correctly, when we try to
     truncate a file.

   - A number of fixes for NFSv4 state manager loops (mostly to do with
     delegation recovery)."

* tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
  NFS: fix sb->s_id in nfs debug prints
  xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
  xprtrdma: The transport should not bug-check when a dup reply is received
  pnfs-obj: autologin: Add support for protocol autologin
  NFS: Remove nfs4_setup_sequence from generic rename code
  NFS: Remove nfs4_setup_sequence from generic unlink code
  NFS: Remove nfs4_setup_sequence from generic read code
  NFS: Remove nfs4_setup_sequence from generic write code
  NFS: Fix more NFS debug related build warnings
  SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
  nfs: non void functions must return a value
  SUNRPC: Kill compiler warning when RPC_DEBUG is unset
  SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
  NFS: Use cond_resched_lock() to reduce latencies in the commit scans
  NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
  NFS: ncommit count is being double decremented
  SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
  Try using machine credentials for RENEW calls
  NFSv4.1: Fix a few issues in filelayout_commit_pagelist
  NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
  ...
2012-03-23 08:53:47 -07:00
Linus Torvalds
95211279c5 Merge branch 'akpm' (Andrew's patch-bomb)
Merge first batch of patches from Andrew Morton:
 "A few misc things and all the MM queue"

* emailed from Andrew Morton <akpm@linux-foundation.org>: (92 commits)
  memcg: avoid THP split in task migration
  thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE
  memcg: clean up existing move charge code
  mm/memcontrol.c: remove unnecessary 'break' in mem_cgroup_read()
  mm/memcontrol.c: remove redundant BUG_ON() in mem_cgroup_usage_unregister_event()
  mm/memcontrol.c: s/stealed/stolen/
  memcg: fix performance of mem_cgroup_begin_update_page_stat()
  memcg: remove PCG_FILE_MAPPED
  memcg: use new logic for page stat accounting
  memcg: remove PCG_MOVE_LOCK flag from page_cgroup
  memcg: simplify move_account() check
  memcg: remove EXPORT_SYMBOL(mem_cgroup_update_page_stat)
  memcg: kill dead prev_priority stubs
  memcg: remove PCG_CACHE page_cgroup flag
  memcg: let css_get_next() rely upon rcu_read_lock()
  cgroup: revert ss_id_lock to spinlock
  idr: make idr_get_next() good for rcu_read_lock()
  memcg: remove unnecessary thp check in page stat accounting
  memcg: remove redundant returns
  memcg: enum lru_list lru
  ...
2012-03-22 09:04:48 -07:00
Siddhesh Poyarekar
b76437579d procfs: mark thread stack correctly in proc/<pid>/maps
Stack for a new thread is mapped by userspace code and passed via
sys_clone.  This memory is currently seen as anonymous in
/proc/<pid>/maps, which makes it difficult to ascertain which mappings
are being used for thread stacks.  This patch uses the individual task
stack pointers to determine which vmas are actually thread stacks.

For a multithreaded program like the following:

	#include <pthread.h>

	void *thread_main(void *foo)
	{
		while(1);
	}

	int main()
	{
		pthread_t t;
		pthread_create(&t, NULL, thread_main, NULL);
		pthread_join(t, NULL);
	}

proc/PID/maps looks like the following:

    00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
    7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0
    7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
    7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
    7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
    7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
    7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
    7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Here, one could guess that 7f8a44492000-7f8a44c92000 is a stack since
the earlier vma that has no permissions (7f8a44e3d000-7f8a4503d000) but
that is not always a reliable way to find out which vma is a thread
stack.  Also, /proc/PID/maps and /proc/PID/task/TID/maps has the same
content.

With this patch in place, /proc/PID/task/TID/maps are treated as 'maps
as the task would see it' and hence, only the vma that that task uses as
stack is marked as [stack].  All other 'stack' vmas are marked as
anonymous memory.  /proc/PID/maps acts as a thread group level view,
where all thread stack vmas are marked as [stack:TID] where TID is the
process ID of the task that uses that vma as stack, while the process
stack is marked as [stack].

So /proc/PID/maps will look like this:

    00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
    7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack:1442]
    7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
    7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
    7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
    7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
    7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
    7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Thus marking all vmas that are used as stacks by the threads in the
thread group along with the process stack.  The task level maps will
however like this:

    00400000-00401000 r-xp 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    00600000-00601000 rw-p 00000000 fd:0a 3671804                            /home/siddhesh/a.out
    019ef000-01a10000 rw-p 00000000 00:00 0                                  [heap]
    7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack]
    7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482                    /lib64/libc-2.14.90.so
    7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
    7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938                    /lib64/libpthread-2.14.90.so
    7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
    7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
    7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
    7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348                    /lib64/ld-2.14.90.so
    7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0
    7fff627ff000-7fff62800000 r-xp 00000000 00:00 0                          [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

where only the vma that is being used as a stack by *that* task is
marked as [stack].

Analogous changes have been made to /proc/PID/smaps,
/proc/PID/numa_maps, /proc/PID/task/TID/smaps and
/proc/PID/task/TID/numa_maps. Relevant snippets from smaps and
numa_maps:

    [siddhesh@localhost ~ ]$ pgrep a.out
    1441
    [siddhesh@localhost ~ ]$ cat /proc/1441/smaps | grep "\[stack"
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack:1442]
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/smaps | grep "\[stack"
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0                          [stack]
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/smaps | grep "\[stack"
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0                          [stack]
    [siddhesh@localhost ~ ]$ cat /proc/1441/numa_maps | grep "stack"
    7f8a44492000 default stack:1442 anon=2 dirty=2 N0=2
    7fff6273a000 default stack anon=3 dirty=3 N0=3
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/numa_maps | grep "stack"
    7f8a44492000 default stack anon=2 dirty=2 N0=2
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/numa_maps | grep "stack"
    7fff6273a000 default stack anon=3 dirty=3 N0=3

[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jamie Lokier <jamie@shareable.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:58 -07:00
Linus Torvalds
e2a0883e40 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
  yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  ext4: initialization of ext4_li_mtx needs to be done earlier
  debugfs-related mode_t whack-a-mole
  hfsplus: add an ioctl to bless files
  hfsplus: change finder_info to u32
  hfsplus: initialise userflags
  qnx4: new helper - try_extent()
  qnx4: get rid of qnx4_bread/qnx4_getblk
  take removal of PF_FORKNOEXEC to flush_old_exec()
  trim includes in inode.c
  um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
  um: embed ->stub_pages[] into mmu_context
  gadgetfs: list_for_each_safe() misuse
  ocfs2: fix leaks on failure exits in module_init
  ecryptfs: make register_filesystem() the last potential failure exit
  ntfs: forgets to unregister sysctls on register_filesystem() failure
  logfs: missing cleanup on register_filesystem() failure
  jfs: mising cleanup on register_filesystem() failure
  make configfs_pin_fs() return root dentry on success
  configfs: configfs_create_dir() has parent dentry in dentry->d_parent
  configfs: sanitize configfs_create()
  ...
2012-03-21 13:36:41 -07:00
Sachin Bhamare
18d98f6c04 pnfs-obj: autologin: Add support for protocol autologin
The pnfs-objects protocol mandates that we autologin into devices not
present in the system, according to information specified in the
get_device_info returned from the server.

The Protocol specifies two login hints.
1. An IP address:port combination
2. A string URI which is constructed as a URL with a protocol prefix
   followed by :// and a string as address. For each  protocol prefix
   the string-address format might be different.

We only support the second option. The first option is just redundant
to the second one.
NOTE: The Kernel part of autologin does not parse the URI string. It
just channels it to a user-mode script. So any new login protocols should
only update the user-mode script which is a part of the nfs-utils package,
but the Kernel need not change.

We implement the autologin by using the call_usermodehelper() API.
(Thanks to Steve Dickson <steved@redhat.com> for pointing it out)
So there is no running daemon needed, and/or special setup.

We Add the osd_login_prog Kernel module parameters which defaults to:
	/sbin/osd_login

Kernel try's to upcall the program specified in osd_login_prog. If the file is
not found or the execution fails Kernel will disable any farther upcalls, by
zeroing out  osd_login_prog, Until Admin re-enables it by setting the
osd_login_prog parameter to a proper program.

Also add text about the osd_login program command line API to:
	Documentation/filesystems/nfs/pnfs.txt
and documentation of the new  osd_login_prog  module parameter to:
	Documentation/kernel-parameters.txt

TODO: Add timeout option in the case osd_login program gets
              stuck

Signed-off-by: Sachin Bhamare <sbhamare@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:47 -04:00
Linus Torvalds
69a7aebcf0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "It's indeed trivial -- mostly documentation updates and a bunch of
  typo fixes from Masanari.

  There are also several linux/version.h include removals from Jesper."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
  kcore: fix spelling in read_kcore() comment
  constify struct pci_dev * in obvious cases
  Revert "char: Fix typo in viotape.c"
  init: fix wording error in mm_init comment
  usb: gadget: Kconfig: fix typo for 'different'
  Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
  writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
  writeback: fix typo in the writeback_control comment
  Documentation: Fix multiple typo in Documentation
  tpm_tis: fix tis_lock with respect to RCU
  Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
  Doc: Update numastat.txt
  qla4xxx: Add missing spaces to error messages
  compiler.h: Fix typo
  security: struct security_operations kerneldoc fix
  Documentation: broken URL in libata.tmpl
  Documentation: broken URL in filesystems.tmpl
  mtd: simplify return logic in do_map_probe()
  mm: fix comment typo of truncate_inode_pages_range
  power: bq27x00: Fix typos in comment
  ...
2012-03-20 21:12:50 -07:00
Al Viro
88187398cc debugfs-related mode_t whack-a-mole
all of those should be umode_t...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:53 -04:00
Kai Bankett
5d026c7242 fs: initial qnx6fs addition
Adds support for qnx6fs readonly support to the linux kernel.

* Mount option
  The option mmi_fs can be used to mount Harman Becker/Audi MMI 3G
  HDD qnx6fs filesystems.

* Documentation
  A high level filesystem stucture description can be found in the
  Documentation/filesystems directory. (qnx6.txt)

* Additional features
  - Active (stable) superblock selection
  - Superblock checksum check (enforced)
  - Supports mount of qnx6 filesystems with to host different endianess
  - Automatic endianess detection
  - Longfilename support (with non-enfocing crc check)
  - All blocksizes (512, 1024, 2048 and 4096 supported)

Signed-off-by: Kai Bankett <chaosman@ontika.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:38 -04:00
Al Viro
32991ab305 vfs: d_alloc_root() gone
all callers converted to d_make_root() by now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:37 -04:00
Masanari Iida
40e47125e6 Documentation: Fix multiple typo in Documentation
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-03-07 16:08:24 +01:00
Masanari Iida
1f8ee46b40 Documentation: Fix Broken URL "freshmeat"
Fix broken URL in documents "freshmeat" to "freecode" in
Documentation/filesystems/ramfs-rootfs-initramfs.txt,
Documentation/scsi/scsi-generic.txt
Documentation/usb/mtouchusb.txt

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-02-21 11:43:45 +01:00
Eric Sandeen
661aa52057 ext4: remove the resize mount option
The resize mount option seems to be of limited value,
especially in the age of online resize2fs.  Nuke it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-02-20 17:53:04 -05:00
Eric Sandeen
43e625d84f ext4: remove the journal=update mount option
The V2 journal format was introduced around ten years ago,
for ext3. It seems highly unlikely that anyone will need this
migration option for ext4.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-02-20 17:53:04 -05:00
David Howells
1dce27c5aa Wrap accesses to the fd_sets in struct fdtable
Wrap accesses to the fd_sets in struct fdtable (for recording open files and
close-on-exec flags) so that we can move away from using fd_sets since we
abuse the fd_set structs by not allocating the full-sized structure under
normal circumstances and by non-core code looking at the internals of the
fd_sets.

The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
since that cannot be told about their abnormal lengths.

This introduces six wrapper functions for setting, clearing and testing
close-on-exec flags and fd-is-open flags:

	void __set_close_on_exec(int fd, struct fdtable *fdt);
	void __clear_close_on_exec(int fd, struct fdtable *fdt);
	bool close_on_exec(int fd, const struct fdtable *fdt);
	void __set_open_fd(int fd, struct fdtable *fdt);
	void __clear_open_fd(int fd, struct fdtable *fdt);
	bool fd_is_open(int fd, const struct fdtable *fdt);

Note that I've prepended '__' to the names of the set/clear functions because
they require the caller to hold a lock to use them.

Note also that I haven't added wrappers for looking behind the scenes at the
the array.  Possibly that should exist too.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2012-02-19 10:30:52 -08:00
Bryan Schumaker
a602bea3e7 NFS: Update idmapper documentation
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:01 -05:00
Ludwig Nussel
d6e486868c debugfs: add mode, uid and gid options
Cautious admins may want to restrict access to debugfs. Currently a
manual chown/chmod e.g. in an init script is needed to achieve that.
Distributions that want to make the mount options configurable need
to add extra config files. By allowing to set the root inode's uid,
gid and mode via mount options no such hacks are needed anymore.
Instead configuration becomes straight forward via fstab.

Signed-off-by: Ludwig Nussel <ludwig.nussel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-26 11:28:49 -08:00
Linus Torvalds
0b48d42235 Merge branch 'for-3.3' of git://linux-nfs.org/~bfields/linux
* 'for-3.3' of git://linux-nfs.org/~bfields/linux: (31 commits)
  nfsd4: nfsd4_create_clid_dir return value is unused
  NFSD: Change name of extended attribute containing junction
  svcrpc: don't revert to SVC_POOL_DEFAULT on nfsd shutdown
  svcrpc: fix double-free on shutdown of nfsd after changing pool mode
  nfsd4: be forgiving in the absence of the recovery directory
  nfsd4: fix spurious 4.1 post-reboot failures
  NFSD: forget_delegations should use list_for_each_entry_safe
  NFSD: Only reinitilize the recall_lru list under the recall lock
  nfsd4: initialize special stateid's at compile time
  NFSd: use network-namespace-aware cache registering routines
  SUNRPC: create svc_xprt in proper network namespace
  svcrpc: update outdated BKL comment
  nfsd41: allow non-reclaim open-by-fh's in 4.1
  svcrpc: avoid memory-corruption on pool shutdown
  svcrpc: destroy server sockets all at once
  svcrpc: make svc_delete_xprt static
  nfsd: Fix oops when parsing a 0 length export
  nfsd4: Use kmemdup rather than duplicating its implementation
  nfsd4: add a separate (lockowner, inode) lookup
  nfsd4: fix CONFIG_NFSD_FAULT_INJECTION compile error
  ...
2012-01-14 12:26:41 -08:00
Linus Torvalds
96e80a7851 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
  Squashfs: fix i_blocks calculation with extended regular files
  Squashfs: fix mount time sanity check for corrupted superblock
  Squashfs: optimise squashfs_cache_get entry search
  Squashfs: Update documentation to include xattrs
  Squashfs: add missing block release on error condition
2012-01-13 10:34:57 -08:00
Linus Torvalds
1a52bb0b68 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: ensure prealloc_blob is in place when removing xattr
  rbd: initialize snap_rwsem in rbd_add()
  ceph: enable/disable dentry complete flags via mount option
  vfs: export symbol d_find_any_alias()
  ceph: always initialize the dentry in open_root_dentry()
  libceph: remove useless return value for osd_client __send_request()
  ceph: avoid iput() while holding spinlock in ceph_dir_fsync
  ceph: avoid useless dget/dput in encode_fh
  ceph: dereference pointer after checking for NULL
  crush: fix force for non-root TAKE
  ceph: remove unnecessary d_fsdata conditional checks
  ceph: Use kmemdup rather than duplicating its implementation

Fix up conflicts in fs/ceph/super.c (d_alloc_root() failure handling vs
always initialize the dentry in open_root_dentry)
2012-01-13 10:29:21 -08:00
Cyrill Gorcunov
b3f7f573a2 c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4
The mm->start_code/end_code, mm->start_data/end_data, mm->start_brk are
involved into calculation of program text/data segment sizes (which might
be seen in /proc/<pid>/statm) and into brk() call final address.

For restore we need to know all these values.  While
mm->start_code/end_code already present in /proc/$pid/stat, the rest
members are not, so this patch brings them in.

The restore procedure of these members is addressed in another patch using
prctl().

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:13 -08:00
Sage Weil
a40dc6cc2e ceph: enable/disable dentry complete flags via mount option
Enable/disable use of the dentry dir 'complete' flag via a mount option.
This lets the admin control whether ceph uses the dcache to satisfy
negative lookups or readdir when it has the entire directory contents in
its cache.

This is purely a performance optimization; correctness is guaranteed
whether it is enabled or not.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-01-12 11:00:40 -08:00
Linus Torvalds
40ba587923 Merge branch 'akpm' (aka "Andrew's patch-bomb")
Andrew elucidates:
 - First installmeant of MM.  We have a HUGE number of MM patches this
   time.  It's crazy.
 - MAINTAINERS updates
 - backlight updates
 - leds
 - checkpatch updates
 - misc ELF stuff
 - rtc updates
 - reiserfs
 - procfs
 - some misc other bits

* akpm: (124 commits)
  user namespace: make signal.c respect user namespaces
  workqueue: make alloc_workqueue() take printf fmt and args for name
  procfs: add hidepid= and gid= mount options
  procfs: parse mount options
  procfs: introduce the /proc/<pid>/map_files/ directory
  procfs: make proc_get_link to use dentry instead of inode
  signal: add block_sigmask() for adding sigmask to current->blocked
  sparc: make SA_NOMASK a synonym of SA_NODEFER
  reiserfs: don't lock root inode searching
  reiserfs: don't lock journal_init()
  reiserfs: delay reiserfs lock until journal initialization
  reiserfs: delete comments referring to the BKL
  drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range
  drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030
  drivers/rtc/: remove redundant spi driver bus initialization
  drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static
  drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static
  rtc: convert drivers/rtc/* to use module_platform_driver()
  drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc()
  drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
  ...
2012-01-10 16:42:48 -08:00
Vasiliy Kulikov
0499680a42 procfs: add hidepid= and gid= mount options
Add support for mount options to restrict access to /proc/PID/
directories.  The default backward-compatible "relaxed" behaviour is left
untouched.

The first mount option is called "hidepid" and its value defines how much
info about processes we want to be available for non-owners:

hidepid=0 (default) means the old behavior - anybody may read all
world-readable /proc/PID/* files.

hidepid=1 means users may not access any /proc/<pid>/ directories, but
their own.  Sensitive files like cmdline, sched*, status are now protected
against other users.  As permission checking done in proc_pid_permission()
and files' permissions are left untouched, programs expecting specific
files' modes are not confused.

hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
users.  It doesn't mean that it hides whether a process exists (it can be
learned by other means, e.g.  by kill -0 $PID), but it hides process' euid
and egid.  It compicates intruder's task of gathering info about running
processes, whether some daemon runs with elevated privileges, whether
another user runs some sensitive program, whether other users run any
program at all, etc.

gid=XXX defines a group that will be able to gather all processes' info
(as in hidepid=0 mode).  This group should be used instead of putting
nonroot user in sudoers file or something.  However, untrusted users (like
daemons, etc.) which are not supposed to monitor the tasks in the whole
system should not be added to the group.

hidepid=1 or higher is designed to restrict access to procfs files, which
might reveal some sensitive private information like precise keystrokes
timings:

http://www.openwall.com/lists/oss-security/2011/11/05/3

hidepid=1/2 doesn't break monitoring userspace tools.  ps, top, pgrep, and
conky gracefully handle EPERM/ENOENT and behave as if the current user is
the only user running processes.  pstree shows the process subtree which
contains "pstree" process.

Note: the patch doesn't deal with setuid/setgid issues of keeping
preopened descriptors of procfs files (like
https://lkml.org/lkml/2011/2/7/368).  We rely on that the leaked
information like the scheduling counters of setuid apps doesn't threaten
anybody's privacy - only the user started the setuid program may read the
counters.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg KH <greg@kroah.com>
Cc: Theodore Tso <tytso@MIT.EDU>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: James Morris <jmorris@namei.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10 16:30:54 -08:00
Theodore Ts'o
ff9cb1c4ee Merge branch 'for_linus' into for_linus_merged
Conflicts:
	fs/ext4/ioctl.c
2012-01-10 11:54:07 -05:00
Linus Torvalds
972b2c7199 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
  reiserfs: Properly display mount options in /proc/mounts
  vfs: prevent remount read-only if pending removes
  vfs: count unlinked inodes
  vfs: protect remounting superblock read-only
  vfs: keep list of mounts for each superblock
  vfs: switch ->show_options() to struct dentry *
  vfs: switch ->show_path() to struct dentry *
  vfs: switch ->show_devname() to struct dentry *
  vfs: switch ->show_stats to struct dentry *
  switch security_path_chmod() to struct path *
  vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
  vfs: trim includes a bit
  switch mnt_namespace ->root to struct mount
  vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
  vfs: opencode mntget() mnt_set_mountpoint()
  vfs: spread struct mount - remaining argument of next_mnt()
  vfs: move fsnotify junk to struct mount
  vfs: move mnt_devname
  vfs: move mnt_list to struct mount
  vfs: switch pnode.h macros to struct mount *
  ...
2012-01-08 12:19:57 -08:00
Al Viro
34c80b1d93 vfs: switch ->show_options() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:19:54 -05:00
Greg Kroah-Hartman
ff4b8a57f0 Merge branch 'driver-core-next' into Linux 3.2
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.

The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 11:42:52 -08:00
Yongqiang Yang
19c5246d25 ext4: add new online resize interface
This patch adds new online resize interface, whose input argument is a
64-bit integer indicating how many blocks there are in the resized fs.

In new resize impelmentation, all work like allocating group tables
are done by kernel side, so the new resize interface can support
flex_bg feature and prepares ground for suppoting resize with features
like bigalloc and exclude bitmap. Besides these, user-space tools just
passes in the new number of blocks.

We delay initializing the bitmaps and inode tables of added groups if
possible and add multi groups (a flex groups) each time, so new resize
is very fast like mkfs.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-04 17:09:44 -05:00
Al Viro
faef2b6c99 sysfs: propagate umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:03 -05:00
Al Viro
439475140b configfs: convert to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:57 -05:00
Al Viro
f4ae40a6a5 switch debugfs to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:56 -05:00
Al Viro
1a67aafb5f switch ->mknod() to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:54 -05:00
Al Viro
4acdaf27eb switch ->create() to umode_t
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro
18bb1db3e7 switch vfs_mkdir() and ->mkdir() to umode_t
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Phillip Lougher
89cab5b572 Squashfs: Update documentation to include xattrs
One paragraph was not updated when xattrs were added to Squashfs.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
2011-12-30 01:20:24 +00:00
Linus Torvalds
b930c26416 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix meta data raid-repair merge problem
  Btrfs: skip allocation attempt from empty cluster
  Btrfs: skip block groups without enough space for a cluster
  Btrfs: start search for new cluster at the beginning
  Btrfs: reset cluster's max_size when creating bitmap
  Btrfs: initialize new bitmaps' list
  Btrfs: fix oops when calling statfs on readonly device
  Btrfs: Don't error on resizing FS to same size
  Btrfs: fix deadlock on metadata reservation when evicting a inode
  Fix URL of btrfs-progs git repository in docs
  btrfs scrub: handle -ENOMEM from init_ipath()
2011-12-01 08:28:53 -08:00
Arnd Hannemann
b52f75a595 Fix URL of btrfs-progs git repository in docs
The location of the btrfs-progs repository has been changed.
	This patch updates the documentation accordingly.

Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
2011-11-30 18:46:02 +01:00
Alessandro Rubini
1a087c6ad9 debugfs: add tools to printk 32-bit registers
Some debugfs file I deal with are mostly blocks of registers,
i.e. lines of the form "<name> = 0x<value>". Some files are only
registers, some include registers blocks among other material.  This
patch introduces data structures and functions to deal with both
cases.  I expect more users of this over time.

Signed-off-by: Alessandro Rubini <rubini@gnudd.com>
Acked-by: Giancarlo Asnaghi <giancarlo.asnaghi@st.com>
Cc: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-18 10:31:22 -08:00
Bryan Schumaker
114a0a08d4 NFSD: Added fault injection documentation
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-11-07 21:10:47 -05:00
Marcos Paulo de Souza
ab05210bcb Documentation: HFS is orphaned
Removed the reference of Roman Zippel, last maintainer, of orphaned
HFS filesystem.

Signed-off-by: Marcos Paulo de Souza <marcos.mage@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-04 12:01:48 -07:00
Marcos Paulo de Souza
d83fe6b6c5 Documentation: fix inotify source file paths
Fixes the path to find the source files of the inotify subsystem.

Signed-off-by: Marcos Paulo de Souza <marcos.mage@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-04 12:01:47 -07:00
Linus Torvalds
d211858837 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
  vfs: add d_prune dentry operation
  vfs: protect i_nlink
  filesystems: add set_nlink()
  filesystems: add missing nlink wrappers
  logfs: remove unnecessary nlink setting
  ocfs2: remove unnecessary nlink setting
  jfs: remove unnecessary nlink setting
  hypfs: remove unnecessary nlink setting
  vfs: ignore error on forced remount
  readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
  vfs: fix dentry leak in simple_fill_super()
2011-11-02 11:41:01 -07:00
Linus Torvalds
f1f8935a5c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
  jbd2: Unify log messages in jbd2 code
  jbd/jbd2: validate sb->s_first in journal_get_superblock()
  ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
  ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
  ext4: fix a typo in struct ext4_allocation_context
  ext4: Don't normalize an falloc request if it can fit in 1 extent.
  ext4: remove comments about extent mount option in ext4_new_inode()
  ext4: let ext4_discard_partial_buffers handle unaligned range correctly
  ext4: return ENOMEM if find_or_create_pages fails
  ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
  ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
  ext4: optimize locking for end_io extent conversion
  ext4: remove unnecessary call to waitqueue_active()
  ext4: Use correct locking for ext4_end_io_nolock()
  ext4: fix race in xattr block allocation path
  ext4: trace punch_hole correctly in ext4_ext_map_blocks
  ext4: clean up AGGRESSIVE_TEST code
  ext4: move variables to their scope
  ext4: fix quota accounting during migration
  ext4: migrate cleanup
  ...
2011-11-02 10:06:20 -07:00
Linus Torvalds
34116645d9 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Cleanup metadata flags handling
  udf: Skip mirror metadata FE loading when metadata FE is ok
  ext3: Allow quota file use root reservation
  udf: Remove web reference from UDF MAINTAINERS entry
  quota: Drop path reference on error exit from quotactl
  udf: Neaten udf_debug uses
  udf: Neaten logging output, use vsprintf extension %pV
  udf: Convert printks to pr_<level>
  udf: Rename udf_warning to udf_warn
  udf: Rename udf_error to udf_err
  udf: Promote some debugging messages to udf_error
  ext3: Remove the obsolete broken EXT3_IOC32_WAIT_FOR_READONLY.
  udf: Add readpages support for udf.
  ext3/balloc.c: local functions should be static
  ext2: fix the outdated comment in ext2_nfs_get_inode()
  ext3: remove deprecated oldalloc
  fs/ext3/balloc.c: delete useless initialization
  fs/ext2/balloc.c: delete useless initialization
  ext3: fix message in ext3_remount for rw-remount case
  ext3: Remove i_mutex from ext3_sync_file()

Fix up trivial (printf format cleanup) conflicts in fs/udf/udfdecl.h
2011-11-02 10:05:22 -07:00
Sage Weil
f0023bc617 vfs: add d_prune dentry operation
This adds a d_prune dentry operation that is called by the VFS prior to
pruning (i.e. unhashing and killing) a hashed dentry from the dcache.
Wrap dentry_lru_del() and use the new _prune() helper in the cases where we
are about to unhash and kill the dentry.

This will be used by Ceph to maintain a flag indicating whether the
complete contents of a directory are contained in the dcache, allowing it
to satisfy lookups and readdir without addition server communication.

Renumber a few DCACHE_* #defines to group DCACHE_OP_PRUNE with the other
DCACHE_OP_ bits.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:43 +01:00
Linus Torvalds
e33bae14fd Merge branch 'for-linus' of git://github.com/ericvh/linux
* 'for-linus' of git://github.com/ericvh/linux:
  9p: fix 9p.txt to advertise msize instead of maxdata
  net/9p: Convert net/9p protocol dumps to tracepoints
  fs/9p: change an int to unsigned int
  fs/9p: Cleanup option parsing in 9p
  9p: move dereference after NULL check
  fs/9p: inode file operation is properly initialized init_special_inode
  fs/9p: Update zero-copy implementation in 9p
2011-10-26 14:20:53 +02:00
Linus Torvalds
2d03423b23 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
  mm: memory hotplug: Check if pages are correctly reserved on a per-section basis
  Revert "memory hotplug: Correct page reservation checking"
  Update email address for stable patch submission
  dynamic_debug: fix undefined reference to `__netdev_printk'
  dynamic_debug: use a single printk() to emit messages
  dynamic_debug: remove num_enabled accounting
  dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions
  uio: Support physical addresses >32 bits on 32-bit systems
  sysfs: add unsigned long cast to prevent compile warning
  drivers: base: print rejected matches with DEBUG_DRIVER
  memory hotplug: Correct page reservation checking
  memory hotplug: Refuse to add unaligned memory regions
  remove the messy code file Documentation/zh_CN/SubmitChecklist
  ARM: mxc: convert device creation to use platform_device_register_full
  new helper to create platform devices with dma mask
  docs/driver-model: Update device class docs
  docs/driver-model: Document device.groups
  kobj_uevent: Ignore if some listeners cannot handle message
  dynamic_debug: make netif_dbg() call __netdev_printk()
  dynamic_debug: make netdev_dbg() call __netdev_printk()
  ...
2011-10-25 12:13:59 +02:00
Nicolae Mogoreanu
14211d026d 9p: fix 9p.txt to advertise msize instead of maxdata
9p.txt advertises that maxdata mount option should be used to specify
msize, in the code though we use msize option and completely ignore
maxdata if passed

Signed-off-by: Nicolae Mogoreanu <mogoreanu@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-10-24 11:13:12 -05:00
Lukas Czerner
4113c4caa4 ext4: remove deprecated oldalloc
For a long time now orlov is the default block allocator in the
ext4. It performs better than the old one and no one seems to claim
otherwise so we can safely drop it and make oldalloc and orlov mount
option deprecated.

This is a part of the effort to reduce number of ext4 options hence the
test matrix.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08 14:34:47 -04:00
Theodore Ts'o
af909a57fd ext4: documentation: remove acl and user_xattr mount options
Acl and user_xattr mount options are no longer needed since those
features are enabled by default if configured in (seee commit
ea66333694). We can not easily deprecate
mount options itself (since it is probably too early), but we can
remove it from documentation first.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08 14:01:08 -04:00
Paul Bolle
395cf9691d doc: fix broken references
There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typo's in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.

Fix these broken references, sometimes by dropping the irrelevant text
they were part of.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-27 18:08:04 +02:00
Theodore Ts'o
56889787cf ext4: improve handling of conflicting mount options
If the user explicitly specifies conflicting mount options for
delalloc or dioread_nolock and data=journal, fail the mount, instead
of printing a warning and continuing (since many user's won't look at
dmesg and notice the warning).

Also, print a single warning that data=journal implies that delayed
allocation is not on by default (since it's not supported), and
furthermore that O_DIRECT is not supported.  Improve the text in
Documentation/filesystems/ext4.txt so this is clear there as well.

Similarly, if the dioread_nolock mount option is specified when the
file system block size != PAGE_SIZE, fail the mount instead of
printing a warning message and ignoring the mount option.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-09-03 18:22:38 -04:00
Bart Van Assche
86028619b9 docs/sysfs: Specify ABI documentation requirements
Although it is expected nowadays that every new sysfs attribute is
documented under Documentation/ABI, this is not yet mentioned in the
kernel documentation. This patch adds a note in the sysfs
documentation about that requirement.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-22 17:42:34 -07:00
Lukas Czerner
fbc854027c ext3: remove deprecated oldalloc
For a long time now orlov is the default block allocator in the ext3. It
performs better than the old one and no one seems to claim otherwise so
we can safely drop it and make oldalloc and orlov mount option
deprecated.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-08-17 11:42:19 +02:00
Marcos Souza
4c74916fa8 Documentation: befs.txt: no maintainer, orphaned
Remove the name of Sergey Kostyliov as maintainer of befs.
In the MAINTAINERS file, befs is orphaned.

Signed-off-by: Marcos Souza <marcos.mage@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-13 18:34:03 -07:00
Linus Torvalds
e371d46ae4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  merge fchmod() and fchmodat() guts, kill ancient broken kludge
  xfs: fix misspelled S_IS...()
  xfs: get rid of open-coded S_ISREG(), etc.
  vfs: document locking requirements for d_move, __d_move and d_materialise_unique
  omfs: fix (mode & S_IFDIR) abuse
  btrfs: S_ISREG(mode) is not mode & S_IFREG...
  ima: fmode_t misspelled as mode_t...
  pci-label.c: size_t misspelled as mode_t
  jffs2: S_ISLNK(mode & S_IFMT) is pointless
  snd_msnd ->mode is fmode_t, not mode_t
  v9fs_iop_get_acl: get rid of unused variable
  vfs: dont chain pipe/anon/socket on superblock s_inodes list
  Documentation: Exporting: update description of d_splice_alias
  fs: add missing unlock in default_llseek()
2011-07-26 18:30:20 -07:00
Linus Torvalds
2ac232f37f Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  jbd: change the field "b_cow_tid" of struct journal_head from type unsigned to tid_t
  ext3.txt: update the links in the section "useful links" to the latest ones
  ext3: Fix data corruption in inodes with journalled data
  ext2: check xattr name_len before acquiring xattr_sem in ext2_xattr_get
  ext3: Fix compilation with -DDX_DEBUG
  quota: Remove unused declaration
  jbd: Use WRITE_SYNC in journal checkpoint.
  jbd: Fix oops in journal_remove_journal_head()
  ext3: Return -EINVAL when start is beyond the end of fs in ext3_trim_fs()
  ext3/ioctl.c: silence sparse warnings about different address spaces
  ext3/ext4 Documentation: remove bh/nobh since it has been deprecated
  ext3: Improve truncate error handling
  ext3: use proper little-endian bitops
  ext2: include fs.h into ext2_fs.h
  ext3: Fix oops in ext3_try_to_allocate_with_rsv()
  jbd: fix a bug of leaking jh->b_jcount
  jbd: remove dependency on __GFP_NOFAIL
  ext3: Convert ext3 to new truncate calling convention
  jbd: Add fixed tracepoints
  ext3: Add fixed tracepoints

Resolve conflicts in fs/ext3/fsync.c due to fsync locking push-down and
new fixed tracepoints.
2011-07-26 11:34:40 -07:00
Phillip Lougher
5b9f456772 Documentation: Exporting: update description of d_splice_alias
Following commits a904937 and 0c1aa9a update the d_splice_alias
desciption.

Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-26 12:57:09 -04:00
Linus Torvalds
f0deb97ab1 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
  updated Documentation/ja_JP/SubmittingPatches
  debugfs: add documentation for debugfs_create_x64
  uio: uio_pdrv_genirq: Add OF support
  firmware: gsmi: remove sysfs entries when unload the module
  Documentation/zh_CN: Fix messy code file email-clients.txt
  driver core: add more help description for "path to uevent helper"
  driver-core: modify FIRMWARE_IN_KERNEL help message
  driver-core: Kconfig grammar corrections in firmware configuration
  DOCUMENTATION: Replace create_device() with device_create().
  DOCUMENTATION: Update overview.txt in Doc/driver-model.
  pti: pti_tty_install documentation mispelling.
2011-07-25 23:06:24 -07:00
Linus Torvalds
91d44d9999 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: Make ZLIB compression support optional
  Squashfs: Update documentation for XZ and add squashfs-tools devel tree
2011-07-25 22:50:35 -07:00
Linus Torvalds
2dad3206db Merge branch 'for-3.1' of git://linux-nfs.org/~bfields/linux
* 'for-3.1' of git://linux-nfs.org/~bfields/linux:
  nfsd: don't break lease on CLAIM_DELEGATE_CUR
  locks: rename lock-manager ops
  nfsd4: update nfsv4.1 implementation notes
  nfsd: turn on reply cache for NFSv4
  nfsd4: call nfsd4_release_compoundargs from pc_release
  nfsd41: Deny new lock before RECLAIM_COMPLETE done
  fs: locks: remove init_once
  nfsd41: check the size of request
  nfsd41: error out when client sets maxreq_sz or maxresp_sz too small
  nfsd4: fix file leak on open_downgrade
  nfsd4: remember to put RW access on stateid destruction
  NFSD: Added TEST_STATEID operation
  NFSD: added FREE_STATEID operation
  svcrpc: fix list-corrupting race on nfsd shutdown
  rpc: allow autoloading of gss mechanisms
  svcauth_unix.c: quiet sparse noise
  svcsock.c: include sunrpc.h to quiet sparse noise
  nfsd: Remove deprecated nfsctl system call and related code.
  NFSD: allow OP_DESTROY_CLIENTID to be only op in COMPOUND

Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
2011-07-25 22:49:19 -07:00
Linus Torvalds
d3ec4844d4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  fs: Merge split strings
  treewide: fix potentially dangerous trailing ';' in #defined values/expressions
  uwb: Fix misspelling of neighbourhood in comment
  net, netfilter: Remove redundant goto in ebt_ulog_packet
  trivial: don't touch files that are removed in the staging tree
  lib/vsprintf: replace link to Draft by final RFC number
  doc: Kconfig: `to be' -> `be'
  doc: Kconfig: Typo: square -> squared
  doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
  drivers/net: static should be at beginning of declaration
  drivers/media: static should be at beginning of declaration
  drivers/i2c: static should be at beginning of declaration
  XTENSA: static should be at beginning of declaration
  SH: static should be at beginning of declaration
  MIPS: static should be at beginning of declaration
  ARM: static should be at beginning of declaration
  rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
  Update my e-mail address
  PCIe ASPM: forcedly -> forcibly
  gma500: push through device driver tree
  ...

Fix up trivial conflicts:
 - arch/arm/mach-ep93xx/dma-m2p.c (deleted)
 - drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
 - drivers/net/r8169.c (just context changes)
2011-07-25 13:56:39 -07:00
Christoph Hellwig
4e34e719e4 fs: take the ACL checks to common code
Replace the ->check_acl method with a ->get_acl method that simply reads an
ACL from disk after having a cache miss.  This means we can replace the ACL
checking boilerplate code with a single implementation in namei.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:30:23 -04:00
Wang Sheng-Hui
2b76aa074e ext3.txt: update the links in the section "useful links" to the latest ones
In Documentation/filesystems/ext3.txt, the section "useful links"
provides two links which link to ibm developerworks articles. While
the second one can be redirected to the latest one, the first one
   http://www.ibm.com/developerworks/library/l-fs7.html
fails to be redirected.

Update the 2 links to the latest ones.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-07-25 16:36:09 +02:00
Linus Torvalds
bbd9d6f7fb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
  vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
  isofs: Remove global fs lock
  jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
  fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
  mm/truncate.c: fix build for CONFIG_BLOCK not enabled
  fs:update the NOTE of the file_operations structure
  Remove dead code in dget_parent()
  AFS: Fix silly characters in a comment
  switch d_add_ci() to d_splice_alias() in "found negative" case as well
  simplify gfs2_lookup()
  jfs_lookup(): don't bother with . or ..
  get rid of useless dget_parent() in btrfs rename() and link()
  get rid of useless dget_parent() in fs/btrfs/ioctl.c
  fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
  drivers: fix up various ->llseek() implementations
  fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
  Ext4: handle SEEK_HOLE/SEEK_DATA generically
  Btrfs: implement our own ->llseek
  fs: add SEEK_HOLE and SEEK_DATA flags
  reiserfs: make reiserfs default to barrier=flush
  ...

Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
shrinker callout for the inode cache, that clashed with the xfs code to
start the periodic workers later.
2011-07-22 19:02:39 -07:00
Linus Torvalds
59a7ac1211 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (32 commits)
  MAINTAINERS: change e-mail of Adrian Hunter
  UBIFS: fix master node recovery
  UBIFS: improve power cut emulation testing
  UBIFS: rename recovery testing variables
  UBIFS: remove custom list of superblocks
  UBIFS: stop re-defining UBI operations
  UBIFS: switch to I/O helpers
  UBIFS: switch to ubifs_leb_write
  UBIFS: switch to ubifs_leb_read
  UBIFS: introduce more I/O helpers
  UBIFS: always print stacktrace when switching to R/O mode
  UBIFS: remove unused and unneeded debugging function
  UBIFS: add global debugfs knobs
  UBIFS: introduce debugfs helpers
  UBIFS: re-arrange debugging code a bit
  UBIFS: be more informative in failure mode
  UBIFS: switch self-check knobs to debugfs
  UBIFS: lessen amount of debugging check types
  UBIFS: introduce helper functions for debugging checks and tests
  UBIFS: amend debugging inode size check function prototype
  ...
2011-07-22 13:09:35 -07:00
Phillip Lougher
812753d66f Squashfs: Update documentation for XZ and add squashfs-tools devel tree
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
2011-07-22 02:33:52 +01:00
Josef Bacik
02c24a8218 fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2.  For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:59 -04:00
Josef Bacik
982d816581 fs: add SEEK_HOLE and SEEK_DATA flags
This just gets us ready to support the SEEK_HOLE and SEEK_DATA flags.  Turns out
using fiemap in things like cp cause more problems than it solves, so lets try
and give userspace an interface that doesn't suck.  We need to match solaris
here, and the definitions are

*o* If /whence/ is SEEK_HOLE, the offset of the start of the
next hole greater than or equal to the supplied offset
is returned. The definition of a hole is provided near
the end of the DESCRIPTION.

*o* If /whence/ is SEEK_DATA, the file pointer is set to the
start of the next non-hole file region greater than or
equal to the supplied offset.

So in the generic case the entire file is data and there is a virtual hole at
the end.  That means we will just return i_size for SEEK_HOLE and will return
the same offset for SEEK_DATA.  This is how Solaris does it so we have to do it
the same way.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:56 -04:00
Dave Chinner
8ab47664d5 vfs: increase shrinker batch size
Now that the per-sb shrinker is responsible for shrinking 2 or more
caches, increase the batch size to keep econmies of scale for
shrinking each cache.  Increase the shrinker batch size to 1024
objects.

To allow for a large increase in batch size, add a conditional
reschedule to prune_icache_sb() so that we don't hold the LRU spin
lock for too long. This mirrors the behaviour of the
__shrink_dcache_sb(), and allows us to increase the batch size
without needing to worry about problems caused by long lock hold
times.

To ensure that filesystems using the per-sb shrinker callouts don't
cause problems, document that the object freeing method must
reschedule appropriately inside loops.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:41 -04:00
Dave Chinner
0e1fdafd93 superblock: add filesystem shrinker operations
Now we have a per-superblock shrinker implementation, we can add a
filesystem specific callout to it to allow filesystem internal
caches to be shrunk by the superblock shrinker.

Rather than perpetuate the multipurpose shrinker callback API (i.e.
nr_to_scan == 0 meaning "tell me how many objects freeable in the
cache), two operations will be added. The first will return the
number of objects that are freeable, the second is the actual
shrinker call.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:41 -04:00
J. Bruce Fields
8fb47a4fbf locks: rename lock-manager ops
Both the filesystem and the lock manager can associate operations with a
lock.  Confusingly, one of them (fl_release_private) actually has the
same name in both operation structures.

It would save some confusion to give the lock-manager ops different
names.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-20 20:23:19 -04:00
Al Viro
76fe3276be ->permission() sanitizing: document API changes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:36 -04:00
Al Viro
10556cb21a ->permission() sanitizing: don't pass flags to ->permission()
not used by the instances anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:24 -04:00
Al Viro
7e40145eb1 ->permission() sanitizing: don't pass flags to ->check_acl()
not used in the instances anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:21 -04:00
J. Bruce Fields
c46556c6be nfsd4: update nfsv4.1 implementation notes
Update documentation to reflect recent progress.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-07-18 18:41:33 -04:00
Akinobu Mita
d0a542637f debugfs: add documentation for debugfs_create_x64
debugfs_create_x64() exists.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-18 13:36:15 -07:00
Ryusuke Konishi
3a36199114 nilfs2: remove resize from unsupported features list
Resize feature was supported by the commit 4e33f9eab0 but it was not
reflected to the list of unsupported features in nilfs2.txt file.
This updates the list to fix discrepancy.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2011-07-13 16:08:59 +09:00
Jiri Kosina
b7e9c223be Merge branch 'master' into for-next
Sync with Linus' tree to be able to apply pending patches that
are based on newer code already present upstream.
2011-07-11 14:15:55 +02:00
David Howells
c902ce1bfb FS-Cache: Add a helper to bulk uncache pages on an inode
Add an FS-Cache helper to bulk uncache pages on an inode.  This will
only work for the circumstance where the pages in the cache correspond
1:1 with the pages attached to an inode's page cache.

This is required for CIFS and NFS: When disabling inode cookie, we were
returning the cookie and setting cifsi->fscache to NULL but failed to
invalidate any previously mapped pages.  This resulted in "Bad page
state" errors and manifested in other kind of errors when running
fsstress.  Fix it by uncaching mapped pages when we disable the inode
cookie.

This patch should fix the following oops and "Bad page state" errors
seen during fsstress testing.

  ------------[ cut here ]------------
  kernel BUG at fs/cachefiles/namei.c:201!
  invalid opcode: 0000 [#1] SMP
  Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
  RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  RSP: 0018:ffff88002ce6dd00  EFLAGS: 00010282
  RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
  RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
  R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
  R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
  FS:  00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
  Stack:
   0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
   ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
   ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
  Call Trace:
   cachefiles_lookup_object+0x78/0xd4 [cachefiles]
   fscache_lookup_object+0x131/0x16d [fscache]
   fscache_object_work_func+0x1bc/0x669 [fscache]
   process_one_work+0x186/0x298
   worker_thread+0xda/0x15d
   kthread+0x84/0x8c
   kernel_thread_helper+0x4/0x10
  RIP  cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  ---[ end trace 1d481c9af1804caa ]---

I tested the uncaching by the following means:

 (1) Create a big file on my NFS server (104857600 bytes).

 (2) Read the file into the cache with md5sum on the NFS client.  Look in
     /proc/fs/fscache/stats:

	Pages  : mrk=25601 unc=0

 (3) Open the file for read/write ("bash 5<>/warthog/bigfile").  Look in proc
     again:

	Pages  : mrk=25601 unc=25601

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-07 13:21:56 -07:00
Artem Bityutskiy
81e79d38df UBIFS: switch self-check knobs to debugfs
UBIFS has many built-in self-check functions which can be enabled using the
debug_chks module parameter or the corresponding sysfs file
(/sys/module/ubifs/parameters/debug_chks). However, this is not flexible enough
because it is not per-filesystem. This patch moves this to debugfs interfaces.

We already have debugfs support, so this patch just adds more debugfs files.
While looking at debugfs support I've noticed that it is racy WRT file-system
unmount, and added a TODO entry for that. This problem has been there for long
time and it is quite standard debugfs PITA. The plan is to fix this later.

This patch is simple, but it is large because it changes many places where we
check if a particular type of checks is enabled or disabled.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-07-04 10:54:28 +03:00
Artem Bityutskiy
8d7819b4af UBIFS: lessen amount of debugging check types
We have too many different debugging checks - lessen the amount by merging all
index-related checks into one. At the same time, move the "force in-the-gap"
test to the "index checks" class, because it is too heavy for the "general"
class.

This patch merges TNC, Old index, and Index size check and calles this just
"index checks".

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-07-04 10:54:28 +03:00
Lukas Czerner
ad43401771 ext3/ext4 Documentation: remove bh/nobh since it has been deprecated
Bh and nobh mount option has been deprecated in ext4
(206f7ab4f4) and in ext3
(4c4d390122)
so remove those options from documentation.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-06-25 17:29:52 +02:00
Shaohua Li
09223371de rcu: Use softirq to address performance regression
Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread)
introduced performance regression. In an AIM7 test, this commit degraded
performance by about 40%.

The commit runs rcu callbacks in a kthread instead of softirq. We observed
high rate of context switch which is caused by this. Out test system has
64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
which is caused by RCU's per-CPU kthread.  A trace showed that most of
the time the RCU per-CPU kthread doesn't actually handle any callbacks,
but instead just does a very small amount of work handling grace periods.
This means that RCU's per-CPU kthreads are making the scheduler do quite
a bit of work in order to allow a very small amount of RCU-related
processing to be done.

Alex Shi's analysis determined that this slowdown is due to lock
contention within the scheduler.  Unfortunately, as Peter Zijlstra points
out, the scheduler's real-time semantics require global action, which
means that this contention is inherent in real-time scheduling.  (Yes,
perhaps someone will come up with a workaround -- otherwise, -rt is not
going to do well on large SMP systems -- but this patch will work around
this issue in the meantime.  And "the meantime" might well be forever.)

This patch therefore re-introduces softirq processing to RCU, but only
for core RCU work.  RCU callbacks are still executed in kthread context,
so that only a small amount of RCU work runs in softirq context in the
common case.  This should minimize ksoftirqd execution, allowing us to
skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Tested-by: "Alex,Shi" <alex.shi@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-06-14 15:25:39 -07:00
Wanlong Gao
25eb650a69 doc: fix wrong arch/i386 references
Change all "arch/i386" to "arch/x86" in Documentaion/,
since the directory has changed.

Also update the files which have changed their filename
in the meantime accordingly.

Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
[jkosina@suse.cz: reword changelog]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-06-13 13:43:05 +02:00
Linus Torvalds
36947a7682 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (36 commits)
  Cache xattr security drop check for write v2
  fs: block_page_mkwrite should wait for writeback to finish
  mm: Wait for writeback when grabbing pages to begin a write
  configfs: remove unnecessary dentry_unhash on rmdir, dir rename
  fat: remove unnecessary dentry_unhash on rmdir, dir rename
  hpfs: remove unnecessary dentry_unhash on rmdir, dir rename
  minix: remove unnecessary dentry_unhash on rmdir, dir rename
  fuse: remove unnecessary dentry_unhash on rmdir, dir rename
  coda: remove unnecessary dentry_unhash on rmdir, dir rename
  afs: remove unnecessary dentry_unhash on rmdir, dir rename
  affs: remove unnecessary dentry_unhash on rmdir, dir rename
  9p: remove unnecessary dentry_unhash on rmdir, dir rename
  ncpfs: fix rename over directory with dangling references
  ncpfs: document dentry_unhash usage
  ecryptfs: remove unnecessary dentry_unhash on rmdir, dir rename
  hostfs: remove unnecessary dentry_unhash on rmdir, dir rename
  hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename
  hfs: remove unnecessary dentry_unhash on rmdir, dir rename
  omfs: remove unnecessary dentry_unhash on rmdir, dir rneame
  udf: remove unnecessary dentry_unhash from rmdir, dir rename
  ...
2011-05-28 13:03:41 -07:00
Linus Torvalds
e52e713ec3 Merge branch 'docs-move' of git://git.kernel.org/pub/scm/linux/kernel/git/rdunlap/linux-docs
* 'docs-move' of git://git.kernel.org/pub/scm/linux/kernel/git/rdunlap/linux-docs:
  Create Documentation/security/, move LSM-, credentials-, and keys-related files from Documentation/   to Documentation/security/, add Documentation/security/00-INDEX, and update all occurrences of Documentation/<moved_file>   to Documentation/security/<moved_file>.
2011-05-27 10:25:02 -07:00
Christoph Hellwig
aa38572954 fs: pass exact type of data dirties to ->dirty_inode
Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.

This is just the prototype change with no user for it yet.  I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.

Also remove incorrect comments that ->dirty_inode can't block.  That
has been changed a long time ago, and many implementations rely on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-27 07:04:40 -04:00
Jiri Slaby
dcb3a08e69 Documentation: configfs examples crash fix
When configfs_register_subsystem() fails, we unregister too many
subsystems in configfs_example_init.  Decrement i by one to not unregister
non-registered subsystem.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26 17:12:34 -07:00
Linus Torvalds
a74b81b0af Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (28 commits)
  Ocfs2: Teach local-mounted ocfs2 to handle unwritten_extents correctly.
  ocfs2/dlm: Do not migrate resource to a node that is leaving the domain
  ocfs2/dlm: Add new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG
  Ocfs2/move_extents: Set several trivial constraints for threshold.
  Ocfs2/move_extents: Let defrag handle partial extent moving.
  Ocfs2/move_extents: move/defrag extents within a certain range.
  Ocfs2/move_extents: helper to calculate the defraging length in one run.
  Ocfs2/move_extents: move entire/partial extent.
  Ocfs2/move_extents: helpers to update the group descriptor and global bitmap inode.
  Ocfs2/move_extents: helper to probe a proper region to move in an alloc group.
  Ocfs2/move_extents: helper to validate and adjust moving goal.
  Ocfs2/move_extents: find the victim alloc group, where the given #blk fits.
  Ocfs2/move_extents: defrag a range of extent.
  Ocfs2/move_extents: move a range of extent.
  Ocfs2/move_extents: lock allocators and reserve metadata blocks and data clusters for extents moving.
  Ocfs2/move_extents: Add basic framework and source files for extent moving.
  Ocfs2/move_extents: Adding new ioctl code 'OCFS2_IOC_MOVE_EXT' to ocfs2.
  Ocfs2/refcounttree: Publicize couple of funcs from refcounttree.c
  Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
  Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.
  ...
2011-05-26 10:55:15 -07:00
Linus Torvalds
8a0599dd24 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: correctly decrement the extent buffer index in xfs_bmap_del_extent
  xfs: check for valid indices in xfs_iext_get_ext and xfs_iext_idx_to_irec
  xfs: fix up asserts in xfs_iflush_fork
  xfs: do not do pointer arithmetic on extent records
  xfs: do not use unchecked extent indices in xfs_bunmapi
  xfs: do not use unchecked extent indices in xfs_bmapi
  xfs: do not use unchecked extent indices in xfs_bmap_add_extent_*
  xfs: remove if_lastex
  xfs: remove the unused XFS_BMAPI_RSVBLOCKS flag
  xfs: do not discard alloc btree blocks
  xfs: add online discard support
2011-05-26 10:49:11 -07:00
Linus Torvalds
35806b4f7c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (61 commits)
  jbd2: Add MAINTAINERS entry
  jbd2: fix a potential leak of a journal_head on an error path
  ext4: teach ext4_ext_split to calculate extents efficiently
  ext4: Convert ext4 to new truncate calling convention
  ext4: do not normalize block requests from fallocate()
  ext4: enable "punch hole" functionality
  ext4: add "punch hole" flag to ext4_map_blocks()
  ext4: punch out extents
  ext4: add new function ext4_block_zero_page_range()
  ext4: add flag to ext4_has_free_blocks
  ext4: reserve inodes and feature code for 'quota' feature
  ext4: add support for multiple mount protection
  ext4: ensure f_bfree returned by ext4_statfs() is non-negative
  ext4: protect bb_first_free in ext4_trim_all_free() with group lock
  ext4: only load buddy bitmap in ext4_trim_fs() when it is needed
  jbd2: Fix comment to match the code in jbd2__journal_start()
  ext4: fix waiting and sending of a barrier in ext4_sync_file()
  jbd2: Add function jbd2_trans_will_send_data_barrier()
  jbd2: fix sending of data flush on journal commit
  ext4: fix ext4_ext_fiemap_cb() to handle blocks before request range correctly
  ...
2011-05-26 09:53:20 -07:00
Joel Becker
ece928df16 Merge branch 'move_extents' of git://oss.oracle.com/git/tye/linux-2.6 into ocfs2-merge-window
Conflicts:
	fs/ocfs2/ioctl.c
2011-05-25 21:51:55 -07:00
Linus Torvalds
2a651c7f8d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: update Documentation pointers
  net/9p: enable 9p to work in non-default network namespace
  net/9p: p9_idpool_get return -1 on error
  fs/9p: Don't clunk dentry fid when we fail to get a writeback inode
  9p: Small cleanup in <net/9p/9p.h>
  9p: remove experimental tag from tested configurations
  9p: typo fixes and minor cleanups
  net/9p: Change linuxdoc names to match functions.
2011-05-25 09:21:56 -07:00
Mike Travis
4b060420a5 bitmap, irq: add smp_affinity_list interface to /proc/irq
Manually adjusting the smp_affinity for IRQ's becomes unwieldy when the
cpu count is large.

Setting smp affinity to cpus 256 to 263 would be:

	echo 000000ff,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > smp_affinity

instead of:

	echo 256-263 > smp_affinity_list

Think about what it looks like for cpus around say, 4088 to 4095.

We already have many alternate "list" interfaces:

/sys/devices/system/cpu/cpuX/indexY/shared_cpu_list
/sys/devices/system/cpu/cpuX/topology/thread_siblings_list
/sys/devices/system/cpu/cpuX/topology/core_siblings_list
/sys/devices/system/node/nodeX/cpulist
/sys/devices/pci***/***/local_cpulist

Add a companion interface, smp_affinity_list to use cpu lists instead of
cpu maps.  This conforms to other companion interfaces where both a map
and a list interface exists.

This required adding a bitmap_parselist_user() function in a manner
similar to the bitmap_parse_user() function.

[akpm@linux-foundation.org: make __bitmap_parselist() static]
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:45 -07:00
Eric Van Hensbergen
ee294bedb6 9p: update Documentation pointers
Update documentation pointers to include virtfs publication, 9p RFC
as well as updated list of servers and alternative clients.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-05-25 09:33:05 -05:00
Linus Torvalds
eb08d8ff47 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (52 commits)
  UBIFS: switch to dynamic printks
  UBIFS: fix kernel-doc comments
  UBIFS: fix extremely rare mount failure
  UBIFS: simplify LEB recovery function further
  UBIFS: always cleanup the recovered LEB
  UBIFS: clean up LEB recovery function
  UBIFS: fix-up free space on mount if flag is set
  UBIFS: add the fixup function
  UBIFS: add a superblock flag for free space fix-up
  UBIFS: share the next_log_lnum helper
  UBIFS: expect corruption only in last journal head LEBs
  UBIFS: synchronize write-buffer before switching to the next bud
  UBIFS: remove BUG statement
  UBIFS: change bud replay function conventions
  UBIFS: substitute the replay tree with a replay list
  UBIFS: simplify replay
  UBIFS: store free and dirty space in the bud replay entry
  UBIFS: remove unnecessary stack variable
  UBIFS: double check that buds are replied in order
  UBIFS: make 2 functions static
  ...
2011-05-24 11:51:07 -07:00
Christoph Hellwig
e84661aa84 xfs: add online discard support
Now that we have reliably tracking of deleted extents in a
transaction we can easily implement "online" discard support
which calls blkdev_issue_discard once a transaction commits.

The actual discard is a two stage operation as we first have
to mark the busy extent as not available for reuse before we
can start the actual discard.  Note that we don't bother
supporting discard for the non-delaylog mode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-05-24 11:17:13 -05:00
Tiger Yang
e2b0c215c2 ocfs2: clean up mount option about atime in ocfs2.txt
As ocfs2 supports relatime and strictatime, we need update the
relative document. Atime_quantum need work with strictatime, so only
show it in procfs when mount with strictatime.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-05-23 23:37:12 -07:00
Artem Bityutskiy
56e46742e8 UBIFS: switch to dynamic printks
Switch to debugging using dynamic printk (pr_debug()). There is no good reason
to carry custom debugging prints if there is so cool and powerful generic
dynamic printk infrastructure, see Documentation/dynamic-debug-howto.txt. With
dynamic printks we can switch on/of individual prints, per-file, per-function
and per format messages. This means that instead of doing old-fashioned

echo 1 > /sys/module/ubifs/parameters/debug_msgs

to enable general messages, we can do:

echo 'format "UBIFS DBG gen" +ptlf' > control

to enable general messages and additionally ask the dynamic printk
infrastructure to print process ID, line number and function name. So there is
no reason to keep UBIFS-specific crud if there is more powerful generic thing.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-05-23 08:22:20 +03:00
Randy Dunlap
d410fa4ef9 Create Documentation/security/,
move LSM-, credentials-, and keys-related files from Documentation/
  to Documentation/security/,
add Documentation/security/00-INDEX, and
update all occurrences of Documentation/<moved_file>
  to Documentation/security/<moved_file>.
2011-05-19 15:59:38 -07:00
Artem Bityutskiy
bc3f07f090 UBIFS: make force in-the-gaps to be a general self-check
UBIFS can force itself to use the 'in-the-gaps' commit method - the last resort
method which is normally invoced very very rarely. Currently this "force
int-the-gaps" debugging feature is a separate test mode. But it is a bit saner
to make it to be the "general" self-test check instead.

This patch is just a clean-up which should make the debugging code look a bit
nicer and easier to use - we have way too many debugging options.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-05-13 19:23:54 +03:00
Paul E. McKenney
a26ac2455f rcu: move TREE_RCU from softirq to kthread
If RCU priority boosting is to be meaningful, callback invocation must
be boosted in addition to preempted RCU readers.  Otherwise, in presence
of CPU real-time threads, the grace period ends, but the callbacks don't
get invoked.  If the callbacks don't get invoked, the associated memory
doesn't get freed, so the system is still subject to OOM.

But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
moves the callback invocations to a kthread, which can be boosted easily.

Also add comments and properly synchronized all accesses to
rcu_cpu_kthread_task, as suggested by Lai Jiangshan.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05 23:16:54 -07:00
Theodore Ts'o
59802db074 ext4: remove obsolete mount options from ext4's documentation
The block reservation code from ext3 was removed long ago...

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-05-01 18:14:26 -04:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds
ae005cbed1 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits)
  ext4: fix a BUG in mb_mark_used during trim.
  ext4: unused variables cleanup in fs/ext4/extents.c
  ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep()
  ext4: add more tracepoints and use dev_t in the trace buffer
  ext4: don't kfree uninitialized s_group_info members
  ext4: add missing space in printk's in __ext4_grp_locked_error()
  ext4: add FITRIM to compat_ioctl.
  ext4: handle errors in ext4_clear_blocks()
  ext4: unify the ext4_handle_release_buffer() api
  ext4: handle errors in ext4_rename
  jbd2: add COW fields to struct jbd2_journal_handle
  jbd2: add the b_cow_tid field to journal_head struct
  ext4: Initialize fsync transaction ids in ext4_new_inode()
  ext4: Use single thread to perform DIO unwritten convertion
  ext4: optimize ext4_bio_write_page() when no extent conversion is needed
  ext4: skip orphan cleanup if fs has unknown ROCOMPAT features
  ext4: use the nblocks arg to ext4_truncate_restart_trans()
  ext4: fix missing iput of root inode for some mount error paths
  ext4: make FIEMAP and delayed allocation play well together
  ext4: suppress verbose debugging information if malloc-debug is off
  ...

Fi up conflicts in fs/ext4/super.c due to workqueue changes
2011-03-25 09:57:41 -07:00
Linus Torvalds
d39dd11c3e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  fs: simplify iget & friends
  fs: pull inode->i_lock up out of writeback_single_inode
  fs: rename inode_lock to inode_hash_lock
  fs: move i_wb_list out from under inode_lock
  fs: move i_sb_list out from under inode_lock
  fs: remove inode_lock from iput_final and prune_icache
  fs: Lock the inode LRU list separately
  fs: factor inode disposal
  fs: protect inode->i_state with inode->i_lock
  autofs4: Do not potentially dereference NULL pointer returned by fget() in autofs_dev_ioctl_setpipefd()
  autofs4 - remove autofs4_lock
  autofs4 - fix d_manage() return on rcu-walk
  autofs4 - fix autofs4_expire_indirect() traversal
  autofs4 - fix dentry leak in autofs4_expire_direct()
  autofs4 - reinstate last used update on access
  vfs - check non-mountpoint dentry might block in __follow_mount_rcu()
2011-03-24 19:01:30 -07:00
Dave Chinner
f283c86afe fs: remove inode_lock from iput_final and prune_icache
Now that inode state changes are protected by the inode->i_lock and
the inode LRU manipulations by the inode_lru_lock, we can remove the
inode_lock from prune_icache and the initial part of iput_final().

instead of using the inode_lock to protect the inode during
iput_final, use the inode->i_lock instead. This protects the inode
against new references being taken while we change the inode state
to I_FREEING, as well as preventing prune_icache from grabbing the
inode while we are manipulating it. Hence we no longer need the
inode_lock in iput_final prior to setting I_FREEING on the inode.

For prune_icache, we no longer need the inode_lock to protect the
LRU list, and the inodes themselves are protected against freeing
races by the inode->i_lock. Hence we can lift the inode_lock from
prune_icache as well.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-24 21:16:32 -04:00
Linus Torvalds
5818fcc8bd Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: Use vmalloc rather than kmalloc for zlib workspace
  Squashfs: handle corruption of directory structure
  Squashfs: wrap squashfs_mount() definition
  Squashfs: xz_wrapper doesn't need to include squashfs_fs_i.h anymore
  Squashfs: Update documentation to include compression options
  Squashfs: Update Kconfig help text to include xz compression
  Squashfs: add compression options support to xz decompressor
  Squashfs: extend decompressor framework to handle compression options
2011-03-24 08:02:21 -07:00
Linus Torvalds
1b506cfb6a Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  exofs: deprecate the commands pending counter
  exofs: Write sbi->s_nextid as part of the Create command
  exofs: Add option to mount by osdname
  exofs: Override read-ahead to align on stripe_size
  exofs: simple fsync race fix
  exofs: Optimize read_4_write
  exofs: Trivial: fix some indentation and debug prints
  exofs: Remove redundant unlikely()
2011-03-24 07:57:38 -07:00
Stuart Swales
da23ef0549 adfs: add hexadecimal filetype suffix option
ADFS (FileCore) storage complies with the RISC OS filetype specification
(12 bits of file type information is stored in the file load address,
rather than using a file extension).  The existing driver largely ignores
this information and does not present it to the end user.

It is desirable that stored filetypes be made visible to the end user to
facilitate a precise copy of data and metadata from a hard disc (or image
thereof) into a RISC OS emulator (such as RPCEmu) or to a network share
which can be accessed by real Acorn systems.

This patch implements a per-mount filetype suffix option (use -o
ftsuffix=1) to present any filetype as a ,xyz hexadecimal suffix on each
file.  This type suffix is compatible with that used by RISC OS systems
that access network servers using NFS client software and by RPCemu's host
filing system.

Signed-off-by: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22 17:44:17 -07:00
Linus Torvalds
3155fe6df5 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (23 commits)
  xfs: don't name variables "panic"
  xfs: factor agf counter updates into a helper
  xfs: clean up the xfs_alloc_compute_aligned calling convention
  xfs: kill support/debug.[ch]
  xfs: Convert remaining cmn_err() callers to new API
  xfs: convert the quota debug prints to new API
  xfs: rename xfs_cmn_err_fsblock_zero()
  xfs: convert xfs_fs_cmn_err to new error logging API
  xfs: kill xfs_fs_mount_cmn_err() macro
  xfs: kill xfs_fs_repair_cmn_err() macro
  xfs: convert xfs_cmn_err to xfs_alert_tag
  xfs: Convert xlog_warn to new logging interface
  xfs: Convert linux-2.6/ files to new logging interface
  xfs: introduce new logging API.
  xfs: zero proper structure size for geometry calls
  xfs: enable delaylog by default
  xfs: more sensible inode refcounting for ialloc
  xfs: stop using xfs_trans_iget in the RT allocator
  xfs: check if device support discard in xfs_ioc_trim()
  xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1
  ...
2011-03-21 14:24:56 -07:00
Linus Torvalds
f539abece1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  fs: call security_d_instantiate in d_obtain_alias V2
  lose 'mounting_here' argument in ->d_manage()
  don't pass 'mounting_here' flag to follow_down()
  change the locking order for namespace_sem
  fix deadlock in pivot_root()
  vfs: split off vfsmount-related parts of vfs_kern_mount()
  Some fixes for pstore
  kill simple_set_mnt()
2011-03-18 10:51:11 -07:00
Linus Torvalds
8f627a8a88 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (25 commits)
  UBIFS: clean-up commentaries
  UBIFS: save 128KiB or more RAM
  UBIFS: allocate orphans scan buffer on demand
  UBIFS: allocate lpt dump buffer on demand
  UBIFS: allocate ltab checking buffer on demand
  UBIFS: allocate scanning buffer on demand
  UBIFS: allocate dump buffer on demand
  UBIFS: do not check data crc by default
  UBIFS: simplify UBIFS Kconfig menu
  UBIFS: print max. index node size
  UBIFS: handle allocation failures in UBIFS write path
  UBIFS: use max_write_size during recovery
  UBIFS: use max_write_size for write-buffers
  UBIFS: introduce write-buffer size field
  UBI: incorporate LEB offset information
  UBIFS: incorporate maximum write size
  UBI: provide LEB offset information
  UBI: incorporate maximum write size
  UBIFS: fix LEB number in printk
  UBIFS: restrict world-writable debugfs files
  ...
2011-03-18 10:50:27 -07:00
Linus Torvalds
e16b396ce3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
  doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
  Update cpuset info & webiste for cgroups
  dcdbas: force SMI to happen when expected
  arch/arm/Kconfig: remove one to many l's in the word.
  asm-generic/user.h: Fix spelling in comment
  drm: fix printk typo 'sracth'
  Remove one to many n's in a word
  Documentation/filesystems/romfs.txt: fixing link to genromfs
  drivers:scsi Change printk typo initate -> initiate
  serial, pch uart: Remove duplicate inclusion of linux/pci.h header
  fs/eventpoll.c: fix spelling
  mm: Fix out-of-date comments which refers non-existent functions
  drm: Fix printk typo 'failled'
  coh901318.c: Change initate to initiate.
  mbox-db5500.c Change initate to initiate.
  edac: correct i82975x error-info reported
  edac: correct i82975x mci initialisation
  edac: correct commented info
  fs: update comments to point correct document
  target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
  ...

Trivial conflict in fs/eventpoll.c (spelling vs addition)
2011-03-18 10:37:40 -07:00
Al Viro
1aed3e4204 lose 'mounting_here' argument in ->d_manage()
it's always false...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-18 10:01:59 -04:00
Linus Torvalds
179198373c Merge branch 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (54 commits)
  RPC: killing RPC tasks races fixed
  xprt: remove redundant check
  SUNRPC: Convert struct rpc_xprt to use atomic_t counters
  SUNRPC: Ensure we always run the tk_callback before tk_action
  sunrpc: fix printk format warning
  xprt: remove redundant null check
  nfs: BKL is no longer needed, so remove the include
  NFS: Fix a warning in fs/nfs/idmap.c
  Cleanup: Factor out some cut-and-paste code.
  cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions.
  NFS: account direct-io into task io accounting
  gss:krb5 only include enctype numbers in gm_upcall_enctypes
  RPCRDMA: Fix FRMR registration/invalidate handling.
  RPCRDMA: Fix to XDR page base interpretation in marshalling logic.
  NFSv4: Send unmapped uid/gids to the server when using auth_sys
  NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr
  NFSv4: cleanup idmapper functions to take an nfs_server argument
  NFSv4: Send unmapped uid/gids to the server if the idmapper fails
  NFSv4: If the server sends us a numeric uid/gid then accept it
  NFSv4.1: reject zero layout with zeroed stripe unit
  ...
2011-03-17 17:40:00 -07:00
Linus Torvalds
054cfaacf8 Merge branch 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  vfs: bury ->get_sb()
  nfs: switch NFS from ->get_sb() to ->mount()
  nfs: stop mangling ->mnt_devname on NFS
  vfs: new superblock methods to override /proc/*/mount{s,info}
  nfs: nfs_do_{ref,sub}mount() superblock argument is redundant
  nfs: make nfs_path() work without vfsmount
  nfs: store devname at disconnected NFS roots
  nfs: propagate devname to nfs{,4}_get_root()
2011-03-16 19:09:57 -07:00
Al Viro
1a102ff925 vfs: bury ->get_sb()
This is an ex-parrot.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-16 16:48:06 -04:00
Boaz Harrosh
9ed9648431 exofs: Add option to mount by osdname
If /dev/osd* devices are shuffled because more devices
where added, and/or login order has changed. It is hard to
mount the FS you want.

Add an option to mount by osdname. osdname is any osd-device's
osdname as specified to the mkfs.exofs command when formatting
the osd-devices.
The new mount format is:
	OPT="osdname=$UUID0,pid=$PID,_netdev"
	mount -t exofs -o $OPT $DEV_OSD0 $MOUNTDIR

if "osdname=" is specified in options above $DEV_OSD0 is
ignored and can be empty.

Also while at it: Removed some old unused Opt_* enums.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-03-15 15:02:51 +02:00
Fred Isaman
80fe2b192d NFSv4.1: lseg documentation
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:43 -05:00
Artem Bityutskiy
2bcf002159 UBIFS: do not check data crc by default
Change the default UBIFS behavior WRT data CRC checking. Currently,
UBIFS checks data CRC when reading, which slows it down quite a bit,
and this is the default option. However, it looks like in average
user does not need this feature and would prefer faster read speed
over extra reliability. And this seems to be de-facto standard that
file-systems do not check data CRC every time they read from the
media.

Thus, make UBIFS default behavior so that it does not check data
CRC. This corresponds to the no_chk_data_crc mount option. Those users
who need extra protection can always enable it using the chk_data_crc
option.

Please, read more information about this feature here:
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_checksumming

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2011-03-11 10:52:07 +02:00
Phillip Lougher
4c1d204cde Squashfs: Update documentation to include compression options
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2011-02-28 18:35:36 +00:00
Christoph Hellwig
20ad9ea9be xfs: enable delaylog by default
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-02-22 20:33:25 -06:00
Lukas Czerner
6f9524e9e1 ext4: update ext4 documentation
Add documentation for mount options and ioctls to
Documentation/filesystem/ext4.txt, which has not been udpated for some
time.  Also add for ext4 sysfs tunables to the
Documentation/ABI/testing/sysfs-fs-ext4 file, and fix a few
typographical errors in that file.

https://bugzilla.kernel.org/show_bug.cgi?id=9423

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-02-21 20:16:21 -05:00
Alexander Kurz
ddf1228695 Documentation/filesystems/romfs.txt: fixing link to genromfs
Signed-off-by: Alexander Kurz <linux@kbdbabel.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-02-17 22:04:46 +01:00
Bart Van Assche
d3f70befd9 docs/sysfs: show() methods should use scnprintf().
Since snprintf() may return a value that exceeds its second argument,
show() methods should use scnprintf() instead of snprintf(). This patch
updates the example in the sysfs documentation accordingly.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-03 15:10:18 -08:00
Bart Van Assche
5480bcdd60 docs/sysfs: Update directory/kobject documentation.
Some time ago the way how sysfs stores a pointer to a kobject
corresponding to a directory was modified. This patch brings the
documentation again in sync with the implementation.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-03 15:10:18 -08:00
Anton Altaparmakov
af5eb745ef NTFS: Fix invalid pointer dereference in ntfs_mft_record_alloc().
In ntfs_mft_record_alloc() when mapping the new extent mft record with
map_extent_mft_record() we overwrite @m with the return value and on
error, we then try to use the old @m but that is no longer there as @m
now contains an error code instead so we crash when dereferencing the
error code as if it were a pointer.

The simple fix is to use a temporary variable to store the return value
thus preserving the original @m for later use.  This is a backport from
the commercial Tuxera-NTFS driver and is well tested...

Thanks go to Julia Lawall for pointing this out (whilst I had fixed it
in the commercial driver I had failed to fix it in the Linux kernel).

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-31 12:58:11 +10:00
Christoph Hellwig
2fe17c1075 fallocate should be a file operation
Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forced a commit.  Both of these are suboptimal - in case of O_SYNC
I/O we really want our allocation on disk, especially for the !KEEP_SIZE
case where we actually grow the file with user-visible zeroes.  On the
other hand always commiting the transaction is a bad idea for fast-path
uses of fallocate like for example in recent Samba versions.   Given
that block allocation is a data plane operation anyway change it from
an inode operation to a file operation so that we have the file structure
available that lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems,
and remove the already unnedded S_ISDIR checks given that we only wire
up fallocate for regular files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-17 02:25:31 -05:00
Linus Torvalds
f8206b925f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
  sanitize vfsmount refcounting changes
  fix old umount_tree() breakage
  autofs4: Merge the remaining dentry ops tables
  Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
  Allow d_manage() to be used in RCU-walk mode
  Remove a further kludge from __do_follow_link()
  autofs4: Bump version
  autofs4: Add v4 pseudo direct mount support
  autofs4: Fix wait validation
  autofs4: Clean up autofs4_free_ino()
  autofs4: Clean up dentry operations
  autofs4: Clean up inode operations
  autofs4: Remove unused code
  autofs4: Add d_manage() dentry operation
  autofs4: Add d_automount() dentry operation
  Remove the automount through follow_link() kludge code from pathwalk
  CIFS: Use d_automount() rather than abusing follow_link()
  NFS: Use d_automount() rather than abusing follow_link()
  AFS: Use d_automount() rather than abusing follow_link()
  Add an AT_NO_AUTOMOUNT flag to suppress terminal automount
  ...
2011-01-16 11:31:50 -08:00
David Howells
ea5b778a8b Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself.  follow_automount() will then
do the addition.

This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer.  The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.

To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2.  One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.

d_automount() should also add the the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used.  The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.

This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:48 -05:00
David Howells
ab90911ff9 Allow d_manage() to be used in RCU-walk mode
Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
as when it is in Ref-walk mode.  This permits __follow_mount_rcu() to call
d_manage() directly.  d_manage() needs a parameter to indicate that it is in
RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
-ECHILD instead).

autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
accesses it and otherwise request dropping back to ref-walk mode.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:47 -05:00
David Howells
cc53ce53c8 Add a dentry op to allow processes to be held during pathwalk transit
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.

->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep.  However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every tranist from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

	mkdir         S ffffffff8014e05a     0 32580  24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

	automount     D ffffffff8014e05a     0 32581      1              32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automouter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:31 -05:00
David Howells
9875cf8064 Add a dentry op to handle automounting rather than abusing follow_link()
Add a dentry op (d_automount) to handle automounting directories rather than
abusing the follow_link() inode operation.  The operation is keyed off a new
dentry flag (DCACHE_NEED_AUTOMOUNT).

This also makes it easier to add an AT_ flag to suppress terminal segment
automount during pathwalk and removes the need for the kludge code in the
pathwalk algorithm to handle directories with follow_link() semantics.

The ->d_automount() dentry operation:

	struct vfsmount *(*d_automount)(struct path *mountpoint);

takes a pointer to the directory to be mounted upon, which is expected to
provide sufficient data to determine what should be mounted.  If successful, it
should return the vfsmount struct it creates (which it should also have added
to the namespace using do_add_mount() or similar).  If there's a collision with
another automount attempt, NULL should be returned.  If the directory specified
by the parameter should be used directly rather than being mounted upon,
-EISDIR should be returned.  In any other case, an error code should be
returned.

The ->d_automount() operation is called with no locks held and may sleep.  At
this point the pathwalk algorithm will be in ref-walk mode.

Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
added to handle mountpoints.  It will return -EREMOTE if the automount flag was
set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
symlinks or mountpoints, -EISDIR if the walk point should be used without
mounting and 0 if successful.  The path will be updated to point to the mounted
filesystem if a successful automount took place.

__follow_mount() is replaced by follow_managed() which is more generic
(especially with the patch that adds ->d_manage()).  This handles transits from
directories during pathwalk, including automounting and skipping over
mountpoints (and holding processes with the next patch).

__follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
automount point with nothing mounted on it.

follow_dotdot*() does not handle automounts as you don't want to trigger them
whilst following "..".

I've also extracted the mount/don't-mount logic from autofs4 and included it
here.  It makes the mount go ahead anyway if someone calls open() or creat(),
tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
or sticks a '/' on the end of the pathname.  If they do a stat(), however,
they'll only trigger the automount if they didn't also say O_NOFOLLOW.

I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
inodes as automount points.  This flag is automatically propagated to the
dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate().  This saves NFS and could
save AFS a private flag bit apiece, but is not strictly necessary.  It would be
preferable to do the propagation in d_set_d_op(), but that doesn't normally
have access to the inode.

[AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu()
succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after
that, rather than just returning with ungrabbed *path]

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:05:03 -05:00
Linus Torvalds
18bce371ae Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
  nfsd4: fix callback restarting
  nfsd: break lease on unlink, link, and rename
  nfsd4: break lease on nfsd setattr
  nfsd: don't support msnfs export option
  nfsd4: initialize cb_per_client
  nfsd4: allow restarting callbacks
  nfsd4: simplify nfsd4_cb_prepare
  nfsd4: give out delegations more quickly in 4.1 case
  nfsd4: add helper function to run callbacks
  nfsd4: make sure sequence flags are set after destroy_session
  nfsd4: re-probe callback on connection loss
  nfsd4: set sequence flag when backchannel is down
  nfsd4: keep finer-grained callback status
  rpc: allow xprt_class->setup to return a preexisting xprt
  rpc: keep backchannel xprt as long as server connection
  rpc: move sk_bc_xprt to svc_xprt
  nfsd4: allow backchannel recovery
  nfsd4: support BIND_CONN_TO_SESSION
  nfsd4: modify session list under cl_lock
  Documentation: fl_mylease no longer exists
  ...

Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work.  The
vfs-scale work touched some msnfs cases, and this merge removes support
for that entirely, so the conflict was trivial to resolve.
2011-01-14 13:17:26 -08:00
Linus Torvalds
db9effe99a Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
  fs: fix do_last error case when need_reval_dot
  nfs: add missing rcu-walk check
  fs: hlist UP debug fixup
  fs: fix dropping of rcu-walk from force_reval_path
  fs: force_reval_path drop rcu-walk before d_invalidate
  fs: small rcu-walk documentation fixes

Fixed up trivial conflicts in Documentation/filesystems/porting
2011-01-13 20:14:13 -08:00
Nick Piggin
a82416da83 fs: small rcu-walk documentation fixes
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-14 02:26:53 +00:00
Mandeep Singh Baines
dabb16f639 oom: allow a non-CAP_SYS_RESOURCE proces to oom_score_adj down
We'd like to be able to oom_score_adj a process up/down as it
enters/leaves the foreground.  Currently, it is not possible to oom_adj
down without CAP_SYS_RESOURCE.  This patch allows a task to decrease its
oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to
or its inherited value at fork.  Assuming the thread that has forked it
has oom_score_adj of 0, each process could decrease it back from 0 upon
activation unless a CAP_SYS_RESOURCE thread elevated it to something
higher.

Alternative considered:

* a setuid binary
* a daemon with CAP_SYS_RESOURCE

Since you don't wan't all processes to be able to reduce their oom_adj, a
setuid or daemon implementation would be complex.  The alternatives also
have much higher overhead.

This patch updated from original patch based on feedback from David
Rientjes.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:35 -08:00
Nikanth Karthikesan
2d90508f63 mm: smaps: export mlock information
Currently there is no way to find whether a process has locked its pages
in memory or not.  And which of the memory regions are locked in memory.

Add a new field "Locked" to export this information via the smaps file.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:33 -08:00
Linus Torvalds
b2034d474b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits)
  fs: add documentation on fallocate hole punching
  Gfs2: fail if we try to use hole punch
  Btrfs: fail if we try to use hole punch
  Ext4: fail if we try to use hole punch
  Ocfs2: handle hole punching via fallocate properly
  XFS: handle hole punching via fallocate properly
  fs: add hole punching to fallocate
  vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
  fix signedness mess in rw_verify_area() on 64bit architectures
  fs: fix kernel-doc for dcache::prepend_path
  fs: fix kernel-doc for dcache::d_validate
  sanitize ecryptfs ->mount()
  switch afs
  move internal-only parts of ncpfs headers to fs/ncpfs
  switch ncpfs
  switch 9p
  pass default dentry_operations to mount_pseudo()
  switch hostfs
  switch affs
  switch configfs
  ...
2011-01-13 10:27:28 -08:00
Josef Bacik
924241575a fs: add documentation on fallocate hole punching
This patch simply adds documentation on how to handle the hole punching mode of
fallocate for any filesystem wishing to use it.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:19:03 -05:00
Anton Altaparmakov
2818ef50c4 NTFS: writev() fix and maintenance/contact details update
Fix writev() to not keep writing the first segment over and over again
instead of moving onto subsequent segments and update the NTFS entry in
MAINTAINERS to reflect that Tuxera Inc. now supports the NTFS driver.

Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-12 08:35:53 -08:00
J. Bruce Fields
ec26fba40f Documentation: fl_mylease no longer exists
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:09 -05:00
Linus Torvalds
56b85f32d5 Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (36 commits)
  serial: apbuart: Fixup apbuart_console_init()
  TTY: Add tty ioctl to figure device node of the system console.
  tty: add 'active' sysfs attribute to tty0 and console device
  drivers: serial: apbuart: Handle OF failures gracefully
  Serial: Avoid unbalanced IRQ wake disable during resume
  tty: fix typos/errors in tty_driver.h comments
  pch_uart : fix warnings for 64bit compile
  8250: fix uninitialized FIFOs
  ip2: fix compiler warning on ip2main_pci_tbl
  specialix: fix compiler warning on specialix_pci_tbl
  rocket: fix compiler warning on rocket_pci_ids
  8250: add a UPIO_DWAPB32 for 32 bit accesses
  8250: use container_of() instead of casting
  serial: omap-serial: Add support for kernel debugger
  serial: fix pch_uart kconfig & build
  drivers: char: hvc: add arm JTAG DCC console support
  RS485 documentation: add 16C950 UART description
  serial: ifx6x60: fix memory leak
  serial: ifx6x60: free IRQ on error
  Serial: EG20T: add PCH_UART driver
  ...

Fixed up conflicts in drivers/serial/apbuart.c with evil merge that
makes the code look fairly sane (unlike either side).
2011-01-07 14:39:20 -08:00
Nick Piggin
b74c79e993 fs: provide rcu-walk aware permission i_ops
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Nick Piggin
34286d6662 fs: rcu-walk aware d_revalidate method
Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Nick Piggin
31e6b01f41 fs: rcu-walk for path lookup
Perform common cases of path lookups without any stores or locking in the
ancestor dentry elements. This is called rcu-walk, as opposed to the current
algorithm which is a refcount based walk, or ref-walk.

This results in far fewer atomic operations on every path element,
significantly improving path lookup performance. It also avoids cacheline
bouncing on common dentries, significantly improving scalability.

The overall design is like this:
* LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
* Take the RCU lock for the entire path walk, starting with the acquiring
  of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
  not required for dentry persistence.
* synchronize_rcu is called when unregistering a filesystem, so we can
  access d_ops and i_ops during rcu-walk.
* Similarly take the vfsmount lock for the entire path walk. So now mnt
  refcounts are not required for persistence. Also we are free to perform mount
  lookups, and to assume dentry mount points and mount roots are stable up and
  down the path.
* Have a per-dentry seqlock to protect the dentry name, parent, and inode,
  so we can load this tuple atomically, and also check whether any of its
  members have changed.
* Dentry lookups (based on parent, candidate string tuple) recheck the parent
  sequence after the child is found in case anything changed in the parent
  during the path walk.
* inode is also RCU protected so we can load d_inode and use the inode for
  limited things.
* i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
* i_op can be loaded.

When we reach the destination dentry, we lock it, recheck lookup sequence,
and increment its refcount and mountpoint refcount. RCU and vfsmount locks
are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
not match, we can not drop rcu-walk gracefully at the current point in the
lokup, so instead return -ECHILD (for want of a better errno). This signals the
path walking code to re-do the entire lookup with a ref-walk.

Aside from the final dentry, there are other situations that may be encounted
where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
a reference on the last good dentry) and continue with a ref-walk. Again, if
we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
using ref-walk. But it is very important that we can continue with ref-walk
for most cases, particularly to avoid the overhead of double lookups, and to
gain the scalability advantages on common path elements (like cwd and root).

The cases where rcu-walk cannot continue are:
* NULL dentry (ie. any uncached path element)
* parent with d_inode->i_op->permission or ACLs
* dentries with d_revalidate
* Following links

In future patches, permission checks and d_revalidate become rcu-walk aware. It
may be possible eventually to make following links rcu-walk aware.

Uncached path elements will always require dropping to ref-walk mode, at the
very least because i_mutex needs to be grabbed, and objects allocated.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:27 +11:00
Nick Piggin
fa0d7e3de6 fs: icache RCU free inodes
RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:26 +11:00
Nick Piggin
b5c84bf6f6 fs: dcache remove dcache_lock
dcache_lock no longer protects anything. remove it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:23 +11:00
Nick Piggin
b1e6a015a5 fs: change d_hash for rcu-walk
Change d_hash so it may be called from lock-free RCU lookups. See similar
patch for d_compare for details.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:20 +11:00
Nick Piggin
621e155a35 fs: change d_compare for rcu-walk
Change d_compare so it may be called from lock-free RCU lookups. This
does put significant restrictions on what may be done from the callback,
however there don't seem to have been any problems with in-tree fses.
If some strange use case pops up that _really_ cannot cope with the
rcu-walk rules, we can just add new rcu-unaware callbacks, which would
cause name lookup to drop out of rcu-walk mode.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:19 +11:00
Nick Piggin
fe15ce446b fs: change d_delete semantics
Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:18 +11:00
Christoph Hellwig
8a87694ed1 remove trim_fs method from Documentation/filesystems/Locking
The ->trim_fs has been removed meanwhile, so remove it from the documentation
as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-04 11:01:09 -08:00
Christoph Hellwig
b83be6f20a update Documentation/filesystems/Locking
Mostly inspired by all the recent BKL removal changes, but a lot of older
updates also weren't properly recorded.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-30 10:00:50 -08:00
Linus Torvalds
38971ce2fa Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix panic after nfs_umount()
  nfs: remove extraneous and problematic calls to nfs_clear_request
  nfs: kernel should return EPROTONOSUPPORT when not support NFSv4
  NFS: Fix fcntl F_GETLK not reporting some conflicts
  nfs: Discard ACL cache on mode update
  NFS: Readdir cleanups
  NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found
  NFS: Fix a memory leak in nfs_readdir
  Call the filesystem back whenever a page is removed from the page cache
  NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler
2010-12-14 08:51:12 -08:00
Andrew Morton
4fe65cab84 Documentation/filesystems/vfs.txt: fix ->repeasepage() description
->releasepage() does not remove the page from the mapping.

Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-02 14:51:15 -08:00
Linus Torvalds
6072d13c42 Call the filesystem back whenever a page is removed from the page cache
NFS needs to be able to release objects that are stored in the page
cache once the page itself is no longer visible from the page cache.

This patch adds a callback to the address space operations that allows
filesystems to perform page cleanups once the page has been removed
from the page cache.

Original patch by: Linus Torvalds <torvalds@linux-foundation.org>
[trondmy: cover the cases of invalidate_inode_pages2() and
          truncate_inode_pages()]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-02 09:55:21 -05:00
Dan Carpenter
09c9feb946 Documentation: make configfs example code simpler, clearer
If "p" is NULL then it will cause an oops when we pass it to
simple_strtoul().  In this case "p" can not be NULL so I removed the
check.  I also changed the check a little to make it more explicit that
we are testing whether p points to the NUL char.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-18 15:00:46 -08:00
Jiri Slaby
23308ba54d console: add /proc/consoles
It allows users to see what consoles are currently known to the system
and with what flags.

It is based on Werner's patch, the part about traversing fds was
removed, the code was moved to kernel/printk.c, where consoles are
handled and it makes more sense to me.

Signed-off-by: Jiri Slaby <jslaby@suse.cz> [cleanups]
Signed-off-by: "Dr. Werner Fink" <werner@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-16 12:50:17 -08:00
Christoph Hellwig
5d0af85cd0 xfs: remove experimental tag from the delaylog option
We promised to do this for 2.6.37, and the code looks stable enough to
keep that promise.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-11-10 12:00:47 -06:00
Christoph Hellwig
bb8430a2c8 locks: remove fl_copy_lock lock_manager operation
This one was only used for a nasty hack in nfsd, which has recently
been removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-31 06:35:15 -07:00
Linus Torvalds
f063a0c0c9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (841 commits)
  Staging: brcm80211: fix usage of roundup in structures
  Staging: bcm: fix up network device reference counting
  Staging: keucr: fix up US_ macro change
  staging: brcm80211: brcmfmac: Removed codeversion from firmware filenames.
  staging: brcm80211: Remove unnecessary header files.
  staging: brcm80211: Remove unnecessary includes from bcmutils.c
  staging: brcm80211: Removed unnecessary pktsetprio() function.
  Staging: brcm80211: remove typedefs.h
  Staging: brcm80211: remove uintptr typedef usage
  Staging: hv: remove struct vmbus_channel_interface
  Staging: hv: remove Open from struct vmbus_channel_interface
  Staging: hv: storvsc: call vmbus_open directly
  Staging: hv: netvsc: call vmbus_open directly
  Staging: hv: channel: export vmbus_open to modules
  Staging: hv: remove Close from struct vmbus_channel_interface
  Staging: hv: netvsc: call vmbus_close directly
  Staging: hv: storvsc: call vmbus_close directly
  Staging: hv: channel: export vmbus_close to modules
  Staging: hv: remove SendPacket from struct vmbus_channel_interface
  Staging: hv: storvsc: call vmbus_sendpacket directly
  ...

Fix up conflicts in
	drivers/staging/cx25821/cx25821-audio-upstream.c
	drivers/staging/cx25821/cx25821-audio.h
due to warring whitespace cleanups (neither of which were all that great)
2010-10-28 12:13:00 -07:00
Greg Kroah-Hartman
e4c5bf8e3d Merge 'staging-next' to Linus's tree
This merges the staging-next tree to Linus's tree and resolves
some conflicts that were present due to changes in other trees that were
affected by files here.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-28 09:44:56 -07:00
Aneesh Kumar K.V
76381a42e4 fs/9p: Add access = client option to opt in acl evaluation.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:46 -05:00
Theodore Ts'o
a107e5a3a4 Merge branch 'next' into upstream-merge
Conflicts:
	fs/ext4/inode.c
	fs/ext4/mballoc.c
	include/trace/events/ext4.h
2010-10-27 23:44:47 -04:00
Lukas Czerner
bfff68738f ext4: add support for lazy inode table initialization
When the lazy_itable_init extended option is passed to mke2fs, it
considerably speeds up filesystem creation because inode tables are
not zeroed out.  The fact that parts of the inode table are
uninitialized is not a problem so long as the block group descriptors,
which contain information regarding how much of the inode table has
been initialized, has not been corrupted However, if the block group
checksums are not valid, e2fsck must scan the entire inode table, and
the the old, uninitialized data could potentially cause e2fsck to
report false problems.

Hence, it is important for the inode tables to be initialized as soon
as possble.  This commit adds this feature so that mke2fs can safely
use the lazy inode table initialization feature to speed up formatting
file systems.

This is done via a new new kernel thread called ext4lazyinit, which is
created on demand and destroyed, when it is no longer needed.  There
is only one thread for all ext4 filesystems in the system. When the
first filesystem with inititable mount option is mounted, ext4lazyinit
thread is created, then the filesystem can register its request in the
request list.

This thread then walks through the list of requests picking up
scheduled requests and invoking ext4_init_inode_table(). Next schedule
time for the request is computed by multiplying the time it took to
zero out last inode table with wait multiplier, which can be set with
the (init_itable=n) mount option (default is 10).  We are doing
this so we do not take the whole I/O bandwidth. When the thread is no
longer necessary (request list is empty) it frees the appropriate
structures and exits (and can be created later later by another
filesystem).

We do not disturb regular inode allocations in any way, it just do not
care whether the inode table is, or is not zeroed. But when zeroing, we
have to skip used inodes, obviously. Also we should prevent new inode
allocations from the group, while zeroing is on the way. For that we
take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem
in the ext4_claim_inode, so when we are unlucky and allocator hits the
group which is currently being zeroed, it just has to wait.

This can be suppresed using the mount option no_init_itable.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-10-27 21:30:05 -04:00
Nikanth Karthikesan
03f890f8c2 /proc/pid/pagemap: document in Documentation/filesystems/proc.txt
Document /proc/pid/pagemap in Documentation/filesystems/proc.txt

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Cc: Richard Guenther <rguenther@suse.de>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:13 -07:00
Nikanth Karthikesan
b40d4f84be /proc/pid/smaps: export amount of anonymous memory in a mapping
Export the number of anonymous pages in a mapping via smaps.

Even the private pages in a mapping backed by a file, would be marked as
anonymous, when they are modified. Export this information to user-space via
smaps.

Exporting this count will help gdb to make a better decision on which
areas need to be dumped in its coredump; and should be useful to others
studying the memory usage of a process.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:13 -07:00
Linus Torvalds
426e1f5cec Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  split invalidate_inodes()
  fs: skip I_FREEING inodes in writeback_sb_inodes
  fs: fold invalidate_list into invalidate_inodes
  fs: do not drop inode_lock in dispose_list
  fs: inode split IO and LRU lists
  fs: switch bdev inode bdi's correctly
  fs: fix buffer invalidation in invalidate_list
  fsnotify: use dget_parent
  smbfs: use dget_parent
  exportfs: use dget_parent
  fs: use RCU read side protection in d_validate
  fs: clean up dentry lru modification
  fs: split __shrink_dcache_sb
  fs: improve DCACHE_REFERENCED usage
  fs: use percpu counter for nr_dentry and nr_dentry_unused
  fs: simplify __d_free
  fs: take dcache_lock inside __d_path
  fs: do not assign default i_ino in new_inode
  fs: introduce a per-cpu last_ino allocator
  new helper: ihold()
  ...
2010-10-26 17:58:44 -07:00
Linus Torvalds
18a043f941 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: rename nfs.upcall -> nfs.idmap
  NFS: Fix a compile issue in nfs_root
2010-10-26 17:24:28 -07:00
Matt Mackall
0f4d208f19 Documentation/filesystems/proc.txt: improve smaps field documentation
Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:05 -07:00
Bryan Schumaker
eb1c86b8b5 NFS: rename nfs.upcall -> nfs.idmap
This patch renames the idmapper upcall program from nfs.upcall to nfs.idmap in
the NFS documentation.  This is because the program has been renamed in the
nfs-utils source.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-26 13:57:10 -04:00
Linus Torvalds
a4dd8dce14 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  net/sunrpc: Use static const char arrays
  nfs4: fix channel attribute sanity-checks
  NFSv4.1: Use more sensible names for 'initialize_mountpoint'
  NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
  NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
  NFS: client needs to maintain list of inodes with active layouts
  NFS: create and destroy inode's layout cache
  NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
  NFSv4.1: pnfs: full mount/umount infrastructure
  NFS: set layout driver
  NFS: ask for layouttypes during v4 fsinfo call
  NFS: change stateid to be a union
  NFSv4.1: pnfsd, pnfs: protocol level pnfs constants
  SUNRPC: define xdr_decode_opaque_fixed
  NFSD: remove duplicate NFS4_STATEID_SIZE
2010-10-26 09:52:09 -07:00
Christoph Hellwig
e1455d1bdc update block_device_operations documentation
Updated Documentation/filesystems/Locking to match the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2010-10-25 21:18:22 -04:00
Valerie Aurora
d9d1dc802f Documentation: Fix trivial typo in filesystems/sharedsubtree.txt
Documentation: Fix trivial typo in filesystems/sharedsubtree.txt

This typo is easy to ignore unless you have spent a great deal of time
thinking about how to eliminate duplicate dentries in unions.

Signed-off-by: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:18:21 -04:00
Linus Torvalds
74eb94b218 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (67 commits)
  SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred
  nfs: fix unchecked value
  Ask for time_delta during fsinfo probe
  Revalidate caches on lock
  SUNRPC: After calling xprt_release(), we must restart from call_reserve
  NFSv4: Fix up the 'dircount' hint in encode_readdir
  NFSv4: Clean up nfs4_decode_dirent
  NFSv4: nfs4_decode_dirent must clear entry->fattr->valid
  NFSv4: Fix a regression in decode_getfattr
  NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer
  NFS: Ensure we check all allocation return values in new readdir code
  NFS: Readdir plus in v4
  NFS: introduce generic decode_getattr function
  NFS: check xdr_decode for errors
  NFS: nfs_readdir_filler catch all errors
  NFS: readdir with vmapped pages
  NFS: remove page size checking code
  NFS: decode_dirent should use an xdr_stream
  SUNRPC: Add a helper function xdr_inline_peek
  NFS: remove readdir plus limit
  ...
2010-10-25 13:48:29 -07:00
Fred Isaman
02c35fca7c NFSv4.1: pnfs: full mount/umount infrastructure
Allow a module implementing a layout type to register, and
have its mount/umount routines called for filesystems that
the server declares support it.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Linus Torvalds
6c2754c28f Revert "tty: Add a new file /proc/tty/consoles"
This reverts commit f4a3e0bceb.  Jiri
Sladby points out that the tty structure we're using may already be
gone, and Al Viro doesn't hold back in complaining about the random
loading of 'filp->private_data' which doesn't have to be a pointer at
all, nor does checking the magic field for TTY_MAGIC prove anything.

Belated review by Al:

 "a) global variable depending on stdin of the last opener? Affecting
     output of read(2)? Really?

  b) iterator is broken; list should be locked in ->start(), unlocked in
     ->stop() and *NOT* unlocked/relocked in ->next()

  c) ->show() ought to do nothing in case of ->device == NULL, instead
     of skipping those in ->next()/->start()

  d) regardless of the merits of the bright idea about asterisk at that
     line in output *and* regardless of (a), the implementation is not
     only atrociously ugly, it's actually very likely to be a roothole.
     Verifying that Cthulhu knows what number happens to be address of a
     tty_struct by blindly dereferencing memory at that address...
     Ouch.

  Please revert that crap."

And Christoph pipes in and NAK's the approach of walking fd tables etc
too.  So it's pretty unanimous.

Noticed-by: Jri Slaby <jslaby@suse.cz>
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Werner Fink <werner@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-23 08:14:12 -07:00
Linus Torvalds
73ecf3a6e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (49 commits)
  serial8250: ratelimit "too much work" error
  serial: bfin_sport_uart: speed up sport RX sample rate to be 3% faster
  serial: abstraction for 8250 legacy ports
  serial/imx: check that the buffer is non-empty before sending it out
  serial: mfd: add more baud rates support
  jsm: Remove the uart port on errors
  Alchemy: Add UART PM methods.
  8250: allow platforms to override PM hook.
  altera_uart: Don't use plain integer as NULL pointer
  altera_uart: Fix missing prototype for registering an early console
  altera_uart: Fixup type usage of port flags
  altera_uart: Make it possible to use Altera UART and 8250 ports together
  altera_uart: Add support for different address strides
  altera_uart: Add support for getting mapbase and IRQ from resources
  altera_uart: Add support for polling mode (IRQ-less)
  serial: Factor out uart_poll_timeout() from 8250 driver
  serial: mark the 8250 driver as maintained
  serial: 8250: Don't delay after transmitter is ready.
  tty: MAINTAINERS: add drivers/serial/jsm/ as maintained driver
  vcs: invoke the vt update callback when /dev/vcs* is written to
  ...
2010-10-22 19:59:04 -07:00
Dr. Werner Fink
f4a3e0bceb tty: Add a new file /proc/tty/consoles
Add a new file /proc/tty/consoles to be able to determine the registered
system console lines.  If the reading process holds /dev/console open at
the regular standard input stream the active device will be marked by an
asterisk.  Show possible operations and also decode the used flags of
the listed console lines.

Signed-off-by: Werner Fink <werner@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-22 10:20:05 -07:00
Tristan Ye
7bdb0d18bf ocfs2: Add a mount option "coherency=*" to handle cluster coherency for O_DIRECT writes.
Currently, the default behavior of O_DIRECT writes was allowing
concurrent writing among nodes to the same file, with no cluster
coherency guaranteed (no EX lock held).  This can leave stale data in
the cache for buffered reads on other nodes.

The new mount option introduce a chance to choose two different
behaviors for O_DIRECT writes:

    * coherency=full, as the default value, will disallow
                      concurrent O_DIRECT writes by taking
                      EX locks.

    * coherency=buffered, allow concurrent O_DIRECT writes
                          without EX lock among nodes, which
                          gains high performance at risk of
                          getting stale data on other nodes.

Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-10-11 14:14:55 -07:00
Bryan Schumaker
955a857e06 NFS: new idmapper
This patch creates a new idmapper system that uses the request-key function to
place a call into userspace to map user and group ids to names.  The old
idmapper was single threaded, which prevented more than one request from running
at a single time.  This means that a user would have to wait for an upcall to
finish before accessing a cached result.

The upcall result is stored on a keyring of type id_resolver.  See the file
Documentation/filesystems/nfs/idmapper.txt for instructions.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[Trond: fix up the return value of nfs_idmap_lookup_name and clean up code]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-07 18:48:49 -04:00
Arnd Bergmann
2116b7a473 smbfs: move to drivers/staging
smbfs has been scheduled for removal in 2.6.27, so
maybe we can now move it to drivers/staging on the
way out.

smbfs still uses the big kernel lock and nobody
is going to fix that, so we should be getting
rid of it soon.

This removes the 32 bit compat mount and ioctl
handling code, which is implemented in common fs
code, and moves all smbfs related files into
drivers/staging/smbfs.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-10-05 09:08:21 -07:00
Chuck Lever
306a075362 NFS: Allow NFSROOT debugging messages to be enabled dynamically
As a convenience, introduce a kernel command line option to enable
NFSROOT debugging messages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-09-17 10:54:37 -04:00
Arnd Bergmann
b19dd42faf bkl: Remove locked .ioctl file operation
The last user is gone, so we can safely remove this

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-08-14 00:24:24 +02:00
Linus Torvalds
062e27ec1b Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: fix checkpatch.pl warnings
  Squashfs: fix filename typo
  Squashfs: update Kconfig and documentation for LZO
  Squashfs: fix block size use in LZO decompressor
  Squashfs: Add LZO compression support
  squashfs: fix filename in header comment
  Squashfs: Make XATTR config name consistent with other file systems
  squashfs: fix compiler inline warning
2010-08-11 09:20:13 -07:00
Linus Torvalds
5f248c9c25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
  no need for list_for_each_entry_safe()/resetting with superblock list
  Fix sget() race with failing mount
  vfs: don't hold s_umount over close_bdev_exclusive() call
  sysv: do not mark superblock dirty on remount
  sysv: do not mark superblock dirty on mount
  btrfs: remove junk sb_dirt change
  BFS: clean up the superblock usage
  AFFS: wait for sb synchronization when needed
  AFFS: clean up dirty flag usage
  cifs: truncate fallout
  mbcache: fix shrinker function return value
  mbcache: Remove unused features
  add f_flags to struct statfs(64)
  pass a struct path to vfs_statfs
  update VFS documentation for method changes.
  All filesystems that need invalidate_inode_buffers() are doing that explicitly
  convert remaining ->clear_inode() to ->evict_inode()
  Make ->drop_inode() just return whether inode needs to be dropped
  fs/inode.c:clear_inode() is gone
  fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
  ...

Fix up trivial conflicts in fs/nilfs2/super.c
2010-08-10 11:26:52 -07:00
David Rientjes
51b1bd2ace oom: deprecate oom_adj tunable
/proc/pid/oom_adj is now deprecated so that that it may eventually be
removed.  The target date for removal is August 2012.

A warning will be printed to the kernel log if a task attempts to use this
interface.  Future warning will be suppressed until the kernel is rebooted
to prevent spamming the kernel log.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:45:02 -07:00
David Rientjes
a63d83f427 oom: badness heuristic rewrite
This a complete rewrite of the oom killer's badness() heuristic which is
used to determine which task to kill in oom conditions.  The goal is to
make it as simple and predictable as possible so the results are better
understood and we end up killing the task which will lead to the most
memory freeing while still respecting the fine-tuning from userspace.

Instead of basing the heuristic on mm->total_vm for each task, the task's
rss and swap space is used instead.  This is a better indication of the
amount of memory that will be freeable if the oom killed task is chosen
and subsequently exits.  This helps specifically in cases where KDE or
GNOME is chosen for oom kill on desktop systems instead of a memory
hogging task.

The baseline for the heuristic is a proportion of memory that each task is
currently using in memory plus swap compared to the amount of "allowable"
memory.  "Allowable," in this sense, means the system-wide resources for
unconstrained oom conditions, the set of mempolicy nodes, the mems
attached to current's cpuset, or a memory controller's limit.  The
proportion is given on a scale of 0 (never kill) to 1000 (always kill),
roughly meaning that if a task has a badness() score of 500 that the task
consumes approximately 50% of allowable memory resident in RAM or in swap
space.

The proportion is always relative to the amount of "allowable" memory and
not the total amount of RAM systemwide so that mempolicies and cpusets may
operate in isolation; they shall not need to know the true size of the
machine on which they are running if they are bound to a specific set of
nodes or mems, respectively.

Root tasks are given 3% extra memory just like __vm_enough_memory()
provides in LSMs.  In the event of two tasks consuming similar amounts of
memory, it is generally better to save root's task.

Because of the change in the badness() heuristic's baseline, it is also
necessary to introduce a new user interface to tune it.  It's not possible
to redefine the meaning of /proc/pid/oom_adj with a new scale since the
ABI cannot be changed for backward compatability.  Instead, a new tunable,
/proc/pid/oom_score_adj, is added that ranges from -1000 to +1000.  It may
be used to polarize the heuristic such that certain tasks are never
considered for oom kill while others may always be considered.  The value
is added directly into the badness() score so a value of -500, for
example, means to discount 50% of its memory consumption in comparison to
other tasks either on the system, bound to the mempolicy, in the cpuset,
or sharing the same memory controller.

/proc/pid/oom_adj is changed so that its meaning is rescaled into the
units used by /proc/pid/oom_score_adj, and vice versa.  Changing one of
these per-task tunables will rescale the value of the other to an
equivalent meaning.  Although /proc/pid/oom_adj was originally defined as
a bitshift on the badness score, it now shares the same linear growth as
/proc/pid/oom_score_adj but with different granularity.  This is required
so the ABI is not broken with userspace applications and allows oom_adj to
be deprecated for future removal.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:45:02 -07:00
Al Viro
336fb3b97b update VFS documentation for method changes.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:48:40 -04:00
Christoph Hellwig
1e23173509 update documentation for the new truncate sequence
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-08-09 16:47:40 -04:00
Linus Torvalds
a57f9a3e81 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (45 commits)
  nilfs2: reject filesystem with unsupported block size
  nilfs2: avoid rec_len overflow with 64KB block size
  nilfs2: simplify nilfs_get_page function
  nilfs2: reject incompatible filesystem
  nilfs2: add feature set fields to super block
  nilfs2: clarify byte offset in super block format
  nilfs2: apply read-ahead for nilfs_btree_lookup_contig
  nilfs2: introduce check flag to btree node buffer
  nilfs2: add btree get block function with readahead option
  nilfs2: add read ahead mode to nilfs_btnode_submit_block
  nilfs2: fix buffer head leak in nilfs_btnode_submit_block
  nilfs2: eliminate inline keywords in btree implementation
  nilfs2: get maximum number of child nodes from bmap object
  nilfs2: reduce repetitive calculation of max number of child nodes
  nilfs2: optimize calculation of min/max number of btree node children
  nilfs2: remove redundant pointer checks in bmap lookup functions
  nilfs2: get rid of nilfs_bmap_union
  nilfs2: unify bmap set_target_v operations
  nilfs2: get rid of nilfs_btree uses
  nilfs2: get rid of nilfs_direct uses
  ...
2010-08-07 13:10:55 -07:00
Linus Torvalds
3b7433b8a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
  workqueue: mark init_workqueues() as early_initcall()
  workqueue: explain for_each_*cwq_cpu() iterators
  fscache: fix build on !CONFIG_SYSCTL
  slow-work: kill it
  gfs2: use workqueue instead of slow-work
  drm: use workqueue instead of slow-work
  cifs: use workqueue instead of slow-work
  fscache: drop references to slow-work
  fscache: convert operation to use workqueue instead of slow-work
  fscache: convert object to use workqueue instead of slow-work
  workqueue: fix how cpu number is stored in work->data
  workqueue: fix mayday_mask handling on UP
  workqueue: fix build problem on !CONFIG_SMP
  workqueue: fix locking in retry path of maybe_create_worker()
  async: use workqueue for worker pool
  workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
  workqueue: implement unbound workqueue
  workqueue: prepare for WQ_UNBOUND implementation
  libata: take advantage of cmwq and remove concurrency limitations
  workqueue: fix worker management invocation without pending works
  ...

Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
2010-08-07 12:42:58 -07:00
Linus Torvalds
1cfd2bda8c Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (30 commits)
  PCI: update for owner removal from struct device_attribute
  PCI: Fix warnings when CONFIG_DMI unset
  PCI: Do not run NVidia quirks related to MSI with MSI disabled
  x86/PCI: use for_each_pci_dev()
  PCI: use for_each_pci_dev()
  PCI: MSI: Restore read_msi_msg_desc(); add get_cached_msi_msg_desc()
  PCI: export SMBIOS provided firmware instance and label to sysfs
  PCI: Allow read/write access to sysfs I/O port resources
  x86/PCI: use host bridge _CRS info on ASRock ALiveSATA2-GLAN
  PCI: remove unused HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_{SIZE|BOUNDARY}
  PCI: disable mmio during bar sizing
  PCI: MSI: Remove unsafe and unnecessary hardware access
  PCI: Default PCIe ASPM control to on and require !EMBEDDED to disable
  PCI: kernel oops on access to pci proc file while hot-removal
  PCI: pci-sysfs: remove casts from void*
  ACPI: Disable ASPM if the platform won't provide _OSC control for PCIe
  PCI hotplug: make sure child bridges are enabled at hotplug time
  PCI hotplug: shpchp: Removed check for hotplug of display devices
  PCI hotplug: pciehp: Fixed return value sign for pciehp_unconfigure_device
  PCI: Don't enable aspm before drivers have had a chance to veto it
  ...
2010-08-06 11:44:36 -07:00
Phillip Lougher
4b676d2dbe Squashfs: update Kconfig and documentation for LZO
Update compression types supported and add some help text for
the LZO Kconfig option.

Also add missing "default n" line and make some trivial whitespace
cleanups too.

Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-08-05 23:42:54 +01:00
Ira Weiny
a530703271 sysfs: Fix one more signature discrepancy between sysfs implementation and docs.
Signed-off-by: Ira Weiny <weiny2@llnl.gov>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-05 13:53:34 -07:00
Bart Van Assche
30a69000a4 sysfs: fix discrepancies between implementation and documentation
Fix all discrepancies I know of between the sysfs implementation and its
documentation.

Signed-off-by: Bart Van Assche <bart.vanassche@gmail.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-05 13:53:34 -07:00
Linus Torvalds
3cfc2c42c1 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (48 commits)
  Documentation: update broken web addresses.
  fix comment typo "choosed" -> "chosen"
  hostap:hostap_hw.c Fix typo in comment
  Fix spelling contorller -> controller in comments
  Kconfig.debug: FAIL_IO_TIMEOUT: typo Faul -> Fault
  fs/Kconfig: Fix typo Userpace -> Userspace
  Removing dead MACH_U300_BS26
  drivers/infiniband: Remove unnecessary casts of private_data
  fs/ocfs2: Remove unnecessary casts of private_data
  libfc: use ARRAY_SIZE
  scsi: bfa: use ARRAY_SIZE
  drm: i915: use ARRAY_SIZE
  drm: drm_edid: use ARRAY_SIZE
  synclink: use ARRAY_SIZE
  block: cciss: use ARRAY_SIZE
  comment typo fixes: charater => character
  fix comment typos concerning "challenge"
  arm: plat-spear: fix typo in kerneldoc
  reiserfs: typo comment fix
  update email address
  ...
2010-08-04 15:31:02 -07:00
Linus Torvalds
6ba74014c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
  phy/marvell: add 88ec048 support
  igb: Program MDICNFG register prior to PHY init
  e1000e: correct MAC-PHY interconnect register offset for 82579
  hso: Add new product ID
  can: Add driver for esd CAN-USB/2 device
  l2tp: fix export of header file for userspace
  can-raw: Fix skb_orphan_try handling
  Revert "net: remove zap_completion_queue"
  net: cleanup inclusion
  phy/marvell: add 88e1121 interface mode support
  u32: negative offset fix
  net: Fix a typo from "dev" to "ndev"
  igb: Use irq_synchronize per vector when using MSI-X
  ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
  e1000e: Fix irq_synchronize in MSI-X case
  e1000e: register pm_qos request on hardware activation
  ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
  net: Add getsockopt support for TCP thin-streams
  cxgb4: update driver version
  cxgb4: add new PCI IDs
  ...

Manually fix up conflicts in:
 - drivers/net/e1000e/netdev.c: due to pm_qos registration
   infrastructure changes
 - drivers/net/phy/marvell.c: conflict between adding 88ec048 support
   and cleaning up the IDs
 - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
   conflict (registration change vs marking it static)
2010-08-04 11:47:58 -07:00
Justin P. Mattock
0ea6e61122 Documentation: update broken web addresses.
Below you will find an updated version from the original series bunching all patches into one big patch
updating broken web addresses that are located in Documentation/*
Some of the addresses date as far far back as 1995 etc... so searching became a bit difficult,
the best way to deal with these is to use web.archive.org to locate these addresses that are outdated.
Now there are also some addresses pointing to .spec files some are located, but some(after searching
on the companies site)where still no where to be found. In this case I just changed the address
to the company site this way the users can contact the company and they can locate them for the users.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Thomas Weber <weber@corscience.de>
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Cc: Paulo Marques <pmarques@grupopie.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Michael Neuling <mikey@neuling.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-08-04 15:21:40 +02:00
Alex Williamson
8633328be2 PCI: Allow read/write access to sysfs I/O port resources
PCI sysfs resource files currently only allow mmap'ing.  On x86 this
works fine for memory backed BARs, but doesn't work at all for I/O
port backed BARs.  Add read/write to I/O port PCI sysfs resource
files to allow userspace access to these device regions.

Acked-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-30 09:32:08 -07:00
Christoph Hellwig
a64afb057b xfs: remove obsolete osyncisosync mount option
Since Linux 2.6.33 the kernel has support for real O_SYNC, which made
the osyncisosync option a no-op.  Warn the users about this and remove
the mount flag for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2010-07-26 13:16:51 -05:00
Ryusuke Konishi
802d317754 nilfs2: add nodiscard mount option
Nilfs has "discard" mount option which issues discard/TRIM commands to
underlying block device, but it lacks a complementary option and has
no way to disable the feature through remount.

This adds "nodiscard" option to resolve this imbalance.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Ryusuke Konishi
773bc4f3b6 nilfs2: add barrier mount option
Nilfs enables write barriers by default and has "nobarrier" mount
option to disable this feature.  But it lacks the complementary option
and has no way to re-enable the feature on remount.

This adds "barrier" option to resolve this imbalance.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-07-23 10:02:12 +09:00
Tejun Heo
8b8edefa2f fscache: convert object to use workqueue instead of slow-work
Make fscache object state transition callbacks use workqueue instead
of slow-work.  New dedicated unbound CPU workqueue fscache_object_wq
is created.  get/put callbacks are renamed and modified to take
@object and called directly from the enqueue wrapper and the work
function.  While at it, make all open coded instances of get/put to
use fscache_get/put_object().

* Unbound workqueue is used.

* work_busy() output is printed instead of slow-work flags in object
  debugging outputs.  They mean basically the same thing bit-for-bit.

* sysctl fscache.object_max_active added to control concurrency.  The
  default value is nr_cpus clamped between 4 and
  WQ_UNBOUND_MAX_ACTIVE.

* slow_work_sleep_till_thread_needed() is replaced with fscache
  private implementation fscache_object_sleep_till_congested() which
  waits on fscache_object_wq congestion.

* debugfs support is dropped for now.  Tracing API based debug
  facility is planned to be added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
2010-07-22 22:58:34 +02:00
Alex Elder
1bf7dbfde8 Merge branch 'master' into for-linus 2010-06-04 13:22:30 -05:00
Wu Fengguang
8cbccbe761 ipconfig: document DHCP hostname and DNS record
Now it's possible to update the DNS record for $HOST_NAME with

	ip=::::$HOST_NAME::dhcp

CC: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-03 03:18:18 -07:00
Christoph Hellwig
99a4d54620 xfs: remove done roadmap item from xfs-delayed-logging-design.txt
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2010-06-03 16:22:29 +10:00
npiggin@suse.de
7bb46a6734 fs: introduce new truncate sequence
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.

Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed.  This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:15:33 -04:00
Christoph Hellwig
7ea8085910 drop unused dentry argument to ->fsync
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-27 22:05:02 -04:00
Jan Blunck
866707fc27 Documentation/filesystems/Locking: update documentation on llseek() wrt BKL
The inode's i_size is not protected by the big kernel lock.  Therefore it
does not make sense to recommend taking the BKL in filesystems llseek
operations.  Instead it should use the inode's mutex or use just use
i_size_read() instead.  Add a note that this is not protecting
file->f_pos.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:56 -07:00
Linus Torvalds
63a6440326 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  squashfs: update documentation to include description of xattr layout
  squashfs: fix name reading in squashfs_xattr_get
  squashfs: constify xattr handlers
  squashfs: xattr fix sparse warnings
  squashfs: xattr_lookup sparse fix
  squashfs: add xattr support configure option
  squashfs: add new extended inode types
  squashfs: add support for xattr reading
  squashfs: add xattr id support
2010-05-26 08:57:20 -07:00
Phillip Lougher
899f453033 squashfs: update documentation to include description of xattr layout
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2010-05-26 01:12:26 +01:00
Linus Torvalds
110b93842e Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: Ensure inode allocation buffers are fully replayed
  xfs: enable background pushing of the CIL
  xfs: forced unmounts need to push the CIL
  xfs: Introduce delayed logging core code
  xfs: Delayed logging design documentation
  xfs: Improve scalability of busy extent tracking
  xfs: make the log ticket ID available outside the log infrastructure
  xfs: clean up log ticket overrun debug output
  xfs: Clean up XFS_BLI_* flag namespace
  xfs: modify buffer item reference counting
  xfs: allow log ticket allocation to take allocation flags
  xfs: Don't reuse the same transaction ID for duplicated transactions.
2010-05-25 08:17:01 -07:00
Lee Schermerhorn
971ada0f66 mempolicy: document cpuset interaction with tmpfs mpol mount option
Update Documentation/filesystems/tmpfs.txt to describe the interaction of
tmpfs mount option memory policy with tasks' cpuset mems_allowed.

Note: the mount(8) man page [in the util-linux-ng package] requires
similiar updates.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:06:57 -07:00
Alex Elder
88e88374ee Merge branch 'delayed-logging-for-2.6.35' into for-linus 2010-05-24 11:57:36 -05:00
Dave Chinner
a9a745daad xfs: Delayed logging design documentation
Document the design of the delayed logging implementation. This
includes assumptions made, dead ends followed, the reasoning behind
the structuring of the code, the layout of various structures, how
things fit together, traps and pit-falls avoided, etc. This is all
too much to document in the code itself, so do it in a separate
file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2010-05-24 10:35:39 -05:00
Linus Torvalds
7ce1418f95 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (31 commits)
  dquot: Detect partial write error to quota file in write_blk() and add printk_ratelimit for quota error messages
  ocfs2: Fix lock inversion in quotas during umount
  ocfs2: Use __dquot_transfer to avoid lock inversion
  ocfs2: Fix NULL pointer deref when writing local dquot
  ocfs2: Fix estimate of credits needed for quota allocation
  ocfs2: Fix quota locking
  ocfs2: Avoid unnecessary block mapping when refreshing quota info
  ocfs2: Do not map blocks from local quota file on each write
  quota: Refactor dquot_transfer code so that OCFS2 can pass in its references
  quota: unify quota init condition in setattr
  quota: remove sb_has_quota_active in get/set_info
  quota: unify ->set_dqblk
  quota: unify ->get_dqblk
  ext3: make barrier options consistent with ext4
  quota: Make quota stat accounting lockless.
  suppress warning: "quotatypes" defined but not used
  ext3: Fix waiting on transaction during fsync
  jbd: Provide function to check whether transaction will issue data barrier
  ufs: add ufs speciffic ->setattr call
  BKL: Remove BKL from ext2 filesystem
  ...
2010-05-21 10:50:28 -07:00
Eric Sandeen
0636c73ee7 ext3: make barrier options consistent with ext4
ext4 was updated to accept barrier/nobarrier mount options
in addition to the older barrier=0/1.  The barrier story
is complex enough, we should help people by making the options
the same at least, even if the defaults are different.

This patch allows the barrier/nobarrier mount options for ext3,
while keeping nobarrier the default.

It also unconditionally displays barrier status in show_options,
and prints a message at mount time if barriers are not enabled,
just as ext4 does.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 19:30:41 +02:00
Serge E. Hallyn
b9d8b45ee3 sysfs-namespaces: add a high-level Documentation file
The first three paragraphs are almost verbatim taken from Eric's
commit message on the patch introducing network ns tags.  The next
two paragraphs I wrote to be a brief high level overview.  The last
section is taken from the commit message on "Implement sysfs tagged
directory support", but updated.  Hopefully correctly.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:31 -07:00
Linus Torvalds
d7dbf4ffee Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (23 commits)
  nilfs2: disallow remount of snapshot from/to a regular mount
  nilfs2: use huge_encode_dev/huge_decode_dev
  nilfs2: update comment on deactivate_super at nilfs_get_sb
  nilfs2: replace MS_VERBOSE with MS_SILENT
  nilfs2: add missing initialization of s_mode
  nilfs2: fix misuse of open_bdev_exclusive/close_bdev_exclusive
  nilfs2: enlarge s_volume_name member in nilfs_super_block
  nilfs2: use checkpoint number instead of timestamp to select super block
  nilfs2: add missing endian conversion on super block magic number
  nilfs2: make nilfs_sc_*_ops static
  nilfs2: add kernel doc comments to persistent object allocator functions
  nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info
  nilfs2: remove nilfs_segctor_init() in segment.c
  nilfs2: insert checkpoint number in segment summary header
  nilfs2: add a print message after loading nilfs2
  nilfs2: cleanup multi kmem_cache_{create,destroy} code
  nilfs2: move out checksum routines to segment buffer code
  nilfs2: move pointer to super root block into logs
  nilfs2: change default of 'errors' mount option to 'remount-ro' mode
  nilfs2: Combine nilfs_btree_release_path() and nilfs_btree_free_path()
  ...
2010-05-21 07:47:00 -07:00
Linus Torvalds
677abe49ad Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: Fix typo
  GFS2: stuck in inode wait, no glocks stuck
  GFS2: Eliminate useless err variable
  GFS2: Fix writing to non-page aligned gfs2_quota structures
  GFS2: Add some useful messages
  GFS2: fix quota state reporting
  GFS2: Various gfs2_logd improvements
  GFS2: glock livelock
  GFS2: Clean up stuffed file copying
  GFS2: docs update
  GFS2: Remove space from slab cache name
2010-05-21 07:29:15 -07:00
Linus Torvalds
03e62303cf Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (47 commits)
  ocfs2: Silence a gcc warning.
  ocfs2: Don't retry xattr set in case value extension fails.
  ocfs2:dlm: avoid dlm->ast_lock lockres->spinlock dependency break
  ocfs2: Reset xattr value size after xa_cleanup_value_truncate().
  fs/ocfs2/dlm: Use kstrdup
  fs/ocfs2/dlm: Drop memory allocation cast
  Ocfs2: Optimize punching-hole code.
  Ocfs2: Make ocfs2_find_cpos_for_left_leaf() public.
  Ocfs2: Fix hole punching to correctly do CoW during cluster zeroing.
  Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead.
  ocfs2: Block signals for mkdir/link/symlink/O_CREAT.
  ocfs2: Wrap signal blocking in void functions.
  ocfs2/dlm: Increase o2dlm lockres hash size
  ocfs2: Make ocfs2_extend_trans() really extend.
  ocfs2/trivial: Code cleanup for allocation reservation.
  ocfs2: make ocfs2_adjust_resv_from_alloc simple.
  ocfs2: Make nointr a default mount option
  ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE
  o2net: log socket state changes
  ocfs2: print node # when tcp fails
  ...
2010-05-21 07:20:17 -07:00
Linus Torvalds
f39d01be4c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
  vlynq: make whole Kconfig-menu dependant on architecture
  add descriptive comment for TIF_MEMDIE task flag declaration.
  EEPROM: max6875: Header file cleanup
  EEPROM: 93cx6: Header file cleanup
  EEPROM: Header file cleanup
  agp: use NULL instead of 0 when pointer is needed
  rtc-v3020: make bitfield unsigned
  PCI: make bitfield unsigned
  jbd2: use NULL instead of 0 when pointer is needed
  cciss: fix shadows sparse warning
  doc: inode uses a mutex instead of a semaphore.
  uml: i386: Avoid redefinition of NR_syscalls
  fix "seperate" typos in comments
  cocbalt_lcdfb: correct sections
  doc: Change urls for sparse
  Powerpc: wii: Fix typo in comment
  i2o: cleanup some exit paths
  Documentation/: it's -> its where appropriate
  UML: Fix compiler warning due to missing task_struct declaration
  UML: add kernel.h include to signal.c
  ...
2010-05-20 09:20:59 -07:00
Linus Torvalds
f72caf7e49 Merge branch 'for-2.6.35' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.35' of git://linux-nfs.org/~bfields/linux: (45 commits)
  Revert "nfsd4: distinguish expired from stale stateids"
  nfsd: safer initialization order in find_file()
  nfs4: minor callback code simplification, comment
  NFSD: don't report compiled-out versions as present
  nfsd4: implement reclaim_complete
  nfsd4: nfsd4_destroy_session must set callback client under the state lock
  nfsd4: keep a reference count on client while in use
  nfsd4: mark_client_expired
  nfsd4: introduce nfs4_client.cl_refcount
  nfsd4: refactor expire_client
  nfsd4: extend the client_lock to cover cl_lru
  nfsd4: use list_move in move_to_confirmed
  nfsd4: fold release_session into expire_client
  nfsd4: rename sessionid_lock to client_lock
  nfsd4: fix bare destroy_session null dereference
  nfsd4: use local variable in nfs4svc_encode_compoundres
  nfsd: further comment typos
  sunrpc: centralise most calls to svc_xprt_received
  nfsd4: fix unlikely race in session replay case
  nfsd4: fix filehandle comment
  ...
2010-05-19 17:24:54 -07:00
Linus Torvalds
6e0b7b2c39 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Clear CPU mask in affinity_hint when none is provided
  genirq: Add CPU mask affinity hint
  genirq: Remove IRQF_DISABLED from core code
  genirq: Run irq handlers with interrupts disabled
  genirq: Introduce request_any_context_irq()
  genirq: Expose irq_desc->node in proc/irq

Fixed up trivial conflicts in Documentation/feature-removal-schedule.txt
2010-05-19 17:09:40 -07:00
J. Bruce Fields
4dc6ec00f6 nfsd4: implement reclaim_complete
This is a mandatory operation.  Also, here (not in open) is where we
should be committing the reboot recovery information.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13 12:03:11 -04:00
Robin Holt
34441427aa revert "procfs: provide stack information for threads" and its fixup commits
Originally, commit d899bf7b ("procfs: provide stack information for
threads") attempted to introduce a new feature for showing where the
threadstack was located and how many pages are being utilized by the
stack.

Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
applied to fix the NO_MMU case.

Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
64-bit") was applied to fix a bug in ia32 executables being loaded.

Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel
threads") was applied to fix a bug which had kernel threads printing a
userland stack address.

Commit 1306d603f ('proc: partially revert "procfs: provide stack
information for threads"') was then applied to revert the stack pages
being used to solve a significant performance regression.

This patch nearly undoes the effect of all these patches.

The reason for reverting these is it provides an unusable value in
field 28.  For x86_64, a fork will result in the task->stack_start
value being updated to the current user top of stack and not the stack
start address.  This unpredictability of the stack_start value makes
it worthless.  That includes the intended use of showing how much stack
space a thread has.

Other architectures will get different values.  As an example, ia64
gets 0.  The do_fork() and copy_process() functions appear to treat the
stack_start and stack_size parameters as architecture specific.

I only partially reverted c44972f1 ("procfs: disable per-task stack usage
on NOMMU") .  If I had completely reverted it, I would have had to change
mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
configured.  Since I could not test the builds without significant effort,
I decided to not change mm/Makefile.

I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
information for threads on 64-bit") .  I left the KSTK_ESP() change in
place as that seemed worthwhile.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-11 17:33:41 -07:00
Thadeu Lima de Souza Cascardo
ca0dbd86b1 doc: inode uses a mutex instead of a semaphore.
Replace the introduced i_sem by an i_mutex in the filesystem locking
documentation. This was introduced [1] after all occurrences were
already replaced in the same text [2]. However, the term "inode
semaphore" has not been replaced then, and it's replaced now.

[1] afddba49d1
[2] a7bc02f4f4

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-05-10 23:42:27 +02:00
Anand Gadiyar
a8cd4561ea fix "seperate" typos in comments
s/seperate/separate

Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-05-10 11:56:30 +02:00
Ryusuke Konishi
277a6a3417 nilfs2: change default of 'errors' mount option to 'remount-ro' mode
Like ext3, nilfs has 'errors' mount option to allow specifying desired
behavior on severe errors.

Currently, the default action is 'errors=continue' and has potential
to advance filesystem corruption for severe errors.

This will change the action to 'errors=remount-ro' to avoid the issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-05-10 11:32:30 +09:00
Mark Fasheh
83f92318fa ocfs2: Add dir_resv_level mount option
The default behavior for directory reservations stays the same, but we add a
mount option so people can tweak the size of directory reservations
according to their workloads.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 18:18:07 -07:00
Mark Fasheh
b07f8f24df ocfs2: change default reservation window sizes
The default reservation size of 4 (32-bit windows) is a bit too ambitious.
Scale it back to 16 bits (resv_level=2). I have been testing various sizes
on a 4-node cluster which runs a mixed workload that is heavily threaded.
With a 256MB local alloc, I get *roughly* the following levels of average file
fragmentation:

resv_level=0	70%
resv_level=1	21%
resv_level=2	23%
resv_level=3	24%
resv_level=4	60%
resv_level=5	did not test
resv_level=6	60%

resv_level=2 seemed like a good compromise between not letting windows be
too small, but not so big that heavier workloads will immediately suffer
without tuning.

This patch also change the behavior of directory reservations - they now
track file reservations.  The previous compromise of giving directory
windows only 8 bits wound up fragmenting more at some window sizes because
file allocations had smaller unused windows to poach from.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05 18:18:07 -07:00
Mark Fasheh
d02f00cc05 ocfs2: allocation reservations
This patch improves Ocfs2 allocation policy by allowing an inode to
reserve a portion of the local alloc bitmap for itself. The reserved
portion (allocation window) is advisory in that other allocation
windows might steal it if the local alloc bitmap becomes
full. Otherwise, the reservations are honored and guaranteed to be
free. When the local alloc window is moved to a different portion of
the bitmap, existing reservations are discarded.

Reservation windows are represented internally by a red-black
tree. Within that tree, each node represents the reservation window of
one inode. An LRU of active reservations is also maintained. When new
data is written, we allocate it from the inodes window. When all bits
in a window are exhausted, we allocate a new one as close to the
previous one as possible. Should we not find free space, an existing
reservation is pulled off the LRU and cannibalized.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2010-05-05 18:17:30 -07:00
Francis Galiegue
a33f32244d Documentation/: it's -> its where appropriate
Fix obvious cases of "it's" being used when "its" was meant.

Signed-off-by: Francis Galiegue <fgaliegue@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-04-23 02:09:52 +02:00
Jiri Kosina
6c9468e9eb Merge branch 'master' into for-next 2010-04-23 02:08:44 +02:00
Thomas Gleixner
7c7145f6ac Merge branch 'linus' into irq/core
Reason: Get the upstream IRQF_DISABLED related changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-04-13 14:12:17 +02:00
Sripathi Kodi
9208d24253 9p: documentation update
This patch adds documentation for new 9P options introduced in
2.6.34.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-04-05 10:37:36 -05:00
Linus Torvalds
9f32160372 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits)
  ceph: update discussion list address in MAINTAINERS
  ceph: some documentations fixes
  ceph: fix use after free on mds __unregister_request
  ceph: avoid loaded term 'OSD' in documention
  ceph: fix possible double-free of mds request reference
  ceph: fix session check on mds reply
  ceph: handle kmalloc() failure
  ceph: propagate mds session allocation failures to caller
  ceph: make write_begin wait propagate ERESTARTSYS
  ceph: fix snap rebuild condition
  ceph: avoid reopening osd connections when address hasn't changed
  ceph: rename r_sent_stamp r_stamp
  ceph: fix connection fault con_work reentrancy problem
  ceph: prevent dup stale messages to console for restarting mds
  ceph: fix pg pool decoding from incremental osdmap update
  ceph: fix mds sync() race with completing requests
  ceph: only release unused caps with mds requests
  ceph: clean up handle_cap_grant, handle_caps wrt session mutex
  ceph: fix session locking in handle_caps, ceph_check_caps
  ceph: drop unnecessary WARN_ON in caps migration
  ...
2010-03-29 14:42:25 -07:00
Cheng Renquan
8136b58dd0 ceph: some documentations fixes
New documentation should have an entry in the 00-INDEX.  Correct git
urls.

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-29 09:52:11 -07:00
Andrea Gelmini
4cb947b59c GFS2: docs update
Now http://sources.redhat.com/cluster/ is redirected to
http://sources.redhat.com/cluster/wiki/

Also fixed tabs in the end.

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2010-03-29 14:28:52 +01:00
KOSAKI Motohiro
5574169613 doc: add the documentation for mpol=local
commit 3f226aa1c (mempolicy: support mpol=local tmpfs mount option) added
new mpol=local mount option.  but it didn't add a documentation.

This patch does it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24 16:31:21 -07:00
Dimitri Sivanich
92d6b71ab9 genirq: Expose irq_desc->node in proc/irq
Expose irq_desc->node as /proc/irq/*/node.

This file provides device hardware locality information for apps
desiring to include hardware locality in irq mapping decisions.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-03-24 14:10:03 +01:00
Sage Weil
23ab15ad7a ceph: avoid loaded term 'OSD' in documention
'OSD' means different things to different people; avoid it here to avoid
confusion.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23 07:47:07 -07:00
Linus Torvalds
fc7f99cf36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (205 commits)
  ceph: update for write_inode API change
  ceph: reset osd after relevant messages timed out
  ceph: fix flush_dirty_caps race with caps migration
  ceph: include migrating caps in issued set
  ceph: fix osdmap decoding when pools include (removed) snaps
  ceph: return EBADF if waiting for caps on closed file
  ceph: set osd request message front length correctly
  ceph: reset front len on return to msgpool; BUG on mismatched front iov
  ceph: fix snaptrace decoding on cap migration between mds
  ceph: use single osd op reply msg
  ceph: reset bits on connection close
  ceph: remove bogus mds forward warning
  ceph: remove fragile __map_osds optimization
  ceph: fix connection fault STANDBY check
  ceph: invalidate_authorizer without con->mutex held
  ceph: don't clobber write return value when using O_SYNC
  ceph: fix client_request_forward decoding
  ceph: drop messages on unregistered mds sessions; cleanup
  ceph: fix comments, locking in destroy_inode
  ceph: move dereference after NULL test
  ...

Fix trivial conflicts in Documentation/ioctl/ioctl-number.txt
2010-03-19 09:43:06 -07:00
Rob Landley
32e688b8c1 Documentation/filesystems/proc.txt typo fix.
Typo fix for a filename in procfs documentation.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-03-15 15:21:31 +01:00
Linus Torvalds
c32da02342 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits)
  doc: fix typo in comment explaining rb_tree usage
  Remove fs/ntfs/ChangeLog
  doc: fix console doc typo
  doc: cpuset: Update the cpuset flag file
  Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed
  Remove drivers/parport/ChangeLog
  Remove drivers/char/ChangeLog
  doc: typo - Table 1-2 should refer to "status", not "statm"
  tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
  No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h
  devres/irq: Fix devm_irq_match comment
  Remove reference to kthread_create_on_cpu
  tree-wide: Assorted spelling fixes
  tree-wide: fix 'lenght' typo in comments and code
  drm/kms: fix spelling in error message
  doc: capitalization and other minor fixes in pnp doc
  devres: typo fix s/dev/devm/
  Remove redundant trailing semicolons from macros
  fix typo "definetly" -> "definitely" in comment
  tree-wide: s/widht/width/g typo in comments
  ...

Fix trivial conflict in Documentation/laptops/00-INDEX
2010-03-12 16:04:50 -08:00
Randy Dunlap
1e0051ae48 Documentation/fs/: split txt and source files
Make dnotify_test.c source file and add it to Makefile so that
bitrot can be prevented.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12 15:52:35 -08:00
Jiri Kosina
318ae2edc3 Merge branch 'for-next' into for-linus
Conflicts:
	Documentation/filesystems/proc.txt
	arch/arm/mach-u300/include/mach/debug-macro.S
	drivers/net/qlge/qlge_ethtool.c
	drivers/net/qlge/qlge_main.c
	drivers/net/typhoon.c
2010-03-08 16:55:37 +01:00
Linus Torvalds
66b89159c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs
* git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
  [LogFS] Change magic number
  [LogFS] Remove h_version field
  [LogFS] Check feature flags
  [LogFS] Only write journal if dirty
  [LogFS] Fix bdev erases
  [LogFS] Silence gcc
  [LogFS] Prevent 64bit divisions in hash_index
  [LogFS] Plug memory leak on error paths
  [LogFS] Add MAINTAINERS entry
  [LogFS] add new flash file system

Fixed up trivial conflict in lib/Kconfig, and a semantic conflict in
fs/logfs/inode.c introduced by write_inode() being changed to use
writeback_control' by commit a9185b41a4
("pass writeback_control to ->write_inode")
2010-03-06 13:18:03 -08:00
Linus Torvalds
05c5cb31ec Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
  nfsd4: fix minor memory leak
  svcrpc: treat uid's as unsigned
  nfsd: ensure sockets are closed on error
  Revert "sunrpc: move the close processing after do recvfrom method"
  Revert "sunrpc: fix peername failed on closed listener"
  sunrpc: remove unnecessary svc_xprt_put
  NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
  xfs_export_operations.commit_metadata
  commit_metadata export operation replacing nfsd_sync_dir
  lockd: don't clear sm_monitored on nsm_reboot_lookup
  lockd: release reference to nsm_handle in nlm_host_rebooted
  nfsd: Use vfs_fsync_range() in nfsd_commit
  NFSD: Create PF_INET6 listener in write_ports
  SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
  SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
  NFSD: Support AF_INET6 in svc_addsock() function
  SUNRPC: Use rpc_pton() in ip_map_parse()
  nfsd: 4.1 has an rfc number
  nfsd41: Create the recovery entry for the NFSv4.1 client
  nfsd: use vfs_fsync for non-directories
  ...
2010-03-06 11:31:38 -08:00
Mel Gorman
a1b57ac061 mm: document /proc/pagetypeinfo
Add documentation for /proc/pagetypeinfo.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:26 -08:00
KAMEZAWA Hiroyuki
b084d4353f mm: count swap usage
A frequent questions from users about memory management is what numbers of
swap ents are user for processes.  And this information will give some
hints to oom-killer.

Besides we can count the number of swapents per a process by scanning
/proc/<pid>/smaps, this is very slow and not good for usual process
information handler which works like 'ps' or 'top'.  (ps or top is now
enough slow..)

This patch adds a counter of swapents to mm_counter and update is at each
swap events.  Information is exported via /proc/<pid>/status file as

[kamezawa@bluextal memory]$ cat /proc/self/status
Name:   cat
State:  R (running)
Tgid:   2910
Pid:    2910
PPid:   2823
TracerPid:      0
Uid:    500     500     500     500
Gid:    500     500     500     500
FDSize: 256
Groups: 500
VmPeak:    82696 kB
VmSize:    82696 kB
VmLck:         0 kB
VmHWM:       432 kB
VmRSS:       432 kB
VmData:      172 kB
VmStk:        84 kB
VmExe:        48 kB
VmLib:      1568 kB
VmPTE:        40 kB
VmSwap:        0 kB <=============== this.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:24 -08:00
KAMEZAWA Hiroyuki
34e55232e5 mm: avoid false sharing of mm_counter
Considering the nature of per mm stats, it's the shared object among
threads and can be a cache-miss point in the page fault path.

This patch adds per-thread cache for mm_counter.  RSS value will be
counted into a struct in task_struct and synchronized with mm's one at
events.

Now, in this patch, the event is the number of calls to handle_mm_fault.
Per-thread value is added to mm at each 64 calls.

 rough estimation with small benchmark on parallel thread (2threads) shows
 [before]
     4.5 cache-miss/faults
 [after]
     4.0 cache-miss/faults
 Anyway, the most contended object is mmap_sem if the number of threads grows.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:24 -08:00
Linus Torvalds
e213e26ab3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
  quota: stop using QUOTA_OK / NO_QUOTA
  dquot: cleanup dquot initialize routine
  dquot: move dquot initialization responsibility into the filesystem
  dquot: cleanup dquot drop routine
  dquot: move dquot drop responsibility into the filesystem
  dquot: cleanup dquot transfer routine
  dquot: move dquot transfer responsibility into the filesystem
  dquot: cleanup inode allocation / freeing routines
  dquot: cleanup space allocation / freeing routines
  ext3: add writepage sanity checks
  ext3: Truncate allocated blocks if direct IO write fails to update i_size
  quota: Properly invalidate caches even for filesystems with blocksize < pagesize
  quota: generalize quota transfer interface
  quota: sb_quota state flags cleanup
  jbd: Delay discarding buffers in journal_unmap_buffer
  ext3: quota_write cross block boundary behaviour
  quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
  quota: split out compat_sys_quotactl support from quota.c
  quota: split out netlink notification support from quota.c
  quota: remove invalid optimization from quota_sync_all
  ...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c
2010-03-05 13:20:53 -08:00
Christoph Hellwig
871a293155 dquot: cleanup dquot initialize routine
Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:30 +01:00
Christoph Hellwig
9f75475802 dquot: cleanup dquot drop routine
Get rid of the drop dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_drop helper to __dquot_drop
and vfs_dq_drop to dquot_drop to have a consistent namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:30 +01:00
Christoph Hellwig
b43fa8284d dquot: cleanup dquot transfer routine
Get rid of the transfer dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_transfer helper to __dquot_transfer
and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
and make the new dquot_transfer return a normal negative errno value
which all callers expect.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:29 +01:00
Christoph Hellwig
63936ddaa1 dquot: cleanup inode allocation / freeing routines
Get rid of the alloc_inode and free_inode dquot operations - they are
always called from the filesystem and if a filesystem really needs
their own (which none currently does) it can just call into it's
own routine directly.

Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
call the lowlevel dquot_alloc_inode / dqout_free_inode routines
directly, which now lose the number argument which is always 1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:28 +01:00
Christoph Hellwig
5dd4056db8 dquot: cleanup space allocation / freeing routines
Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs their own (which none currently does)
it can just call into it's own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not.  Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 00:20:28 +01:00
Linus Torvalds
0f2cc4ecd8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  init: Open /dev/console from rootfs
  mqueue: fix typo "failues" -> "failures"
  mqueue: only set error codes if they are really necessary
  mqueue: simplify do_open() error handling
  mqueue: apply mathematics distributivity on mq_bytes calculation
  mqueue: remove unneeded info->messages initialization
  mqueue: fix mq_open() file descriptor leak on user-space processes
  fix race in d_splice_alias()
  set S_DEAD on unlink() and non-directory rename() victims
  vfs: add NOFOLLOW flag to umount(2)
  get rid of ->mnt_parent in tomoyo/realpath
  hppfs can use existing proc_mnt, no need for do_kern_mount() in there
  Mirror MS_KERNMOUNT in ->mnt_flags
  get rid of useless vfsmount_lock use in put_mnt_ns()
  Take vfsmount_lock to fs/internal.h
  get rid of insanity with namespace roots in tomoyo
  take check for new events in namespace (guts of mounts_poll()) to namespace.c
  Don't mess with generic_permission() under ->d_lock in hpfs
  sanitize const/signedness for udf
  nilfs: sanitize const/signedness in dealing with ->d_name.name
  ...

Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-04 08:15:33 -08:00
Al Viro
2f99cc6e46 add several pieces to shared subtree documentation
* document locking
* add the missing part of data structure invariants (relationship
between mnt_share and mnt_slave lists in case of a peer group
among slaves).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-03 13:00:23 -05:00
Linus Torvalds
feaf77d51a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: add reader's lock for cno in nilfs_ioctl_sync
  nilfs2: delete unnecessary condition in load_segment_summary
  nilfs2: move iterator to write log into segment buffer
  nilfs2: get rid of s_dirt flag use
  nilfs2: get rid of nilfs_segctor_req struct
  nilfs2: delete unnecessary condition in nilfs_dat_translate
  nilfs2: fix potential hang in nilfs_error on errors=remount-ro
  nilfs2: use mnt_want_write in ioctls where write access is needed
  nilfs2: issue discard request after cleaning segments
2010-03-03 08:53:49 -08:00
Mulyadi Santosa
cb2992a60b doc: typo - Table 1-2 should refer to "status", not "statm"
Fixes a typo in proc.txt documentation. Table 1-2 should refer to
"status", not "statm"

Signed-off-by: Mulyadi Santosa <mulyadi.santosa@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-18 00:43:50 +01:00
Jiro SEKIBA
e902ec9906 nilfs2: issue discard request after cleaning segments
This adds a function to send discard requests for given array of
segment numbers, and calls the function when garbage collection
succeeded.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-02-13 12:26:02 +09:00
J. Bruce Fields
73834d6f90 nfsd: 4.1 has an rfc number
No need to refer to an internet draft; there's an RFC now.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-01-20 17:17:04 -05:00
Paul E. McKenney
4c54005ca4 rcu: 1Q2010 update for RCU documentation
Add expedited functions.  Review documentation and update
obsolete verbiage.  Also fix the advice for the RCU CPU-stall
kernel configuration parameter, and document RCU CPU-stall
warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12635142581866-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-16 10:25:22 +01:00
KOSAKI Motohiro
1306d603fc proc: partially revert "procfs: provide stack information for threads"
Commit d899bf7b (procfs: provide stack information for threads) introduced
to show stack information in /proc/{pid}/status.  But it cause large
performance regression.  Unfortunately /proc/{pid}/status is used ps
command too and ps is one of most important component.  Because both to
take mmap_sem and page table walk are heavily operation.

If many process run, the ps performance is,

[before d899bf7b]

% perf stat ps >/dev/null

 Performance counter stats for 'ps':

     4090.435806  task-clock-msecs         #      0.032 CPUs
             229  context-switches         #      0.000 M/sec
               0  CPU-migrations           #      0.000 M/sec
             234  page-faults              #      0.000 M/sec
      8587565207  cycles                   #   2099.425 M/sec
      9866662403  instructions             #      1.149 IPC
      3789415411  cache-references         #    926.409 M/sec
        30419509  cache-misses             #      7.437 M/sec

   128.859521955  seconds time elapsed

[after d899bf7b]

% perf stat  ps  > /dev/null

 Performance counter stats for 'ps':

     4305.081146  task-clock-msecs         #      0.028 CPUs
             480  context-switches         #      0.000 M/sec
               2  CPU-migrations           #      0.000 M/sec
             237  page-faults              #      0.000 M/sec
      9021211334  cycles                   #   2095.480 M/sec
     10605887536  instructions             #      1.176 IPC
      3612650999  cache-references         #    839.160 M/sec
        23917502  cache-misses             #      5.556 M/sec

   152.277819582  seconds time elapsed

Thus, this patch revert it. Fortunately /proc/{pid}/task/{tid}/smaps
provide almost same information. we can use it.

Commit d899bf7b introduced two features:

 1) Add the annotattion of [thread stack: xxxx] mark to
    /proc/{pid}/task/{tid}/maps.
 2) Add StackUsage field to /proc/{pid}/status.

I only revert (2), because I haven't seen (1) cause regression.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-11 09:34:06 -08:00
Linus Torvalds
a87da40875 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: update mailing list address
  nilfs2: Storage class should be before const qualifier
  nilfs2: trivial coding style fix
2010-01-04 12:28:26 -08:00
Ryusuke Konishi
6aff43f817 nilfs2: update mailing list address
This replaces the list address for nilfs discussion to linux-nilfs at
vger.kernel.org from users at nilfs.org.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-01-02 21:47:04 +09:00
Fang Wenqi
6d3b82f2d3 ext4: Update documentation to correct the inode_readahead_blks option name
Per commit 240799cd, the option name for readahead should be
inode_readahead_blks, not inode_readahead.

Signed-off-by: Fang Wenqi <antonf@turbolinux.com.cn>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-24 17:51:42 -05:00
Phil Carmody
099c2f21d8 Driver core: driver_attribute parameters can often be const*
Many struct driver_attribute descriptors are purely read-only
structures, and there's no need to change them. Therefore make
the promise not to, which will let those descriptors be put in
a ro section.

Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23 11:23:43 -08:00
Phil Carmody
26579ab70a Driver core: device_attribute parameters can often be const*
Most device_attributes are const, and are begging to be
put in a ro section. However, the create and remove
file interfaces were failing to propagate the const promise
which the only functions they call offer.

Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23 11:23:43 -08:00
Linus Torvalds
37c24b37fb Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux: (42 commits)
  nfsd: remove pointless paths in file headers
  nfsd: move most of nfsfh.h to fs/nfsd
  nfsd: remove unused field rq_reffh
  nfsd: enable V4ROOT exports
  nfsd: make V4ROOT exports read-only
  nfsd: restrict filehandles accepted in V4ROOT case
  nfsd: allow exports of symlinks
  nfsd: filter readdir results in V4ROOT case
  nfsd: filter lookup results in V4ROOT case
  nfsd4: don't continue "under" mounts in V4ROOT case
  nfsd: introduce export flag for v4 pseudoroot
  nfsd: let "insecure" flag vary by pseudoflavor
  nfsd: new interface to advertise export features
  nfsd: Move private headers to source directory
  vfs: nfsctl.c un-used nfsd #includes
  lockd: Remove un-used nfsd headers #includes
  s390: remove un-used nfsd #includes
  sparc: remove un-used nfsd #includes
  parsic: remove un-used nfsd #includes
  compat.c: Remove dependence on nfsd private headers
  ...
2009-12-16 10:43:34 -08:00
Alexey Dobriyan
6be4b78993 seq_file: use proc_create() in documentation
Using create_proc_entry() + ->proc_fops assignment is racy because
->proc_fops will be NULL for some time, use proc_create() to avoid race.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:07 -08:00
john stultz
4614a696bd procfs: allow threads to rename siblings via /proc/pid/tasks/tid/comm
Setting a thread's comm to be something unique is a very useful ability
and is helpful for debugging complicated threaded applications.  However
currently the only way to set a thread name is for the thread to name
itself via the PR_SET_NAME prctl.

However, there may be situations where it would be advantageous for a
thread dispatcher to be naming the threads its managing, rather then
having the threads self-describe themselves.  This sort of behavior is
available on other systems via the pthread_setname_np() interface.

This patch exports a task's comm via proc/pid/comm and
proc/pid/task/tid/comm interfaces, and allows thread siblings to write to
these values.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Mike Fulton <fultonm@ca.ibm.com>
Cc: Sean Foley <Sean_Foley@ca.ibm.com>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:24 -08:00
Linus Torvalds
3126c136bc Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (21 commits)
  ext3: PTR_ERR return of wrong pointer in setup_new_group_blocks()
  ext3: Fix data / filesystem corruption when write fails to copy data
  ext4: Support for 64-bit quota format
  ext3: Support for vfsv1 quota format
  quota: Implement quota format with 64-bit space and inode limits
  quota: Move definition of QFMT_OCFS2 to linux/quota.h
  ext2: fix comment in ext2_find_entry about return values
  ext3: Unify log messages in ext3
  ext2: clear uptodate flag on super block I/O error
  ext2: Unify log messages in ext2
  ext3: make "norecovery" an alias for "noload"
  ext3: Don't update the superblock in ext3_statfs()
  ext3: journal all modifications in ext3_xattr_set_handle
  ext2: Explicitly assign values to on-disk enum of filetypes
  quota: Fix WARN_ON in lookup_one_len
  const: struct quota_format_ops
  ubifs: remove manual O_SYNC handling
  afs: remove manual O_SYNC handling
  kill wait_on_page_writeback_range
  vfs: Implement proper O_SYNC semantics
  ...
2009-12-11 15:31:13 -08:00
Linus Torvalds
4e2ccdb040 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (49 commits)
  nilfs2: separate wait function from nilfs_segctor_write
  nilfs2: add iterator for segment buffers
  nilfs2: hide nilfs_write_info struct in segment buffer code
  nilfs2: relocate io status variables to segment buffer
  nilfs2: do not return io error for bio allocation failure
  nilfs2: use list_splice_tail or list_splice_tail_init
  nilfs2: replace mark_inode_dirty as nilfs_mark_inode_dirty
  nilfs2: delete mark_inode_dirty in nilfs_delete_entry
  nilfs2: delete mark_inode_dirty in nilfs_commit_chunk
  nilfs2: change return type of nilfs_commit_chunk
  nilfs2: split nilfs_unlink as nilfs_do_unlink and nilfs_unlink
  nilfs2: delete redundant mark_inode_dirty
  nilfs2: expand inode_inc_link_count and inode_dec_link_count
  nilfs2: delete mark_inode_dirty from nilfs_set_link
  nilfs2: delete mark_inode_dirty in nilfs_new_inode
  nilfs2: add norecovery mount option
  nilfs2: add helper to get if volume is in a valid state
  nilfs2: move recovery completion into load_nilfs function
  nilfs2: apply readahead for recovery on mount
  nilfs2: clean up get/put function of a segment usage
  ...
2009-12-11 11:49:18 -08:00
Linus Torvalds
4515c3069d Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits)
  ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)
  ext4: Do not override ext2 or ext3 if built they are built as modules
  jbd2: Export jbd2_log_start_commit to fix ext4 build
  ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT
  ext4: Wait for proper transaction commit on fsync
  ext4: fix incorrect block reservation on quota transfer.
  ext4: quota macros cleanup
  ext4: ext4_get_reserved_space() must return bytes instead of blocks
  ext4: remove blocks from inode prealloc list on failure
  ext4: wait for log to commit when umounting
  ext4: Avoid data / filesystem corruption when write fails to copy data
  ext4: Use ext4 file system driver for ext2/ext3 file system mounts
  ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks()
  jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer()
  ext4: remove unused parameter wbc from __ext4_journalled_writepage()
  ext4: remove encountered_congestion trace
  ext4: move_extent_per_page() cleanup
  ext4: initialize moved_len before calling ext4_move_extents()
  ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT
  ext4: use ext4_data_block_valid() in ext4_free_blocks()
  ...
2009-12-10 09:33:29 -08:00
Eric Sandeen
dee1d3b627 ext3: make "norecovery" an alias for "noload"
Users on the list recently complained about differences across
filesystems w.r.t. how to mount without a journal replay.

In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext3.

Also show this status in /proc/mounts

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:52 +01:00
Christoph Hellwig
94004ed726 kill wait_on_page_writeback_range
All callers really want the more logical filemap_fdatawait_range interface,
so convert them to use it and merge wait_on_page_writeback_range into
filemap_fdatawait_range.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-10 15:02:50 +01:00
Thadeu Lima de Souza Cascardo
9f249162fb trivial: some small fixes in exofs documentation
Add exofs.txt to filesystems Documentation index and fix some typos,
identation and grammar.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10 09:59:16 +02:00
Luis Garces-Erice
e3cc2226e9 Doc: better explanation of procs_running
the description in Documentation/filesystems/proc.txt of the
procs_running entry in /proc/stat is confusing (according to that
description, it looks as if procs_running could only be a number
between 0 and the number of CPUs).

Changed it to a more accurate description in the patch attached.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-09 18:59:52 -08:00
Linus Torvalds
897e81bea1 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
  sched, cputime: Introduce thread_group_times()
  sched, cputime: Cleanups related to task_times()
  Revert "sched, x86: Optimize branch hint in __switch_to()"
  sched: Fix isolcpus boot option
  sched: Revert 498657a478
  sched, time: Define nsecs_to_jiffies()
  sched: Remove task_{u,s,g}time()
  sched: Introduce task_times() to replace task_{u,s}time() pair
  sched: Limit the number of scheduler debug messages
  sched.c: Call debug_show_all_locks() when dumping all tasks
  sched, x86: Optimize branch hint in __switch_to()
  sched: Optimize branch hint in context_switch()
  sched: Optimize branch hint in pick_next_task_fair()
  sched_feat_write(): Update ppos instead of file->f_pos
  sched: Sched_rt_periodic_timer vs cpu hotplug
  sched, kvm: Fix race condition involving sched_in_preempt_notifers
  sched: More generic WAKE_AFFINE vs select_idle_sibling()
  sched: Cleanup select_task_rq_fair()
  sched: Fix granularity of task_u/stime()
  sched: Fix/add missing update_rq_clock() calls
  ...
2009-12-05 15:30:49 -08:00
Linus Torvalds
6e80133f7f Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (31 commits)
  FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
  SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE
  SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
  CacheFiles: Don't log lookup/create failing with ENOBUFS
  CacheFiles: Catch an overly long wait for an old active object
  CacheFiles: Better showing of debugging information in active object problems
  CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happy
  CacheFiles: Handle truncate unlocking the page we're reading
  CacheFiles: Don't write a full page if there's only a partial page to cache
  FS-Cache: Actually requeue an object when requested
  FS-Cache: Start processing an object's operations on that object's death
  FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failure
  FS-Cache: Add a retirement stat counter
  FS-Cache: Handle pages pending storage that get evicted under OOM conditions
  FS-Cache: Handle read request vs lookup, creation or other cache failure
  FS-Cache: Don't delete pending pages from the page-store tracking tree
  FS-Cache: Fix lock misorder in fscache_write_op()
  FS-Cache: The object-available state can't rely on the cookie to be available
  FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase
  FS-Cache: Use radix tree preload correctly in tracking of pages to be stored
  ...
2009-11-30 13:33:48 -08:00
Ingo Molnar
16bc67edeb Merge branch 'sched/urgent' into sched/core
Merge reason: Pick up fixes that did not make it into .32.0

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 10:50:42 +01:00
J. Bruce Fields
9b8b317d58 Merge commit 'v2.6.32-rc8' into HEAD 2009-11-23 12:34:58 -05:00
Joern Engel
5db53f3e80 [LogFS] add new flash file system
This is a new flash file system. See
Documentation/filesystems/logfs.txt

Signed-off-by: Joern Engel <joern@logfs.org>
2009-11-20 20:13:39 +01:00
Linus Torvalds
931ed94430 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Trivial cleanup of jbd compatibility layer removal
  ocfs2: Refresh documentation
  ocfs2: return f_fsid info in ocfs2_statfs()
  ocfs2: duplicate inline data properly during reflink.
  ocfs2: Move ocfs2_complete_reflink to the right place.
  ocfs2: Return -EINVAL when a device is not ocfs2.
2009-11-19 20:29:05 -08:00
Ryusuke Konishi
0234576d04 nilfs2: add norecovery mount option
This adds "norecovery" mount option which disables temporal write
access to read-only mounts or snapshots during mount/recovery.
Without this option, write access will be even performed for those
types of mounts; the temporal write access is needed to mount root
file system read-only after an unclean shutdown.

This option will be helpful when user wants to prevent any write
access to the device.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Eric Sandeen <sandeen@redhat.com>
2009-11-20 10:05:52 +09:00
Jiro SEKIBA
91f1953bf3 nilfs2: Using nobarrier option instead of barrier=off
Since most of fs using nofoobar style option,
modified barrier=off option as nobarrier.

Signed-off-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:47 +09:00
Eric Sandeen
e3bb52ae2b ext4: make "norecovery" an alias for "noload"
Users on the linux-ext4 list recently complained about differences
across filesystems w.r.t. how to mount without a journal replay.

In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext4.

Also show this status in /proc/mounts

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-19 14:28:50 -05:00
Eric Sandeen
5328e63531 ext4: make trim/discard optional (and off by default)
It is anticipated that when sb_issue_discard starts doing
real work on trim-capable devices, we may see issues.  Make
this mount-time optional, and default it to off until we know
that things are working out OK.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-19 14:25:42 -05:00
David Howells
fee096deb4 CacheFiles: Catch an overly long wait for an old active object
Catch an overly long wait for an old, dying active object when we want to
replace it with a new one.  The probability is that all the slow-work threads
are hogged, and the delete can't get a look in.

What we do instead is:

 (1) if there's nothing in the slow work queue, we sleep until either the dying
     object has finished dying or there is something in the slow work queue
     behind which we can queue our object.

 (2) if there is something in the slow work queue, we return ETIMEDOUT to
     fscache_lookup_object(), which then puts us back on the slow work queue,
     presumably behind the deletion that we're blocked by.  We are then
     deferred for a while until we work our way back through the queue -
     without blocking a slow-work thread unnecessarily.

A backtrace similar to the following may appear in the log without this patch:

	INFO: task kslowd004:5711 blocked for more than 120 seconds.
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	kslowd004     D 0000000000000000     0  5711      2 0x00000080
	 ffff88000340bb80 0000000000000046 ffff88002550d000 0000000000000000
	 ffff88002550d000 0000000000000007 ffff88000340bfd8 ffff88002550d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff88002550d2a8
	Call Trace:
	 [<ffffffff81058e21>] ? trace_hardirqs_on+0xd/0xf
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffffa011c4e1>] cachefiles_wait_bit+0x9/0xd [cachefiles]
	 [<ffffffff81353153>] __wait_on_bit+0x43/0x76
	 [<ffffffff8111ae39>] ? ext3_xattr_get+0x1ec/0x270
	 [<ffffffff813531ef>] out_of_line_wait_on_bit+0x69/0x74
	 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles]
	 [<ffffffff8104c125>] ? wake_bit_function+0x0/0x2e
	 [<ffffffffa011bc79>] cachefiles_mark_object_active+0x203/0x23b [cachefiles]
	 [<ffffffffa011c209>] cachefiles_walk_to_object+0x558/0x827 [cachefiles]
	 [<ffffffffa011a429>] cachefiles_lookup_object+0xac/0x12a [cachefiles]
	 [<ffffffffa00aa1e9>] fscache_lookup_object+0x1c7/0x214 [fscache]
	 [<ffffffffa00aafc5>] fscache_object_state_machine+0xa5/0x52d [fscache]
	 [<ffffffffa00ab4ac>] fscache_object_slow_work_execute+0x5f/0xa0 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20
	1 lock held by kslowd004/5711:
	 #0:  (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffffa011be64>] cachefiles_walk_to_object+0x1b3/0x827 [cachefiles]

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:12:05 +00:00
David Howells
60d543ca72 FS-Cache: Start processing an object's operations on that object's death
Start processing an object's operations when that object moves into the DYING
state as the object cannot be destroyed until all its outstanding operations
have completed.

Furthermore, make sure that read and allocation operations handle being woken
up on a dead object.  Such events are recorded in the Allocs.abt and
Retrvls.abt statistics as viewable through /proc/fs/fscache/stats.

The code for waiting for object activation for the read and allocation
operations is also extracted into its own function as it is much the same in
all cases, differing only in the stats incremented.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:45 +00:00
David Howells
201a15428b FS-Cache: Handle pages pending storage that get evicted under OOM conditions
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs]
	 [<ffffffff810885d3>] try_to_release_page+0x32/0x3b
	 [<ffffffff81093203>] shrink_page_list+0x316/0x4ac
	 [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c
	 [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb
	 [<ffffffff81093aa2>] shrink_list+0x8d/0x8f
	 [<ffffffff81093d1c>] shrink_zone+0x278/0x33c
	 [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba
	 [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392
	 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212
	 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf
	 [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa
	 [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb
	 [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c
	 [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29
	 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385
	 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae
	 [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae
	 [<ffffffff810b2e82>] do_sync_write+0xe3/0x120
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8
	 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89
	 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache]
	 [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache]
	 [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1
	 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308
	 [<ffffffff8104be91>] kthread+0x7a/0x82
	 [<ffffffff8100beda>] child_rip+0xa/0x20
	 [<ffffffff8100b87c>] ? restore_args+0x0/0x30
	 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227
	 [<ffffffff8104be17>] ? kthread+0x0/0x82
	 [<ffffffff8100bed0>] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:35 +00:00
David Howells
e3d4d28b1c FS-Cache: Handle read request vs lookup, creation or other cache failure
FS-Cache doesn't correctly handle the netfs requesting a read from the cache
on an object that failed or was withdrawn by the cache.  A trace similar to
the following might be seen:

	CacheFiles: Lookup failed error -105
	[exe   ] unexpected submission OP165afe [OBJ6cac OBJECT_LC_DYING]
	[exe   ] objstate=OBJECT_LC_DYING [OBJECT_LC_DYING]
	[exe   ] objflags=0
	[exe   ] objevent=9 [fffffffffffffffb]
	[exe   ] ops=0 inp=0 exc=0
	Pid: 6970, comm: exe Not tainted 2.6.32-rc6-cachefs #50
	Call Trace:
	 [<ffffffffa0076477>] fscache_submit_op+0x3ff/0x45a [fscache]
	 [<ffffffffa0077997>] __fscache_read_or_alloc_pages+0x187/0x3c4 [fscache]
	 [<ffffffffa00b6480>] ? nfs_readpage_from_fscache_complete+0x0/0x66 [nfs]
	 [<ffffffffa00b6388>] __nfs_readpages_from_fscache+0x7e/0x176 [nfs]
	 [<ffffffff8108e483>] ? __alloc_pages_nodemask+0x11c/0x5cf
	 [<ffffffffa009d796>] nfs_readpages+0x114/0x1d7 [nfs]
	 [<ffffffff81090314>] __do_page_cache_readahead+0x15f/0x1ec
	 [<ffffffff81090228>] ? __do_page_cache_readahead+0x73/0x1ec
	 [<ffffffff810903bd>] ra_submit+0x1c/0x20
	 [<ffffffff810906bb>] ondemand_readahead+0x227/0x23a
	 [<ffffffff81090762>] page_cache_sync_readahead+0x17/0x19
	 [<ffffffff8108a99e>] generic_file_aio_read+0x236/0x5a0
	 [<ffffffffa00937bd>] nfs_file_read+0xe4/0xf3 [nfs]
	 [<ffffffff810b2fa2>] do_sync_read+0xe3/0x120
	 [<ffffffff81354cc3>] ? _spin_unlock_irq+0x2b/0x31
	 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34
	 [<ffffffff811848e5>] ? selinux_file_permission+0x5d/0x10f
	 [<ffffffff81352bdb>] ? thread_return+0x3e/0x101
	 [<ffffffff8117d7b0>] ? security_file_permission+0x11/0x13
	 [<ffffffff810b3b06>] vfs_read+0xaa/0x16f
	 [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130
	 [<ffffffff810b3c84>] sys_read+0x45/0x6c
	 [<ffffffff8100ae2b>] system_call_fastpath+0x16/0x1b

The object state might also be OBJECT_DYING or OBJECT_WITHDRAWING.

This should be handled by simply rejecting the new operation with ENOBUFS.
There's no need to log an error for it.  Events of this type now appear in the
stats file under Ops:rej.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:32 +00:00
David Howells
1bccf513ac FS-Cache: Fix lock misorder in fscache_write_op()
FS-Cache has two structs internally for keeping track of the internal state of
a cached file: the fscache_cookie struct, which represents the netfs's state,
and fscache_object struct, which represents the cache's state.  Each has a
pointer that points to the other (when both are in existence), and each has a
spinlock for pointer maintenance.

Since netfs operations approach these structures from the cookie side, they get
the cookie lock first, then the object lock.  Cache operations, on the other
hand, approach from the object side, and get the object lock first.  It is not
then permitted for a cache operation to get the cookie lock whilst it is
holding the object lock lest deadlock occur; instead, it must do one of two
things:

 (1) increment the cookie usage counter, drop the object lock and then get both
     locks in order, or

 (2) simply hold the object lock as certain parts of the cookie may not be
     altered whilst the object lock is held.

It is also not permitted to follow either pointer without holding the lock at
the end you start with.  To break the pointers between the cookie and the
object, both locks must be held.

fscache_write_op(), however, violates the locking rules: It attempts to get the
cookie lock without (a) checking that the cookie pointer is a valid pointer,
and (b) holding the object lock to protect the cookie pointer whilst it follows
it.  This is so that it can access the pending page store tree without
interference from __fscache_write_page().

This is fixed by splitting the cookie lock, such that the page store tracking
tree is protected by its own lock, and checking that the cookie pointer is
non-NULL before we attempt to follow it whilst holding the object lock.

The new lock is subordinate to both the cookie lock and the object lock, and so
should be taken after those.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:25 +00:00
David Howells
5753c44188 FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase
Permit the operations to retrieve data from the cache or to allocate space in
the cache for future writes to be interrupted whilst they're waiting for
permission for the operation to proceed.  Typically this wait occurs whilst the
cache object is being looked up on disk in the background.

If an interruption occurs, and the operation has not yet been given the
go-ahead to run, the operation is dequeued and cancelled, and control returns
to the read operation of the netfs routine with none of the requested pages
having been read or in any way marked as known by the cache.

This means that the initial wait is done interruptibly rather than
uninterruptibly.

In addition, extra stats values are made available to show the number of ops
cancelled and the number of cache space allocations interrupted.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:19 +00:00
David Howells
52bd75fdb1 FS-Cache: Add counters for entry/exit to/from cache operation functions
Count entries to and exits from cache operation table functions.  Maintain
these as a single counter that's added to or removed from as appropriate.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:08 +00:00
David Howells
4fbf4291aa FS-Cache: Allow the current state of all objects to be dumped
Allow the current state of all fscache objects to be dumped by doing:

	cat /proc/fs/fscache/objects

By default, all objects and all fields will be shown.  This can be restricted
by adding a suitable key to one of the caller's keyrings (such as the session
keyring):

	keyctl add user fscache:objlist "<restrictions>" @s

The <restrictions> are:

	K	Show hexdump of object key (don't show if not given)
	A	Show hexdump of object aux data (don't show if not given)

And paired restrictions:

	C	Show objects that have a cookie
	c	Show objects that don't have a cookie
	B	Show objects that are busy
	b	Show objects that aren't busy
	W	Show objects that have pending writes
	w	Show objects that don't have pending writes
	R	Show objects that have outstanding reads
	r	Show objects that don't have outstanding reads
	S	Show objects that have slow work queued
	s	Show objects that don't have slow work queued

If neither side of a restriction pair is given, then both are implied.  For
example:

	keyctl add user fscache:objlist KB @s

shows objects that are busy, and lists their object keys, but does not dump
their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
not implied.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 18:11:04 +00:00
Sunil Mushran
7ab8f5244d ocfs2: Refresh documentation
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-11-13 15:45:01 -08:00
J. Bruce Fields
ea4878a24d nfs: move more to Documentation/filesystems/nfs
Oops: I missed two files in the first commit that created this
directory.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-11-06 14:01:02 -05:00
Linus Torvalds
d4da6c9ccf Revert "ext4: Remove journal_checksum mount option and enable it by default"
This reverts commit d0646f7b63, as
requested by Eric Sandeen.

It can basically cause an ext4 filesystem to miss recovery (and thus get
mounted with errors) if the journal checksum does not match.

Quoth Eric:

   "My hand-wavy hunch about what is happening is that we're finding a
    bad checksum on the last partially-written transaction, which is
    not surprising, but if we have a wrapped log and we're doing the
    initial scan for head/tail, and we abort scanning on that bad
    checksum, then we are essentially running an unrecovered filesystem.

    But that's hand-wavy and I need to go look at the code.

    We lived without journal checksums on by default until now, and at
    this point they're doing more harm than good, so we should revert
    the default-changing commit until we can fix it and do some good
    power-fail testing with the fixes in place."

See

	http://bugzilla.kernel.org/show_bug.cgi?id=14354

for all the gory details.

Requested-by: Eric Sandeen <sandeen@redhat.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Alexey Fisher <bug-track@fisher-privat.net>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mathias Burén <mathias.buren@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-02 10:15:27 -08:00
J. Bruce Fields
dc7a08166f nfs: new subdir Documentation/filesystems/nfs
We're adding enough nfs documentation that it may as well have its own
subdirectory.

Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-10-27 19:34:04 -04:00
J. Bruce Fields
e343eb0d60 Merge commit 'v2.6.32-rc5' into for-2.6.33 2009-10-27 18:45:17 -04:00
Ryota Ozaki
ce0e7b28fb sched, cpuacct: Fix niced guest time accounting
CPU time of a guest is always accounted in 'user' time
without concern for the nice value of its counterpart
process although the guest is scheduled under the nice
value.

This patch fixes the defect and accounts cpu time of
a niced guest in 'nice' time as same as a niced process.

And also the patch adds 'guest_nice' to cpuacct. The
value provides niced guest cpu time which is like 'nice'
to 'user'.

The original discussions can be found here:

  http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html
  http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html

Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1256314810-7897-1-git-send-email-ozaki.ryota@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-25 17:31:30 +01:00
Jan Kara
6dbce52182 ext3: Update documentation about ext3 quota mount options
Signed-off-by: Jan Kara <jack@suse.cz>
2009-10-13 00:06:43 +02:00
Sage Weil
7ad920b504 ceph: documentation
Mount options, syntax.

Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-06 11:31:05 -07:00
Linus Torvalds
9f44fdc518 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Fix time encoding with extra epoch bits
  ext4: Add a stub for mpage_da_data in the trace header
  jbd2: Use tracepoints for history file
  ext4: Use tracepoints for mb_history trace file
  ext4, jbd2: Drop unneeded printks at mount and unmount time
  ext4: Handle nested ext4_journal_start/stop calls without a journal
  ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
  ext4: Avoid updating the inode table bh twice in no journal mode
  ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
  ext4: async direct IO for holes and fallocate support
  ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
  ext4: Split uninitialized extents for direct I/O
  ext4: release reserved quota when block reservation for delalloc retry
  ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
  ext4: Fix hueristic which avoids group preallocation for closed files
  ext4: Use ext4_msg() for ext4_da_writepage() errors
  ext4: Update documentation about quota mount options
2009-09-30 09:32:30 -07:00
Linus Torvalds
4c8f1cb266 Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
  fat: Check s_dirt in fat_sync_fs()
  vfat: change the default from shortname=lower to shortname=mixed
  fat/nls: Fix handling of utf8 invalid char
2009-09-30 09:31:14 -07:00
Theodore Ts'o
296c355cd6 ext4: Use tracepoints for mb_history trace file
The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.  

By ripping out the mb_history code and replacing it with ftrace
tracepoints, and we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30 00:32:42 -04:00
Andy Adamson
ddc04fd4d5 nfsd41: use sv_max_mesg for forechannel max sizes
ca_maxresponsesize and ca_maxrequest size include the RPC header.

sv_max_mesg is sv_max_payolad plus a page for overhead and is used in
svc_init_buffer to allocate server buffer space for both the request and reply.
Note that this means we can service an RPC compound that requires
ca_maxrequestsize (MAXWRITE) or ca_max_responsesize (MAXREAD) but that we do
not support an RPC compound that requires both ca_maxrequestsize and
ca_maxresponsesize.

Signed-off-by: Andy Adamson <andros@netapp.com>
[bfields@citi.umich.edu: more documentation updates]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-28 12:40:15 -04:00
J. Bruce Fields
03d6a74b5f nfsd: fix Documentation typo
Caught by Benny, thanks!

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-28 12:07:52 -04:00
Jan Kara
8365388827 ext4: Update documentation about quota mount options
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-29 15:59:34 -04:00
Linus Torvalds
db16826367 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
  HWPOISON: Enable error_remove_page on btrfs
  HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
  HWPOISON: Add madvise() based injector for hardware poisoned pages v4
  HWPOISON: Enable error_remove_page for NFS
  HWPOISON: Enable .remove_error_page for migration aware file systems
  HWPOISON: The high level memory error handler in the VM v7
  HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
  HWPOISON: shmem: call set_page_dirty() with locked page
  HWPOISON: Define a new error_remove_page address space op for async truncation
  HWPOISON: Add invalidate_inode_page
  HWPOISON: Refactor truncate to allow direct truncating of page v2
  HWPOISON: check and isolate corrupted free pages v2
  HWPOISON: Handle hardware poisoned pages in try_to_unmap
  HWPOISON: Use bitmask/action code for try_to_unmap behaviour
  HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
  HWPOISON: Add poison check to page fault handling
  HWPOISON: Add basic support for poisoned pages in fault handler v3
  HWPOISON: Add new SIGBUS error codes for hardware poison signals
  HWPOISON: Add support for poison swap entries v2
  HWPOISON: Export some rmap vma locking to outside world
  ...
2009-09-24 07:53:22 -07:00
Peng Tao
16c01b20ae doc/filesystems: more mount cleanups
Documentation/filesystems/sharedsubtree.txt needs updating because the
mount command in util-linux package is well aware of shared subtree
features now.  The patch also fixes two typos in sharedsubtree.txt.

Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:20:57 -07:00
Randy Dunlap
0288b95b43 doc/filesystems: remove smount program
mount(8) handles shared subtrees just fine, so remove the smount program
from Documentation/filesystems/sharedsubtree.txt.

Fix annoying "Lets" -> "Let's".
Insert space between '#' prompt and "mount" command.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:20:57 -07:00
Abhishek Kulkarni
f4edeeb393 9p: Update documentation to add fscache related bits
Update the documentation to describe FS-Cache related
caching parameters. This patch also updates the pointers
to 9p-related papers and adds pointer to the Wiki.

Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-09-23 13:03:46 -05:00
Bartlomiej Zolnierkiewicz
1b83df308f ncpfs: remove dead URL from documentation
Noticed-by: Joe Perches <joe@perches.com>
Cc: Petr Vandrovec <vandrove@vc.cvut.cz>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:42 -07:00
Stefani Seibold
d899bf7b55 procfs: provide stack information for threads
A patch to give a better overview of the userland application stack usage,
especially for embedded linux.

Currently you are only able to dump the main process/thread stack usage
which is showed in /proc/pid/status by the "VmStk" Value.  But you get no
information about the consumed stack memory of the the threads.

There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which marks
the vm mapping where the thread stack pointer reside with "[thread stack
xxxxxxxx]".  xxxxxxxx is the maximum size of stack.  This is a value
information, because libpthread doesn't set the start of the stack to the
top of the mapped area, depending of the pthread usage.

A sample output of /proc/<pid>/task/<tid>/maps looks like:

08048000-08049000 r-xp 00000000 03:00 8312       /opt/z
08049000-0804a000 rw-p 00001000 03:00 8312       /opt/z
0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
a7d12000-a7d13000 ---p 00000000 00:00 0
a7d13000-a7f13000 rw-p 00000000 00:00 0          [thread stack: 001ff4b4]
a7f13000-a7f14000 ---p 00000000 00:00 0
a7f14000-a7f36000 rw-p 00000000 00:00 0
a7f36000-a8069000 r-xp 00000000 03:00 4222       /lib/libc.so.6
a8069000-a806b000 r--p 00133000 03:00 4222       /lib/libc.so.6
a806b000-a806c000 rw-p 00135000 03:00 4222       /lib/libc.so.6
a806c000-a806f000 rw-p 00000000 00:00 0
a806f000-a8083000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
a8083000-a8084000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
a8084000-a8085000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
a8085000-a8088000 rw-p 00000000 00:00 0
a8088000-a80a4000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
a80a4000-a80a5000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
a80a5000-a80a6000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
afaf5000-afb0a000 rw-p 00000000 00:00 0          [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]

Also there is a new entry "stack usage" in /proc/<pid>/{task/*,}/status
which will you give the current stack usage in kb.

A sample output of /proc/self/status looks like:

Name:	cat
State:	R (running)
Tgid:	507
Pid:	507
.
.
.
CapBnd:	fffffffffffffeff
voluntary_ctxt_switches:	0
nonvoluntary_ctxt_switches:	0
Stack usage:	12 kB

I also fixed stack base address in /proc/<pid>/{task/*,}/stat to the base
address of the associated thread stack and not the one of the main
process.  This makes more sense.

[akpm@linux-foundation.org: fs/proc/array.c now needs walk_page_range()]
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:41 -07:00
Linus Torvalds
a87e84b5cd Merge branch 'for-2.6.32' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.32' of git://linux-nfs.org/~bfields/linux: (68 commits)
  nfsd4: nfsv4 clients should cross mountpoints
  nfsd: revise 4.1 status documentation
  sunrpc/cache: avoid variable over-loading in cache_defer_req
  sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req
  nfsd: return success for non-NFS4 nfs4_state_start
  nfsd41: Refactor create_client()
  nfsd41: modify nfsd4.1 backchannel to use new xprt class
  nfsd41: Backchannel: Implement cb_recall over NFSv4.1
  nfsd41: Backchannel: cb_sequence callback
  nfsd41: Backchannel: Setup sequence information
  nfsd41: Backchannel: Server backchannel RPC wait queue
  nfsd41: Backchannel: Add sequence arguments to callback RPC arguments
  nfsd41: Backchannel: callback infrastructure
  nfsd4: use common rpc_cred for all callbacks
  nfsd4: allow nfs4 state startup to fail
  SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous
  nfsd4: fix null dereference creating nfsv4 callback client
  nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition
  nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel
  sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked.
  ...
2009-09-22 07:54:33 -07:00
Linus Torvalds
342ff1a1b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  trivial: fix typo in aic7xxx comment
  trivial: fix comment typo in drivers/ata/pata_hpt37x.c
  trivial: typo in kernel-parameters.txt
  trivial: fix typo in tracing documentation
  trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
  trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
  trivial: remove unnecessary semicolons
  trivial: Fix duplicated word "options" in comment
  trivial: kbuild: remove extraneous blank line after declaration of usage()
  trivial: improve help text for mm debug config options
  trivial: doc: hpfall: accept disk device to unload as argument
  trivial: doc: hpfall: reduce risk that hpfall can do harm
  trivial: SubmittingPatches: Fix reference to renumbered step
  trivial: fix typos "man[ae]g?ment" -> "management"
  trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
  trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
  trivial: fix missing printk space in amd_k7_smp_check
  trivial: fix typo s/ketymap/keymap/ in comment
  trivial: fix typo "to to" in multiple files
  trivial: fix typos in comments s/DGBU/DBGU/
  ...
2009-09-22 07:51:45 -07:00
KOSAKI Motohiro
495789a51a oom: make oom_score to per-process value
oom-killer kills a process, not task.  Then oom_score should be calculated
as per-process too.  it makes consistency more and makes speed up
select_bad_process().

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:39 -07:00
Moussa A. Ba
398499d5f3 pagemap clear_refs: modify to specify anon or mapped vma clearing
The patch makes the clear_refs more versatile in adding the option to
select anonymous pages or file backed pages for clearing.  This addition
has a measurable impact on user space application performance as it
decreases the number of pagewalks in scenarios where one is only
interested in a specific type of page (anonymous or file mapped).

The patch adds anonymous and file backed filters to the clear_refs interface.

echo 1 > /proc/PID/clear_refs resets the bits on all pages
echo 2 > /proc/PID/clear_refs resets the bits on anonymous pages only
echo 3 > /proc/PID/clear_refs resets the bits on file backed pages only

Any other value is ignored

Signed-off-by: Moussa A. Ba <moussa.a.ba@gmail.com>
Signed-off-by: Jared E. Hulbert <jaredeh@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:33 -07:00
Eric Dumazet
c574358e8b proc: document `guest' column in /proc/stat
We added a new column in cpuX lines of /proc/stat, to show the amount of
time spent by a cpu servicing a guest, without updating
Documentation/filesystems/proc.txt

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:24 -07:00
J. Bruce Fields
285a0f00c2 nfsd: revise 4.1 status documentation
Some small updates, a caveat about the minorversion control interface,
and an attempt to put missing features in context.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-21 11:13:45 -04:00
Anand Gadiyar
411c940385 trivial: fix typo "for for" in multiple files
trivial: fix typo "for for" in multiple files

Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-21 15:14:54 +02:00
Jan Kara
1358870dea ext4: Update documentation about quota mount options
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-18 12:22:29 -04:00
Andi Kleen
2571873621 HWPOISON: Define a new error_remove_page address space op for async truncation
Truncating metadata pages is not safe right now before
we haven't audited all file systems.

To enable truncation only for data address space define
a new address_space callback error_remove_page.

This is used for memory_failure.c memory error handling.

This can be then set to truncate_inode_page()

This patch just defines the new operation and adds documentation.

Callers and users come in followon patches.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-16 11:50:13 +02:00
Linus Torvalds
a9bbd210a4 Merge branch 'docs-next' of git://git.lwn.net/linux-2.6
* 'docs-next' of git://git.lwn.net/linux-2.6:
  Document the flex_array library.
  Doc: seq_file.txt fix wrong dd command example.
2009-09-14 17:52:32 -07:00
Linus Torvalds
33f1de6931 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: Whitespace fixes
  GFS2: Remove unused sysfs file
  GFS2: Be extra careful about deallocating inodes
  GFS2: Remove no_formal_ino generating code
  GFS2: Rename eattr.[ch] as xattr.[ch]
  GFS2: Clean up of extended attribute support
  GFS2: Add explanation of extended attr on-disk format
  GFS2: Add "-o errors=panic|withdraw" mount options
  GFS2: jumping to wrong label?
  GFS2: free disk inode which is deleted by remote node -V2
  GFS2: Add a document explaining GFS2's uevents
  GFS2: Add sysfs link to device
  GFS2: Replace assertion with proper error handling
  GFS2: Improve error handling in inode allocation
  GFS2: Add some more info to uevents
  GFS2: Add online uevent to GFS2
2009-09-14 14:35:56 -07:00
Trond Myklebust
ab3bbaa8b2 Merge branch 'nfs-for-2.6.32' 2009-09-11 14:59:37 -04:00
Jesper Dangaard Brouer
e8188807b7 Doc: seq_file.txt fix wrong dd command example.
Small error in the "dd" command example, "out=" should be "of=".

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2009-09-10 14:33:35 -06:00
Theodore Ts'o
d0646f7b63 ext4: Remove journal_checksum mount option and enable it by default
There's no real cost for the journal checksum feature, and we should
make sure it is enabled all the time.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05 12:50:43 -04:00
Linus Torvalds
cf481442f2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: update documentation pointers
  9p: remove unnecessary v9fses->options which duplicates the mount string
  net/9p: insulate the client against an invalid error code sent by a 9p server
  9p: Add missing cast for the error return value in v9fs_get_inode
  9p: Remove redundant inode uid/gid assignment
  9p: Fix possible regressions when ->get_sb fails.
  9p: Fix v9fs show_options
  9p: Fix possible memleak in v9fs_inode_from fid.
  9p: minor comment fixes
  9p: Fix possible inode leak in v9fs_get_inode.
  9p: Check for error in return value of v9fs_fid_add
2009-08-27 12:24:08 -07:00
Trond Myklebust
e571cbf1a4 NFS: Add a dns resolver for use with NFSv4 referrals and migration
The NFSv4 and NFSv4.1 protocols both allow for the redirection of a client
from one server to another in order to support filesystem migration and
replication. For full protocol support, we need to add the ability to
convert a DNS host name into an IP address that we can feed to the RPC
client.

We'll reuse the sunrpc cache, now that it has been converted to work with
rpc_pipefs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-19 18:22:15 -04:00
Anton Blanchard
0dc9aa845c AFS: Documentation updates
Fix some issues with the AFS documentation, found when testing AFS on ppc64:

- Update AFS features: reading/writing, local caching
- Typo in kafs sysfs debug file
- Use modprobe instead of insmod in example
- Update IPs for grand.central.org

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-19 10:40:13 -07:00
KOSAKI Motohiro
0753ba01e1 mm: revert "oom: move oom_adj value"
The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to
the mm_struct.  It was a very good first step for sanitize OOM.

However Paul Menage reported the commit makes regression to his job
scheduler.  Current OOM logic can kill OOM_DISABLED process.

Why? His program has the code of similar to the following.

	...
	set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
	...
	if (vfork() == 0) {
		set_oom_adj(0); /* Invoked child can be killed */
		execve("foo-bar-cmd");
	}
	....

vfork() parent and child are shared the same mm_struct.  then above
set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also
change oom_adj for vfork() parent.  Then, vfork() parent (job scheduler)
lost OOM immune and it was killed.

Actually, fork-setting-exec idiom is very frequently used in userland program.
We must not break this assumption.

Then, this patch revert commit 2ff05b2b and related commit.

Reverted commit list
---------------------
- commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 8123681022 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b57 (mm: copy over oom_adj value at fork time)

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-18 16:31:13 -07:00
Eric Van Hensbergen
7815f4be40 9p: update documentation pointers
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-17 16:49:44 -05:00
Steven Whitehouse
0aa8744584 GFS2: Add a document explaining GFS2's uevents
This will be essential reading for anybody who wants to
understand how GFS2 interacts with the userland gfs_controld,
and the details of recovery.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2009-08-17 11:11:41 +01:00
Paul Wise
955234755c vfat: change the default from shortname=lower to shortname=mixed
Because, with "shortname=lower", copying one FAT filesystem tree to
another FAT filesystem tree using Linux results in semantically
different filesystems. (E.g.: Filenames which were once "all
uppercase" are now "all lowercase").

So, this changes the default of "shortname=lower" to "shortname=mixed".

Signed-off-by: Paul Wise <pabs3@bonedaddy.net>
[change fat_show_options()]
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2009-08-01 21:35:25 +09:00
Lucian Adrian Grijincu
a39ea210ec driver core: documentation: make it clear that sysfs is optional
The original text suggested that sysfs is mandatory and always
compiled in the kernel.

Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-28 13:45:23 -07:00
Christoph Hellwig
7e325d3a6b update Documentation/filesystems/Locking
The rules for locking in many superblock operations has changed
significantly, so update the documentation for it.  Also correct some
older updates and ommissions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-24 08:15:25 -04:00
Linus Torvalds
7e0338c0de Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd
* 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
  SUNRPC: Fix the TCP server's send buffer accounting
  nfsd41: Backchannel: minorversion support for the back channel
  nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
  nfsd41: Remove ip address collision detection case
  nfsd: optimise the starting of zero threads when none are running.
  nfsd: don't take nfsd_mutex twice when setting number of threads.
  nfsd41: sanity check client drc maxreqs
  nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
  NFS: kill off complicated macro 'PROC'
  sunrpc: potential memory leak in function rdma_read_xdr
  nfsd: minor nfsd_vfs_write cleanup
  nfsd: Pull write-gathering code out of nfsd_vfs_write
  nfsd: track last inode only in use_wgather case
  sunrpc: align cache_clean work's timer
  nfsd: Use write gathering only with NFSv2
  NFSv4: kill off complicated macro 'PROC'
  NFSv4: do exact check about attribute specified
  knfsd: remove unreported filehandle stats counters
  knfsd: fix reply cache memory corruption
  knfsd: reply cache cleanups
  ...
2009-06-22 12:55:50 -07:00
Linus Torvalds
0732f87761 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: clean up jbd2_journal_try_to_free_buffers()
  ext4: Don't update ctime for non-extent-mapped inodes
  ext4: Fix up whitespace issues in fs/ext4/inode.c
  ext4: Fix 64-bit block type problem on 32-bit platforms
  ext4: teach the inode allocator to use a goal inode number
  ext4: Use a hash of the topdir directory name for the Orlov parent group
  ext4: document the "abort" mount option
  ext4: move the abort flag from s_mount_opts to s_mount_flags
  ext4: update the s_last_mounted field in the superblock
  ext4: change s_mount_opt to be an unsigned int
  ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctl
  ext4: avoid unnecessary spinlock in critical POSIX ACL path
  ext3: avoid unnecessary spinlock in critical POSIX ACL path
  ext4: convert instrumentation from markers to tracepoints
  jbd2: convert instrumentation from markers to tracepoints
2009-06-18 14:07:46 -07:00
Jan Kara
52b680c812 isofs: let mode and dmode mount options override rock ridge mode setting
So far, permissions set via 'mode' and/or 'dmode' mount options were
effective only if the medium had no rock ridge extensions (or was mounted
without them).  Add 'overriderockmode' mount option to indicate that these
options should override permissions set in rock ridge extensions.  Maybe
this should be default but the current behavior is there since mount
options were created so I think we should not change how they behave.

Cc: <Hans-Joachim.Baader@cjt.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:45 -07:00
Michael Shields
ce05b2a9db Doc fix: ext2 can only have 32,000 subdirs, not 32,768
ext2.txt says that dirs can have 32,768 subdirs, but the actual value of
EXT2_LINK_MAX is 32000.

ext3 is the same, but the doc does not mention it.  One of ext4's features
is to "fix 32000 subdirectory limit".

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:44 -07:00
Stefani Seibold
349888ee7b proc.txt: update kernel filesystem/proc.txt documentation
An update for the "Process-Specific Subdirectories" section to reflect the
changes till kernel 2.6.30.

Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:41 -07:00
Keika Kobayashi
d3d64df21d proc: export statistics for softirq to /proc
Export statistics for softirq in /proc/softirqs and /proc/stat.

1. /proc/softirqs
Implement /proc/softirqs which shows the number of softirq
for each CPU like /proc/interrupts.

2. /proc/stat
Add the "softirq" line to /proc/stat.
This line shows the number of softirq for all cpu.
The first column is the total of all softirqs and
each subsequent column is the total for particular softirq.

[kosaki.motohiro@jp.fujitsu.com: remove redundant for_each_possible_cpu() loop]
Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Reviewed-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:41 -07:00
Linus Torvalds
feb72ce827 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  get rid of BKL in fs/sysv
  get rid of BKL in fs/minix
  get rid of BKL in fs/efs
  befs ->pust_super() doesn't need BKL
  Cleanup of adfs headers
  9P doesn't need BKL in ->umount_begin()
  fuse doesn't need BKL in ->umount_begin()
  No instance of ->bmap() needs BKL
  remove unlock_kernel() left accidentally
  ext4: avoid unnecessary spinlock in critical POSIX ACL path
  ext3: avoid unnecessary spinlock in critical POSIX ACL path
2009-06-17 08:46:57 -07:00
Al Viro
fe36adf47e No instance of ->bmap() needs BKL
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-17 00:36:35 -04:00
Linus Torvalds
517d08699b Merge branch 'akpm'
* akpm: (182 commits)
  fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
  fbdev: *bfin*: fix __dev{init,exit} markings
  fbdev: *bfin*: drop unnecessary calls to memset
  fbdev: bfin-t350mcqb-fb: drop unused local variables
  fbdev: blackfin has __raw I/O accessors, so use them in fb.h
  fbdev: s1d13xxxfb: add accelerated bitblt functions
  tcx: use standard fields for framebuffer physical address and length
  fbdev: add support for handoff from firmware to hw framebuffers
  intelfb: fix a bug when changing video timing
  fbdev: use framebuffer_release() for freeing fb_info structures
  radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
  s3c-fb: CPUFREQ frequency scaling support
  s3c-fb: fix resource releasing on error during probing
  carminefb: fix possible access beyond end of carmine_modedb[]
  acornfb: remove fb_mmap function
  mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
  mb862xxfb: restrict compliation of platform driver to PPC
  Samsung SoC Framebuffer driver: add Alpha Channel support
  atmel-lcdc: fix pixclock upper bound detection
  offb: use framebuffer_alloc() to allocate fb_info struct
  ...

Manually fix up conflicts due to kmemcheck in mm/slab.c
2009-06-16 19:50:13 -07:00
David Rientjes
2ff05b2b4e oom: move oom_adj value from task_struct to mm_struct
The per-task oom_adj value is a characteristic of its mm more than the
task itself since it's not possible to oom kill any thread that shares the
mm.  If a task were to be killed while attached to an mm that could not be
freed because another thread were set to OOM_DISABLE, it would have
needlessly been terminated since there is no potential for future memory
freeing.

This patch moves oomkilladj (now more appropriately named oom_adj) from
struct task_struct to struct mm_struct.  This requires task_lock() on a
task to check its oom_adj value to protect against exec, but it's already
necessary to take the lock when dereferencing the mm to find the total VM
size for the badness heuristic.

This fixes a livelock if the oom killer chooses a task and another thread
sharing the same memory has an oom_adj value of OOM_DISABLE.  This occurs
because oom_kill_task() repeatedly returns 1 and refuses to kill the
chosen task while select_bad_process() will repeatedly choose the same
task during the next retry.

Taking task_lock() in select_bad_process() to check for OOM_DISABLE and in
oom_kill_task() to check for threads sharing the same memory will be
removed in the next patch in this series where it will no longer be
necessary.

Writing to /proc/pid/oom_adj for a kthread will now return -EINVAL since
these threads are immune from oom killing already.  They simply report an
oom_adj value of OOM_DISABLE.

Cc: Nick Piggin <npiggin@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:43 -07:00
Linus Torvalds
23059a0df5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
  fat: split fat_generic_ioctl
  FAT: add 'errors' mount option
2009-06-16 11:29:44 -07:00
J. Bruce Fields
7eef4091a6 Merge commit 'v2.6.30' into for-2.6.31 2009-06-15 18:08:07 -07:00
Linus Torvalds
9c7cb99a82 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits)
  nilfs2: support contiguous lookup of blocks
  nilfs2: add sync_page method to page caches of meta data
  nilfs2: use device's backing_dev_info for btree node caches
  nilfs2: return EBUSY against delete request on snapshot
  nilfs2: modify list of unsupported features in caveats
  nilfs2: enable sync_page method
  nilfs2: set bio unplug flag for the last bio in segment
  nilfs2: allow future expansion of metadata read out via get info ioctl
  NILFS2: Pagecache usage optimization on NILFS2
  nilfs2: remove nilfs_btree_operations from btree mapping
  nilfs2: remove nilfs_direct_operations from direct mapping
  nilfs2: remove bmap pointer operations
  nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct
  nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function
  nilfs2: move get block functions in bmap.c into btree codes
  nilfs2: remove nilfs_bmap_delete_block
  nilfs2: remove nilfs_bmap_put_block
  nilfs2: remove header file for segment list operations
  nilfs2: eliminate removal list of segments
  nilfs2: add sufile function that can modify multiple segment usages
  ...
2009-06-15 09:13:49 -07:00
Linus Torvalds
489f7ab6c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits)
  trivial: remove the trivial patch monkey's name from SubmittingPatches
  trivial: Fix a typo in comment of addrconf_dad_start()
  trivial: usb: fix missing space typo in doc
  trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug
  trivial: Remove the hyphen from git commands
  trivial: fix ETIMEOUT -> ETIMEDOUT typos
  trivial: Kconfig: .ko is normally not included in module names
  trivial: SubmittingPatches: fix typo
  trivial: Documentation/dell_rbu.txt: fix typos
  trivial: Fix Pavel's address in MAINTAINERS
  trivial: ftrace:fix description of trace directory
  trivial: unnecessary (void*) cast removal in sound/oss/msnd.c
  trivial: input/misc: Fix typo in Kconfig
  trivial: fix grammo in bus_for_each_dev() kerneldoc
  trivial: rbtree.txt: fix rb_entry() parameters in sample code
  trivial: spelling fix in ppc code comments
  trivial: fix typo in bio_alloc kernel doc
  trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt
  trivial: Miscellaneous documentation typo fixes
  trivial: fix typo milisecond/millisecond for documentation and source comments.
  ...
2009-06-14 13:46:25 -07:00
Linus Torvalds
1904187a69 Merge branch 'docs-next' of git://git.lwn.net/linux-2.6
* 'docs-next' of git://git.lwn.net/linux-2.6:
  Document the debugfs API
  Documentation: Add "how to write a good patch summary" to SubmittingPatches
  SubmittingPatches: fix typo
  docs: Encourage better changelogs in the development process document
  Document Reported-by in SubmittingPatches
2009-06-13 13:08:34 -07:00
Theodore Ts'o
8a8a2050c8 ext4: document the "abort" mount option
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-13 10:08:59 -04:00
Matt LaPlante
19f5946001 trivial: Miscellaneous documentation typo fixes
Fix various typos in documentation txts.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:47 +02:00
Linus Torvalds
0a33f80a83 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (25 commits)
  GFS2: Merge gfs2_get_sb into gfs2_get_sb_meta
  GFS2: Fix cache coherency between truncate and O_DIRECT read
  GFS2: Fix locking issue mounting gfs2meta fs
  GFS2: Remove unused variable
  GFS2: smbd proccess hangs with flock() call.
  GFS2: Remove args subdir from gfs2 sysfs files
  GFS2: Remove lockstruct subdir from gfs2 sysfs files
  GFS2: Move gfs2_unlink_ok into ops_inode.c
  GFS2: Move gfs2_readlinki into ops_inode.c
  GFS2: Move gfs2_rmdiri into ops_inode.c
  GFS2: Merge mount.c and ops_super.c into super.c
  GFS2: Clean up some file names
  GFS2: Be more aggressive in reclaiming unlinked inodes
  GFS2: Add a rgrp bitmap full flag
  GFS2: Improve resource group error handling
  GFS2: Don't warn when delete inode fails on ro filesystem
  GFS2: Update docs
  GFS2: Umount recovery race fix
  GFS2: Remove a couple of unused sysfs entries
  GFS2: Add commit= mount option
  ...
2009-06-11 10:36:12 -07:00
Ryusuke Konishi
fb6e7113ae nilfs2: modify list of unsupported features in caveats
This clarifies missing features of nilfs as a regular filesystem.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-06-10 23:41:11 +09:00
Jonathan Corbet
f89d7eaf6c Document the debugfs API
This is an updated document covering the internal API for the debugfs
filesystem.  Thanks to Shen Feng for suggesting that I put this text here
and noting that the old LWN version was rather out of date.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Reported-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2009-06-06 10:28:14 -06:00
Denis Karpov
85c7859190 FAT: add 'errors' mount option
On severe errors FAT remounts itself in read-only mode. Allow to
specify FAT fs desired behavior through 'errors' mount option:
panic, continue or remount read-only.

`mount -t [fat|vfat] -o errors=[panic,remount-ro,continue] \
	<bdev> <mount point>`

This is analog to ext2 fs 'errors' mount option.

Signed-off-by: Denis Karpov <ext-denis.2.karpov@nokia.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
2009-06-04 02:34:51 +09:00
Hugh Dickins
98f32602d4 hugh: update email address
My old address will shut down in a few days time: remove it from the tree,
and add a tmpfs (shmem filesystem) maintainer entry with the new address.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-21 13:14:32 -07:00
Steven Whitehouse
e9ccb73ab5 GFS2: Update docs
Update a few things which were out of date, and fix a typo.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-05-19 10:23:23 +01:00
Nick Piggin
b827e496c8 mm: close page_mkwrite races
Change page_mkwrite to allow implementations to return with the page
locked, and also change it's callers (in page fault paths) to hold the
lock until the page is marked dirty.  This allows the filesystem to have
full control of page dirtying events coming from the VM.

Rather than simply hold the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow callers to return with
it locked, so filesystems can avoid LOR conditions with page lock.

The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its ->page_mkwrite.  It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).

In this window, the VM could write out the page, clearing page-dirty.  The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.

It is not always possible to perform the required metadata manipulation in
->set_page_dirty, because that function cannot block or fail.  The
filesystem may need to allocate some data structure, for example.

And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.

This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window.  This provides the filesystem with a strong
synchronisation against the VM here.

- Sage needs this race closed for ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation
  under dirty pages might be susceptible to i_size changing under partial page
  at the end of file (we also have a buffer.c-side problem here, but it cannot
  be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their
  page_mkwrite functions themselves.

[ This also moves page_mkwrite another step closer to fault, which should
  eventually allow page_mkwrite to be moved into ->fault, and thus avoiding a
  filesystem calldown and page lock/unlock cycle in __do_fault. ]

[akpm@linux-foundation.org: fix derefs of NULL ->mapping]
Cc: Sage Weil <sage@newdream.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-02 15:36:09 -07:00
Benny Halevy
4ba170c2bb update Documentation/filesystems/00-INDEX with new nfsd related docs.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: James Lentini <jlentini@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-28 12:54:45 -04:00
Marc Dionne
91ac033d83 CacheFiles: Fix the documentation to use the correct credential pointer names
Adjust the CacheFiles documentation to use the correct names of the credential
pointers in task_struct.

The documentation was using names from the old versions of the credentials
patches.

Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-24 13:28:30 -07:00
Adrian McMenamin
66672fefaa Documentation/filesystems: remove out of date reference to BKL being held
Documentation/filesystems/vfs.txt incorrectly states that the kernel is
locked during the call to statfs (Documentation/filesystems/Locking
correctly says it is not). This patch removes the offending sentence.

remove reference to BKL being held in statfs

Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-04-20 23:01:16 -04:00
Evgeniy Polyakov
e0ca873916 Staging: Pohmelfs: Added IO permissions and priorities.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-04-17 11:06:30 -07:00
Ryusuke Konishi
458c5b0822 nilfs2: clean up sketch file
The sketch file is a file to mark checkpoints with user data.  It was
experimentally introduced in the original implementation, and now
obsolete.  The file was handled differently with regular files; the file
size got truncated when a checkpoint was created.

This stops the special treatment and will treat it as a regular file.
Most users are not affected because mkfs.nilfs2 no longer makes this file.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:19 -07:00
Ryusuke Konishi
962281a7ab nilfs2: add document
This adds a document describing the features, mount options, userland
tools, usage, disk format, and related URLs for the nilfs2 file system.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 08:31:12 -07:00
Linus Torvalds
a63856252d Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits)
  nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4
  nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc
  nfsd41: Documentation/filesystems/nfs41-server.txt
  nfsd41: CREATE_EXCLUSIVE4_1
  nfsd41: SUPPATTR_EXCLCREAT attribute
  nfsd41: support for 3-word long attribute bitmask
  nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify
  nfsd41: pass writable attrs mask to nfsd4_decode_fattr
  nfsd41: provide support for minor version 1 at rpc level
  nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions
  nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap
  nfsd41: access_valid
  nfsd41: clientid handling
  nfsd41: check encode size for sessions maxresponse cached
  nfsd41: stateid handling
  nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op
  nfsd41: destroy_session operation
  nfsd41: non-page DRC for solo sequence responses
  nfsd41: Add a create session replay cache
  nfsd41: create_session operation
  ...
2009-04-06 13:25:56 -07:00
Linus Torvalds
3516c6a8dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (714 commits)
  Staging: sxg: slicoss: Specify the license for Sahara SXG and Slicoss drivers
  Staging: serqt_usb: fix build due to proc tty changes
  Staging: serqt_usb: fix checkpatch errors
  Staging: serqt_usb: add TODO file
  Staging: serqt_usb: Lindent the code
  Staging: add USB serial Quatech driver
  staging: document that the wifi staging drivers a bit better
  Staging: echo cleanup
  Staging: BUG to BUG_ON changes
  Staging: remove some pointless conditionals before kfree_skb()
  Staging: line6: fix build error, select SND_RAWMIDI
  Staging: line6: fix checkpatch errors in variax.c
  Staging: line6: fix checkpatch errors in toneport.c
  Staging: line6: fix checkpatch errors in pcm.c
  Staging: line6: fix checkpatch errors in midibuf.c
  Staging: line6: fix checkpatch errors in midi.c
  Staging: line6: fix checkpatch errors in dumprequest.c
  Staging: line6: fix checkpatch errors in driver.c
  Staging: line6: fix checkpatch errors in audio.c
  Staging: line6: fix checkpatch errors in pod.c
  ...
2009-04-05 11:06:45 -07:00
Benny Halevy
3ef1728898 nfsd41: Documentation/filesystems/nfs41-server.txt
Initial nfs41 server write up describing the status of the linux
server implementation.

[nfsd41: document unenforced nfs41 compound ordering rules.]
[get rid of CONFIG_NFSD_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 17:41:24 -07:00
Linus Torvalds
811158b147 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
  trivial: Update my email address
  trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
  trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
  trivial: Fix misspelling of "Celsius".
  trivial: remove unused variable 'path' in alloc_file()
  trivial: fix a pdlfush -> pdflush typo in comment
  trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
  trivial: wusb: Storage class should be before const qualifier
  trivial: drivers/char/bsr.c: Storage class should be before const qualifier
  trivial: h8300: Storage class should be before const qualifier
  trivial: fix where cgroup documentation is not correctly referred to
  trivial: Give the right path in Documentation example
  trivial: MTD: remove EOL from MODULE_DESCRIPTION
  trivial: Fix typo in bio_split()'s documentation
  trivial: PWM: fix of #endif comment
  trivial: fix typos/grammar errors in Kconfig texts
  trivial: Fix misspelling of firmware
  trivial: cgroups: documentation typo and spelling corrections
  trivial: Update contact info for Jochen Hein
  trivial: fix typo "resgister" -> "register"
  ...
2009-04-03 15:24:35 -07:00
Evgeniy Polyakov
b8523c40d5 Staging: pohmelfs: documentation.
This patch includes POHMELFS design and implementation description.
Separate file includes mount options, default parameters and usage examples.

Signed-off-by: Eveniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-04-03 14:53:33 -07:00
Linus Torvalds
3cc50ac0db Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (41 commits)
  NFS: Add mount options to enable local caching on NFS
  NFS: Display local caching state
  NFS: Store pages from an NFS inode into a local cache
  NFS: Read pages from FS-Cache into an NFS inode
  NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
  NFS: Add read context retention for FS-Cache to call back with
  NFS: FS-Cache page management
  NFS: Add some new I/O counters for FS-Cache doing things for NFS
  NFS: Invalidate FsCache page flags when cache removed
  NFS: Use local disk inode cache
  NFS: Define and create inode-level cache objects
  NFS: Define and create superblock-level objects
  NFS: Define and create server-level objects
  NFS: Register NFS for caching and retrieve the top-level index
  NFS: Permit local filesystem caching to be enabled for NFS
  NFS: Add FS-Cache option bit and debug bit
  NFS: Add comment banners to some NFS functions
  FS-Cache: Make kAFS use FS-Cache
  CacheFiles: A cache that backs onto a mounted filesystem
  CacheFiles: Export things for CacheFiles
  ...
2009-04-03 10:07:43 -07:00
Linus Torvalds
9b59f0316b Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  fs: Add exofs to Kernel build
  exofs: Documentation
  exofs: export_operations
  exofs: super_operations and file_system_type
  exofs: dir_inode and directory operations
  exofs: address_space_operations
  exofs: symlink_inode and fast_symlink_inode operations
  exofs: file and file_inode operations
  exofs: Kbuild, Headers and osd utils
2009-04-03 09:53:22 -07:00
Linus Torvalds
03c3fa0a3b Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: Don't write integrity descriptor too often
  udf: Try anchor in block 256 first
  udf: Some type fixes and cleanups
  udf: use hardware sector size
  udf: fix novrs mount option
  udf: Fix oops when invalid character in filename occurs
  udf: return f_fsid for statfs(2)
  udf: Add checks to not underflow sector_t
  udf: fix default mode and dmode options handling
  udf: fix sparse warnings:
  udf: unsigned last[i] cannot be less than 0
  udf: implement mode and dmode mounting options
  udf: reduce stack usage of udf_get_filename
  udf: reduce stack usage of udf_load_pvoldesc
  Fix the udf code not to pass structs on stack where possible.
  Remove struct typedefs from fs/udf/ecma_167.h et al.
2009-04-03 09:50:39 -07:00
David Howells
9ae326a690 CacheFiles: A cache that backs onto a mounted filesystem
Add an FS-Cache cache-backend that permits a mounted filesystem to be used as a
backing store for the cache.

CacheFiles uses a userspace daemon to do some of the cache management - such as
reaping stale nodes and culling.  This is called cachefilesd and lives in
/sbin.  The source for the daemon can be downloaded from:

	http://people.redhat.com/~dhowells/cachefs/cachefilesd.c

And an example configuration from:

	http://people.redhat.com/~dhowells/cachefs/cachefilesd.conf

The filesystem and data integrity of the cache are only as good as those of the
filesystem providing the backing services.  Note that CacheFiles does not
attempt to journal anything since the journalling interfaces of the various
filesystems are very specific in nature.

CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
to communication with the daemon.  Only one thing may have this open at once,
and whilst it is open, a cache is at least partially in existence.  The daemon
opens this and sends commands down it to control the cache.

CacheFiles is currently limited to a single cache.

CacheFiles attempts to maintain at least a certain percentage of free space on
the filesystem, shrinking the cache by culling the objects it contains to make
space if necessary - see the "Cache Culling" section.  This means it can be
placed on the same medium as a live set of data, and will expand to make use of
spare space and automatically contract when the set of data requires more
space.

============
REQUIREMENTS
============

The use of CacheFiles and its daemon requires the following features to be
available in the system and in the cache filesystem:

	- dnotify.

	- extended attributes (xattrs).

	- openat() and friends.

	- bmap() support on files in the filesystem (FIBMAP ioctl).

	- The use of bmap() to detect a partial page at the end of the file.

It is strongly recommended that the "dir_index" option is enabled on Ext3
filesystems being used as a cache.

=============
CONFIGURATION
=============

The cache is configured by a script in /etc/cachefilesd.conf.  These commands
set up cache ready for use.  The following script commands are available:

 (*) brun <N>%
 (*) bcull <N>%
 (*) bstop <N>%
 (*) frun <N>%
 (*) fcull <N>%
 (*) fstop <N>%

	Configure the culling limits.  Optional.  See the section on culling
	The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.

	The commands beginning with a 'b' are file space (block) limits, those
	beginning with an 'f' are file count limits.

 (*) dir <path>

	Specify the directory containing the root of the cache.  Mandatory.

 (*) tag <name>

	Specify a tag to FS-Cache to use in distinguishing multiple caches.
	Optional.  The default is "CacheFiles".

 (*) debug <mask>

	Specify a numeric bitmask to control debugging in the kernel module.
	Optional.  The default is zero (all off).  The following values can be
	OR'd into the mask to collect various information:

		1	Turn on trace of function entry (_enter() macros)
		2	Turn on trace of function exit (_leave() macros)
		4	Turn on trace of internal debug points (_debug())

	This mask can also be set through sysfs, eg:

		echo 5 >/sys/modules/cachefiles/parameters/debug

==================
STARTING THE CACHE
==================

The cache is started by running the daemon.  The daemon opens the cache device,
configures the cache and tells it to begin caching.  At that point the cache
binds to fscache and the cache becomes live.

The daemon is run as follows:

	/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]

The flags are:

 (*) -d

	Increase the debugging level.  This can be specified multiple times and
	is cumulative with itself.

 (*) -s

	Send messages to stderr instead of syslog.

 (*) -n

	Don't daemonise and go into background.

 (*) -f <configfile>

	Use an alternative configuration file rather than the default one.

===============
THINGS TO AVOID
===============

Do not mount other things within the cache as this will cause problems.  The
kernel module contains its own very cut-down path walking facility that ignores
mountpoints, but the daemon can't avoid them.

Do not create, rename or unlink files and directories in the cache whilst the
cache is active, as this may cause the state to become uncertain.

Renaming files in the cache might make objects appear to be other objects (the
filename is part of the lookup key).

Do not change or remove the extended attributes attached to cache files by the
cache as this will cause the cache state management to get confused.

Do not create files or directories in the cache, lest the cache get confused or
serve incorrect data.

Do not chmod files in the cache.  The module creates things with minimal
permissions to prevent random users being able to access them directly.

=============
CACHE CULLING
=============

The cache may need culling occasionally to make space.  This involves
discarding objects from the cache that have been used less recently than
anything else.  Culling is based on the access time of data objects.  Empty
directories are culled if not in use.

Cache culling is done on the basis of the percentage of blocks and the
percentage of files available in the underlying filesystem.  There are six
"limits":

 (*) brun
 (*) frun

     If the amount of free space and the number of available files in the cache
     rises above both these limits, then culling is turned off.

 (*) bcull
 (*) fcull

     If the amount of available space or the number of available files in the
     cache falls below either of these limits, then culling is started.

 (*) bstop
 (*) fstop

     If the amount of available space or the number of available files in the
     cache falls below either of these limits, then no further allocation of
     disk space or files is permitted until culling has raised things above
     these limits again.

These must be configured thusly:

	0 <= bstop < bcull < brun < 100
	0 <= fstop < fcull < frun < 100

Note that these are percentages of available space and available files, and do
_not_ appear as 100 minus the percentage displayed by the "df" program.

The userspace daemon scans the cache to build up a table of cullable objects.
These are then culled in least recently used order.  A new scan of the cache is
started as soon as space is made in the table.  Objects will be skipped if
their atimes have changed or if the kernel module says it is still using them.

===============
CACHE STRUCTURE
===============

The CacheFiles module will create two directories in the directory it was
given:

 (*) cache/

 (*) graveyard/

The active cache objects all reside in the first directory.  The CacheFiles
kernel module moves any retired or culled objects that it can't simply unlink
to the graveyard from which the daemon will actually delete them.

The daemon uses dnotify to monitor the graveyard directory, and will delete
anything that appears therein.

The module represents index objects as directories with the filename "I..." or
"J...".  Note that the "cache/" directory is itself a special index.

Data objects are represented as files if they have no children, or directories
if they do.  Their filenames all begin "D..." or "E...".  If represented as a
directory, data objects will have a file in the directory called "data" that
actually holds the data.

Special objects are similar to data objects, except their filenames begin
"S..." or "T...".

If an object has children, then it will be represented as a directory.
Immediately in the representative directory are a collection of directories
named for hash values of the child object keys with an '@' prepended.  Into
this directory, if possible, will be placed the representations of the child
objects:

	INDEX     INDEX      INDEX                             DATA FILES
	========= ========== ================================= ================
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
	cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry

If the key is so long that it exceeds NAME_MAX with the decorations added on to
it, then it will be cut into pieces, the first few of which will be used to
make a nest of directories, and the last one of which will be the objects
inside the last directory.  The names of the intermediate directories will have
'+' prepended:

	J1223/@23/+xy...z/+kl...m/Epqr

Note that keys are raw data, and not only may they exceed NAME_MAX in size,
they may also contain things like '/' and NUL characters, and so they may not
be suitable for turning directly into a filename.

To handle this, CacheFiles will use a suitably printable filename directly and
"base-64" encode ones that aren't directly suitable.  The two versions of
object filenames indicate the encoding:

	OBJECT TYPE	PRINTABLE	ENCODED
	===============	===============	===============
	Index		"I..."		"J..."
	Data		"D..."		"E..."
	Special		"S..."		"T..."

Intermediate directories are always "@" or "+" as appropriate.

Each object in the cache has an extended attribute label that holds the object
type ID (required to distinguish special objects) and the auxiliary data from
the netfs.  The latter is used to detect stale objects in the cache and update
or retire them.

Note that CacheFiles will erase from the cache any file it doesn't recognise or
any file of an incorrect type (such as a FIFO file or a device file).

==========================
SECURITY MODEL AND SELINUX
==========================

CacheFiles is implemented to deal properly with the LSM security features of
the Linux kernel and the SELinux facility.

One of the problems that CacheFiles faces is that it is generally acting on
behalf of a process, and running in that process's context, and that includes a
security context that is not appropriate for accessing the cache - either
because the files in the cache are inaccessible to that process, or because if
the process creates a file in the cache, that file may be inaccessible to other
processes.

The way CacheFiles works is to temporarily change the security context (fsuid,
fsgid and actor security label) that the process acts as - without changing the
security context of the process when it the target of an operation performed by
some other process (so signalling and suchlike still work correctly).

When the CacheFiles module is asked to bind to its cache, it:

 (1) Finds the security label attached to the root cache directory and uses
     that as the security label with which it will create files.  By default,
     this is:

	cachefiles_var_t

 (2) Finds the security label of the process which issued the bind request
     (presumed to be the cachefilesd daemon), which by default will be:

	cachefilesd_t

     and asks LSM to supply a security ID as which it should act given the
     daemon's label.  By default, this will be:

	cachefiles_kernel_t

     SELinux transitions the daemon's security ID to the module's security ID
     based on a rule of this form in the policy.

	type_transition <daemon's-ID> kernel_t : process <module's-ID>;

     For instance:

	type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;

The module's security ID gives it permission to create, move and remove files
and directories in the cache, to find and access directories and files in the
cache, to set and access extended attributes on cache objects, and to read and
write files in the cache.

The daemon's security ID gives it only a very restricted set of permissions: it
may scan directories, stat files and erase files and directories.  It may
not read or write files in the cache, and so it is precluded from accessing the
data cached therein; nor is it permitted to create new files in the cache.

There are policy source files available in:

	http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2

and later versions.  In that tarball, see the files:

	cachefilesd.te
	cachefilesd.fc
	cachefilesd.if

They are built and installed directly by the RPM.

If a non-RPM based system is being used, then copy the above files to their own
directory and run:

	make -f /usr/share/selinux/devel/Makefile
	semodule -i cachefilesd.pp

You will need checkpolicy and selinux-policy-devel installed prior to the
build.

By default, the cache is located in /var/fscache, but if it is desirable that
it should be elsewhere, than either the above policy files must be altered, or
an auxiliary policy must be installed to label the alternate location of the
cache.

For instructions on how to add an auxiliary policy to enable the cache to be
located elsewhere when SELinux is in enforcing mode, please see:

	/usr/share/doc/cachefilesd-*/move-cache.txt

When the cachefilesd rpm is installed; alternatively, the document can be found
in the sources.

==================
A NOTE ON SECURITY
==================

CacheFiles makes use of the split security in the task_struct.  It allocates
its own task_security structure, and redirects current->act_as to point to it
when it acts on behalf of another process, in that process's context.

The reason it does this is that it calls vfs_mkdir() and suchlike rather than
bypassing security and calling inode ops directly.  Therefore the VFS and LSM
may deny the CacheFiles access to the cache data because under some
circumstances the caching code is running in the security context of whatever
process issued the original syscall on the netfs.

Furthermore, should CacheFiles create a file or directory, the security
parameters with that object is created (UID, GID, security label) would be
derived from that process that issued the system call, thus potentially
preventing other processes from accessing the cache - including CacheFiles's
cache management daemon (cachefilesd).

What is required is to temporarily override the security of the process that
issued the system call.  We can't, however, just do an in-place change of the
security data as that affects the process as an object, not just as a subject.
This means it may lose signals or ptrace events for example, and affects what
the process looks like in /proc.

So CacheFiles makes use of a logical split in the security between the
objective security (task->sec) and the subjective security (task->act_as).  The
objective security holds the intrinsic security properties of a process and is
never overridden.  This is what appears in /proc, and is what is used when a
process is the target of an operation by some other process (SIGKILL for
example).

The subjective security holds the active security properties of a process, and
may be overridden.  This is not seen externally, and is used whan a process
acts upon another object, for example SIGKILLing another process or opening a
file.

LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
for CacheFiles to run in a context of a specific security label, or to create
files and directories with another security label.

This documentation is added by the patch to:

	Documentation/filesystems/caching/cachefiles.txt

Signed-Off-By: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:41 +01:00
David Howells
952efe7b78 FS-Cache: Add and document asynchronous operation handling
Add and document asynchronous operation handling for use by FS-Cache's data
storage and retrieval routines.

The following documentation is added to:

	Documentation/filesystems/caching/operations.txt

		       ================================
		       ASYNCHRONOUS OPERATIONS HANDLING
		       ================================

========
OVERVIEW
========

FS-Cache has an asynchronous operations handling facility that it uses for its
data storage and retrieval routines.  Its operations are represented by
fscache_operation structs, though these are usually embedded into some other
structure.

This facility is available to and expected to be be used by the cache backends,
and FS-Cache will create operations and pass them off to the appropriate cache
backend for completion.

To make use of this facility, <linux/fscache-cache.h> should be #included.

===============================
OPERATION RECORD INITIALISATION
===============================

An operation is recorded in an fscache_operation struct:

	struct fscache_operation {
		union {
			struct work_struct fast_work;
			struct slow_work slow_work;
		};
		unsigned long		flags;
		fscache_operation_processor_t processor;
		...
	};

Someone wanting to issue an operation should allocate something with this
struct embedded in it.  They should initialise it by calling:

	void fscache_operation_init(struct fscache_operation *op,
				    fscache_operation_release_t release);

with the operation to be initialised and the release function to use.

The op->flags parameter should be set to indicate the CPU time provision and
the exclusivity (see the Parameters section).

The op->fast_work, op->slow_work and op->processor flags should be set as
appropriate for the CPU time provision (see the Parameters section).

FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
operation and waited for afterwards.

==========
PARAMETERS
==========

There are a number of parameters that can be set in the operation record's flag
parameter.  There are three options for the provision of CPU time in these
operations:

 (1) The operation may be done synchronously (FSCACHE_OP_MYTHREAD).  A thread
     may decide it wants to handle an operation itself without deferring it to
     another thread.

     This is, for example, used in read operations for calling readpages() on
     the backing filesystem in CacheFiles.  Although readpages() does an
     asynchronous data fetch, the determination of whether pages exist is done
     synchronously - and the netfs does not proceed until this has been
     determined.

     If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
     before submitting the operation, and the operating thread must wait for it
     to be cleared before proceeding:

		wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
			    fscache_wait_bit, TASK_UNINTERRUPTIBLE);

 (2) The operation may be fast asynchronous (FSCACHE_OP_FAST), in which case it
     will be given to keventd to process.  Such an operation is not permitted
     to sleep on I/O.

     This is, for example, used by CacheFiles to copy data from a backing fs
     page to a netfs page after the backing fs has read the page in.

     If this option is used, op->fast_work and op->processor must be
     initialised before submitting the operation:

		INIT_WORK(&op->fast_work, do_some_work);

 (3) The operation may be slow asynchronous (FSCACHE_OP_SLOW), in which case it
     will be given to the slow work facility to process.  Such an operation is
     permitted to sleep on I/O.

     This is, for example, used by FS-Cache to handle background writes of
     pages that have just been fetched from a remote server.

     If this option is used, op->slow_work and op->processor must be
     initialised before submitting the operation:

		fscache_operation_init_slow(op, processor)

Furthermore, operations may be one of two types:

 (1) Exclusive (FSCACHE_OP_EXCLUSIVE).  Operations of this type may not run in
     conjunction with any other operation on the object being operated upon.

     An example of this is the attribute change operation, in which the file
     being written to may need truncation.

 (2) Shareable.  Operations of this type may be running simultaneously.  It's
     up to the operation implementation to prevent interference between other
     operations running at the same time.

=========
PROCEDURE
=========

Operations are used through the following procedure:

 (1) The submitting thread must allocate the operation and initialise it
     itself.  Normally this would be part of a more specific structure with the
     generic op embedded within.

 (2) The submitting thread must then submit the operation for processing using
     one of the following two functions:

	int fscache_submit_op(struct fscache_object *object,
			      struct fscache_operation *op);

	int fscache_submit_exclusive_op(struct fscache_object *object,
					struct fscache_operation *op);

     The first function should be used to submit non-exclusive ops and the
     second to submit exclusive ones.  The caller must still set the
     FSCACHE_OP_EXCLUSIVE flag.

     If successful, both functions will assign the operation to the specified
     object and return 0.  -ENOBUFS will be returned if the object specified is
     permanently unavailable.

     The operation manager will defer operations on an object that is still
     undergoing lookup or creation.  The operation will also be deferred if an
     operation of conflicting exclusivity is in progress on the object.

     If the operation is asynchronous, the manager will retain a reference to
     it, so the caller should put their reference to it by passing it to:

	void fscache_put_operation(struct fscache_operation *op);

 (3) If the submitting thread wants to do the work itself, and has marked the
     operation with FSCACHE_OP_MYTHREAD, then it should monitor
     FSCACHE_OP_WAITING as described above and check the state of the object if
     necessary (the object might have died whilst the thread was waiting).

     When it has finished doing its processing, it should call
     fscache_put_operation() on it.

 (4) The operation holds an effective lock upon the object, preventing other
     exclusive ops conflicting until it is released.  The operation can be
     enqueued for further immediate asynchronous processing by adjusting the
     CPU time provisioning option if necessary, eg:

	op->flags &= ~FSCACHE_OP_TYPE;
	op->flags |= ~FSCACHE_OP_FAST;

     and calling:

	void fscache_enqueue_operation(struct fscache_operation *op)

     This can be used to allow other things to have use of the worker thread
     pools.

=====================
ASYNCHRONOUS CALLBACK
=====================

When used in asynchronous mode, the worker thread pool will invoke the
processor method with a pointer to the operation.  This should then get at the
container struct by using container_of():

	static void fscache_write_op(struct fscache_operation *_op)
	{
		struct fscache_storage *op =
			container_of(_op, struct fscache_storage, op);
	...
	}

The caller holds a reference on the operation, and will invoke
fscache_put_operation() when the processor function returns.  The processor
function is at liberty to call fscache_enqueue_operation() or to take extra
references.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:39 +01:00
David Howells
36c9559022 FS-Cache: Object management state machine
Implement the cache object management state machine.

The following documentation is added to illuminate the working of this state
machine.  It will also be added as:

	Documentation/filesystems/caching/object.txt

	     ====================================================
	     IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
	     ====================================================

==============
REPRESENTATION
==============

FS-Cache maintains an in-kernel representation of each object that a netfs is
currently interested in.  Such objects are represented by the fscache_cookie
struct and are referred to as cookies.

FS-Cache also maintains a separate in-kernel representation of the objects that
a cache backend is currently actively caching.  Such objects are represented by
the fscache_object struct.  The cache backends allocate these upon request, and
are expected to embed them in their own representations.  These are referred to
as objects.

There is a 1:N relationship between cookies and objects.  A cookie may be
represented by multiple objects - an index may exist in more than one cache -
or even by no objects (it may not be cached).

Furthermore, both cookies and objects are hierarchical.  The two hierarchies
correspond, but the cookies tree is a superset of the union of the object trees
of multiple caches:

	    NETFS INDEX TREE               :      CACHE 1     :      CACHE 2
	                                   :                  :
	                                   :   +-----------+  :
	                          +----------->|  IObject  |  :
	      +-----------+       |        :   +-----------+  :
	      |  ICookie  |-------+        :         |        :
	      +-----------+       |        :         |        :   +-----------+
	            |             +------------------------------>|  IObject  |
	            |                      :         |        :   +-----------+
	            |                      :         V        :         |
	            |                      :   +-----------+  :         |
	            V             +----------->|  IObject  |  :         |
	      +-----------+       |        :   +-----------+  :         |
	      |  ICookie  |-------+        :         |        :         V
	      +-----------+       |        :         |        :   +-----------+
	            |             +------------------------------>|  IObject  |
	      +-----+-----+                :         |        :   +-----------+
	      |           |                :         |        :         |
	      V           |                :         V        :         |
	+-----------+     |                :   +-----------+  :         |
	|  ICookie  |------------------------->|  IObject  |  :         |
	+-----------+     |                :   +-----------+  :         |
	      |           V                :         |        :         V
	      |     +-----------+          :         |        :   +-----------+
	      |     |  ICookie  |-------------------------------->|  IObject  |
	      |     +-----------+          :         |        :   +-----------+
	      V           |                :         V        :         |
	+-----------+     |                :   +-----------+  :         |
	|  DCookie  |------------------------->|  DObject  |  :         |
	+-----------+     |                :   +-----------+  :         |
	                  |                :                  :         |
	          +-------+-------+        :                  :         |
	          |               |        :                  :         |
	          V               V        :                  :         V
	    +-----------+   +-----------+  :                  :   +-----------+
	    |  DCookie  |   |  DCookie  |------------------------>|  DObject  |
	    +-----------+   +-----------+  :                  :   +-----------+
	                                   :                  :

In the above illustration, ICookie and IObject represent indices and DCookie
and DObject represent data storage objects.  Indices may have representation in
multiple caches, but currently, non-index objects may not.  Objects of any type
may also be entirely unrepresented.

As far as the netfs API goes, the netfs is only actually permitted to see
pointers to the cookies.  The cookies themselves and any objects attached to
those cookies are hidden from it.

===============================
OBJECT MANAGEMENT STATE MACHINE
===============================

Within FS-Cache, each active object is managed by its own individual state
machine.  The state for an object is kept in the fscache_object struct, in
object->state.  A cookie may point to a set of objects that are in different
states.

Each state has an action associated with it that is invoked when the machine
wakes up in that state.  There are four logical sets of states:

 (1) Preparation: states that wait for the parent objects to become ready.  The
     representations are hierarchical, and it is expected that an object must
     be created or accessed with respect to its parent object.

 (2) Initialisation: states that perform lookups in the cache and validate
     what's found and that create on disk any missing metadata.

 (3) Normal running: states that allow netfs operations on objects to proceed
     and that update the state of objects.

 (4) Termination: states that detach objects from their netfs cookies, that
     delete objects from disk, that handle disk and system errors and that free
     up in-memory resources.

In most cases, transitioning between states is in response to signalled events.
When a state has finished processing, it will usually set the mask of events in
which it is interested (object->event_mask) and relinquish the worker thread.
Then when an event is raised (by calling fscache_raise_event()), if the event
is not masked, the object will be queued for processing (by calling
fscache_enqueue_object()).

PROVISION OF CPU TIME
---------------------

The work to be done by the various states is given CPU time by the threads of
the slow work facility (see Documentation/slow-work.txt).  This is used in
preference to the workqueue facility because:

 (1) Threads may be completely occupied for very long periods of time by a
     particular work item.  These state actions may be doing sequences of
     synchronous, journalled disk accesses (lookup, mkdir, create, setxattr,
     getxattr, truncate, unlink, rmdir, rename).

 (2) Threads may do little actual work, but may rather spend a lot of time
     sleeping on I/O.  This means that single-threaded and 1-per-CPU-threaded
     workqueues don't necessarily have the right numbers of threads.

LOCKING SIMPLIFICATION
----------------------

Because only one worker thread may be operating on any particular object's
state machine at once, this simplifies the locking, particularly with respect
to disconnecting the netfs's representation of a cache object (fscache_cookie)
from the cache backend's representation (fscache_object) - which may be
requested from either end.

=================
THE SET OF STATES
=================

The object state machine has a set of states that it can be in.  There are
preparation states in which the object sets itself up and waits for its parent
object to transit to a state that allows access to its children:

 (1) State FSCACHE_OBJECT_INIT.

     Initialise the object and wait for the parent object to become active.  In
     the cache, it is expected that it will not be possible to look an object
     up from the parent object, until that parent object itself has been looked
     up.

There are initialisation states in which the object sets itself up and accesses
disk for the object metadata:

 (2) State FSCACHE_OBJECT_LOOKING_UP.

     Look up the object on disk, using the parent as a starting point.
     FS-Cache expects the cache backend to probe the cache to see whether this
     object is represented there, and if it is, to see if it's valid (coherency
     management).

     The cache should call fscache_object_lookup_negative() to indicate lookup
     failure for whatever reason, and should call fscache_obtained_object() to
     indicate success.

     At the completion of lookup, FS-Cache will let the netfs go ahead with
     read operations, no matter whether the file is yet cached.  If not yet
     cached, read operations will be immediately rejected with ENODATA until
     the first known page is uncached - as to that point there can be no data
     to be read out of the cache for that file that isn't currently also held
     in the pagecache.

 (3) State FSCACHE_OBJECT_CREATING.

     Create an object on disk, using the parent as a starting point.  This
     happens if the lookup failed to find the object, or if the object's
     coherency data indicated what's on disk is out of date.  In this state,
     FS-Cache expects the cache to create

     The cache should call fscache_obtained_object() if creation completes
     successfully, fscache_object_lookup_negative() otherwise.

     At the completion of creation, FS-Cache will start processing write
     operations the netfs has queued for an object.  If creation failed, the
     write ops will be transparently discarded, and nothing recorded in the
     cache.

There are some normal running states in which the object spends its time
servicing netfs requests:

 (4) State FSCACHE_OBJECT_AVAILABLE.

     A transient state in which pending operations are started, child objects
     are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary
     lookup data is freed.

 (5) State FSCACHE_OBJECT_ACTIVE.

     The normal running state.  In this state, requests the netfs makes will be
     passed on to the cache.

 (6) State FSCACHE_OBJECT_UPDATING.

     The state machine comes here to update the object in the cache from the
     netfs's records.  This involves updating the auxiliary data that is used
     to maintain coherency.

And there are terminal states in which an object cleans itself up, deallocates
memory and potentially deletes stuff from disk:

 (7) State FSCACHE_OBJECT_LC_DYING.

     The object comes here if it is dying because of a lookup or creation
     error.  This would be due to a disk error or system error of some sort.
     Temporary data is cleaned up, and the parent is released.

 (8) State FSCACHE_OBJECT_DYING.

     The object comes here if it is dying due to an error, because its parent
     cookie has been relinquished by the netfs or because the cache is being
     withdrawn.

     Any child objects waiting on this one are given CPU time so that they too
     can destroy themselves.  This object waits for all its children to go away
     before advancing to the next state.

 (9) State FSCACHE_OBJECT_ABORT_INIT.

     The object comes to this state if it was waiting on its parent in
     FSCACHE_OBJECT_INIT, but its parent died.  The object will destroy itself
     so that the parent may proceed from the FSCACHE_OBJECT_DYING state.

(10) State FSCACHE_OBJECT_RELEASING.
(11) State FSCACHE_OBJECT_RECYCLING.

     The object comes to one of these two states when dying once it is rid of
     all its children, if it is dying because the netfs relinquished its
     cookie.  In the first state, the cached data is expected to persist, and
     in the second it will be deleted.

(12) State FSCACHE_OBJECT_WITHDRAWING.

     The object transits to this state if the cache decides it wants to
     withdraw the object from service, perhaps to make space, but also due to
     error or just because the whole cache is being withdrawn.

(13) State FSCACHE_OBJECT_DEAD.

     The object transits to this state when the in-memory object record is
     ready to be deleted.  The object processor shouldn't ever see an object in
     this state.

THE SET OF EVENTS
-----------------

There are a number of events that can be raised to an object state machine:

 (*) FSCACHE_OBJECT_EV_UPDATE

     The netfs requested that an object be updated.  The state machine will ask
     the cache backend to update the object, and the cache backend will ask the
     netfs for details of the change through its cookie definition ops.

 (*) FSCACHE_OBJECT_EV_CLEARED

     This is signalled in two circumstances:

     (a) when an object's last child object is dropped and

     (b) when the last operation outstanding on an object is completed.

     This is used to proceed from the dying state.

 (*) FSCACHE_OBJECT_EV_ERROR

     This is signalled when an I/O error occurs during the processing of some
     object.

 (*) FSCACHE_OBJECT_EV_RELEASE
 (*) FSCACHE_OBJECT_EV_RETIRE

     These are signalled when the netfs relinquishes a cookie it was using.
     The event selected depends on whether the netfs asks for the backing
     object to be retired (deleted) or retained.

 (*) FSCACHE_OBJECT_EV_WITHDRAW

     This is signalled when the cache backend wants to withdraw an object.
     This means that the object will have to be detached from the netfs's
     cookie.

Because the withdrawing releasing/retiring events are all handled by the object
state machine, it doesn't matter if there's a collision with both ends trying
to sever the connection at the same time.  The state machine can just pick
which one it wants to honour, and that effects the other.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:38 +01:00
David Howells
7394daa8c6 FS-Cache: Add use of /proc and presentation of statistics
Make FS-Cache create its /proc interface and present various statistical
information through it.  Also provide the functions for updating this
information.

These features are enabled by:

	CONFIG_FSCACHE_PROC
	CONFIG_FSCACHE_STATS
	CONFIG_FSCACHE_HISTOGRAM

The /proc directory for FS-Cache is also exported so that caching modules can
add their own statistics there too.

The FS-Cache module is loadable at this point, and the statistics files can be
examined by userspace:

	cat /proc/fs/fscache/stats
	cat /proc/fs/fscache/histogram

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:37 +01:00
David Howells
0dfc41d1ef FS-Cache: Add the FS-Cache cache backend API and documentation
Add the API for a generic facility (FS-Cache) by which caches may declare them
selves open for business, and may obtain work to be done from network
filesystems.  The header file is included by:

	#include <linux/fscache-cache.h>

Documentation for the API is also added to:

	Documentation/filesystems/caching/backend-api.txt

This API is not usable without the implementation of the utility functions
which will be added in further patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:36 +01:00
David Howells
2d6fff6370 FS-Cache: Add the FS-Cache netfs API and documentation
Add the API for a generic facility (FS-Cache) by which filesystems (such as AFS
or NFS) may call on local caching capabilities without having to know anything
about how the cache works, or even if there is a cache:

	+---------+
	|         |                        +--------------+
	|   NFS   |--+                     |              |
	|         |  |                 +-->|   CacheFS    |
	+---------+  |   +----------+  |   |  /dev/hda5   |
	             |   |          |  |   +--------------+
	+---------+  +-->|          |  |
	|         |      |          |--+
	|   AFS   |----->| FS-Cache |
	|         |      |          |--+
	+---------+  +-->|          |  |
	             |   |          |  |   +--------------+
	+---------+  |   +----------+  |   |              |
	|         |  |                 +-->|  CacheFiles  |
	|  ISOFS  |--+                     |  /var/cache  |
	|         |                        +--------------+
	+---------+

General documentation and documentation of the netfs specific API are provided
in addition to the header files.

As this patch stands, it is possible to build a filesystem against the facility
and attempt to use it.  All that will happen is that all requests will be
immediately denied as if no cache is present.

Further patches will implement the core of the facility.  The facility will
transfer requests from networking filesystems to appropriate caches if
possible, or else gracefully deny them.

If this facility is disabled in the kernel configuration, then all its
operations will trivially reduce to nothing during compilation.

WHY NOT I_MAPPING?
==================

I have added my own API to implement caching rather than using i_mapping to do
this for a number of reasons.  These have been discussed a lot on the LKML and
CacheFS mailing lists, but to summarise the basics:

 (1) Most filesystems don't do hole reportage.  Holes in files are treated as
     blocks of zeros and can't be distinguished otherwise, making it difficult
     to distinguish blocks that have been read from the network and cached from
     those that haven't.

 (2) The backing inode must be fully populated before being exposed to
     userspace through the main inode because the VM/VFS goes directly to the
     backing inode and does not interrogate the front inode's VM ops.

     Therefore:

     (a) The backing inode must fit entirely within the cache.

     (b) All backed files currently open must fit entirely within the cache at
     	 the same time.

     (c) A working set of files in total larger than the cache may not be
     	 cached.

     (d) A file may not grow larger than the available space in the cache.

     (e) A file that's open and cached, and remotely grows larger than the
     	 cache is potentially stuffed.

 (3) Writes go to the backing filesystem, and can only be transferred to the
     network when the file is closed.

 (4) There's no record of what changes have been made, so the whole file must
     be written back.

 (5) The pages belong to the backing filesystem, and all metadata associated
     with that page are relevant only to the backing filesystem, and not
     anything stacked atop it.

OVERVIEW
========

FS-Cache provides (or will provide) the following facilities:

 (1) Caches can be added / removed at any time, even whilst in use.

 (2) Adds a facility by which tags can be used to refer to caches, even if
     they're not available yet.

 (3) More than one cache can be used at once.  Caches can be selected
     explicitly by use of tags.

 (4) The netfs is provided with an interface that allows either party to
     withdraw caching facilities from a file (required for (1)).

 (5) A netfs may annotate cache objects that belongs to it.  This permits the
     storage of coherency maintenance data.

 (6) Cache objects will be pinnable and space reservations will be possible.

 (7) The interface to the netfs returns as few errors as possible, preferring
     rather to let the netfs remain oblivious.

 (8) Cookies are used to represent indices, files and other objects to the
     netfs.  The simplest cookie is just a NULL pointer - indicating nothing
     cached there.

 (9) The netfs is allowed to propose - dynamically - any index hierarchy it
     desires, though it must be aware that the index search function is
     recursive, stack space is limited, and indices can only be children of
     indices.

(10) Indices can be used to group files together to reduce key size and to make
     group invalidation easier.  The use of indices may make lookup quicker,
     but that's cache dependent.

(11) Data I/O is effectively done directly to and from the netfs's pages.  The
     netfs indicates that page A is at index B of the data-file represented by
     cookie C, and that it should be read or written.  The cache backend may or
     may not start I/O on that page, but if it does, a netfs callback will be
     invoked to indicate completion.  The I/O may be either synchronous or
     asynchronous.

(12) Cookies can be "retired" upon release.  At this point FS-Cache will mark
     them as obsolete and the index hierarchy rooted at that point will get
     recycled.

(13) The netfs provides a "match" function for index searches.  In addition to
     saying whether a match was made or not, this can also specify that an
     entry should be updated or deleted.

FS-Cache maintains a virtual index tree in which all indices, files, objects
and pages are kept.  Bits of this tree may actually reside in one or more
caches.

                                           FSDEF
                                             |
                        +------------------------------------+
                        |                                    |
                       NFS                                  AFS
                        |                                    |
           +--------------------------+                +-----------+
           |                          |                |           |
        homedir                     mirror          afs.org   redhat.com
           |                          |                            |
     +------------+           +---------------+              +----------+
     |            |           |               |              |          |
   00001        00002       00007           00125        vol00001   vol00002
     |            |           |               |                         |
 +---+---+     +-----+      +---+      +------+------+            +-----+----+
 |   |   |     |     |      |   |      |      |      |            |     |    |
PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
                     |                                            |
                    PG0                                       +-------+
                                                              |       |
                                                            00001   00003
                                                              |
                                                          +---+---+
                                                          |   |   |
                                                         PG0 PG1 PG2

In the example above, two netfs's can be seen to be backed: NFS and AFS.  These
have different index hierarchies:

 (*) The NFS primary index will probably contain per-server indices.  Each
     server index is indexed by NFS file handles to get data file objects.
     Each data file objects can have an array of pages, but may also have
     further child objects, such as extended attributes and directory entries.
     Extended attribute objects themselves have page-array contents.

 (*) The AFS primary index contains per-cell indices.  Each cell index contains
     per-logical-volume indices.  Each of volume index contains up to three
     indices for the read-write, read-only and backup mirrors of those volumes.
     Each of these contains vnode data file objects, each of which contains an
     array of pages.

The very top index is the FS-Cache master index in which individual netfs's
have entries.

Any index object may reside in more than one cache, provided it only has index
children.  Any index with non-index object children will be assumed to only
reside in one cache.

The FS-Cache overview can be found in:

	Documentation/filesystems/caching/fscache.txt

The netfs API to FS-Cache can be found in:

	Documentation/filesystems/caching/netfs-api.txt

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:36 +01:00
Shen Feng
760df93ecd documentation: update Documentation/filesystem/proc.txt and Documentation/sysctls
Now /proc/sys is described in many places and much information is
redundant.  This patch updates the proc.txt and move the /proc/sys
desciption out to the files in Documentation/sysctls.

Details are:

merge
-  2.1  /proc/sys/fs - File system data
-  2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
-  2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface
with Documentation/sysctls/fs.txt.

remove
-  2.2  /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
since it's not better then the Documentation/binfmt_misc.txt.

merge
-  2.3  /proc/sys/kernel - general kernel parameters
with Documentation/sysctls/kernel.txt

remove
-  2.5  /proc/sys/dev - Device specific parameters
since it's obsolete the sysfs is used now.

remove
-  2.6  /proc/sys/sunrpc - Remote procedure calls
since it's not better then the Documentation/sysctls/sunrpc.txt

move
-  2.7  /proc/sys/net - Networking stuff
-  2.9  Appletalk
-  2.10 IPX
to newly created Documentation/sysctls/net.txt.

remove
-  2.8  /proc/sys/net/ipv4 - IPV4 settings
since it's not better then the Documentation/networking/ip-sysctl.txt.

add
- Chapter 3 Per-Process Parameters
to descibe /proc/<pid>/xxx parameters.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:04:53 -07:00
Marcin Slusarz
7ac9bcd5da udf: implement mode and dmode mounting options
"dmode" allows overriding permissions of directories and
"mode" allows overriding permissions of files.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
2009-04-02 12:29:50 +02:00
Linus Torvalds
395d73413c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits)
  ext4: Regularize mount options
  ext4: fix locking typo in mballoc which could cause soft lockup hangs
  ext4: fix typo which causes a memory leak on error path
  jbd2: Update locking coments
  ext4: Rename pa_linear to pa_type
  ext4: add checks of block references for non-extent inodes
  ext4: Check for an valid i_mode when reading the inode from disk
  ext4: Use WRITE_SYNC for commits which are caused by fsync()
  ext4: Add auto_da_alloc mount option
  ext4: Use struct flex_groups to calculate get_orlov_stats()
  ext4: Use atomic_t's in struct flex_groups
  ext4: remove /proc tuning knobs
  ext4: Add sysfs support
  ext4: Track lifetime disk writes
  ext4: Fix discard of inode prealloc space with delayed allocation.
  ext4: Automatically allocate delay allocated blocks on rename
  ext4: Automatically allocate delay allocated blocks on close
  ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl
  ext4: Simplify delalloc code by removing mpage_da_writepages()
  ext4: Save stack space by removing fake buffer heads
  ...
2009-04-01 10:57:49 -07:00
Linus Torvalds
e76e5b2c66 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
  PCI: fix HT MSI mapping fix
  PCI: don't enable too much HT MSI mapping
  x86/PCI: make pci=lastbus=255 work when acpi is on
  PCI: save and restore PCIe 2.0 registers
  PCI: update fakephp for bus_id removal
  PCI: fix kernel oops on bridge removal
  PCI: fix conflict between SR-IOV and config space sizing
  powerpc/PCI: include pci.h in powerpc MSI implementation
  PCI Hotplug: schedule fakephp for feature removal
  PCI Hotplug: rename legacy_fakephp to fakephp
  PCI Hotplug: restore fakephp interface with complete reimplementation
  PCI: Introduce /sys/bus/pci/devices/.../rescan
  PCI: Introduce /sys/bus/pci/devices/.../remove
  PCI: Introduce /sys/bus/pci/rescan
  PCI: Introduce pci_rescan_bus()
  PCI: do not enable bridges more than once
  PCI: do not initialize bridges more than once
  PCI: always scan child buses
  PCI: pci_scan_slot() returns newly found devices
  PCI: don't scan existing devices
  ...

Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
2009-04-01 09:47:12 -07:00
Nick Piggin
c2ec175c39 mm: page_mkwrite change prototype to match fault
Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags.  There should be no functional change.

This makes it possible to return much more detailed error information to
the VM (and also can provide more information eg.  virtual_address to the
driver, which might be important in some special cases).

This is required for a subsequent fix.  And will also make it easier to
merge page_mkwrite() with fault() in future.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01 08:59:14 -07:00
Boaz Harrosh
214c8adb87 exofs: Documentation
Added some documentation in exofs.txt, as well as a BUGS file.

For further reading, operation instructions, example scripts
and up to date infomation and code please see:
http://open-osd.org

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-03-31 19:44:38 +03:00
Pavel Machek
e3375ac767 trivial: document ext3 semantics of 'ro' option a bit better
ext3 has quite unexpected semantics or "ro" and defaults are
not what they are documented to be, due to mkfs override.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-03-30 15:21:56 +02:00
Theodore Ts'o
06705bff91 ext4: Regularize mount options
Add support for using the mount options "barrier" and "nobarrier", and
"auto_da_alloc" and "noauto_da_alloc", which is more consistent than
"barrier=<0|1>" or "auto_da_alloc=<0|1>".  Most other ext3/ext4 mount
options use the foo/nofoo naming convention.  We allow the old forms
of these mount options for backwards compatibility.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-03-28 10:59:57 -04:00
Greg Banks
b5cbc369db Document /proc/fs/nfsd/pool_stats
Document the format and semantics of the /proc/fs/nfsd/pool_stats file.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-03-27 19:24:27 -04:00
Linus Torvalds
8e9d208972 Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6
* 'bkl-removal' of git://git.lwn.net/linux-2.6:
  Rationalize fasync return values
  Move FASYNC bit handling to f_op->fasync()
  Use f_lock to protect f_flags
  Rename struct file->f_ep_lock
2009-03-26 16:14:02 -07:00
Jody McIntyre
1db4b2d221 trivial: fix orphan dates in ext2 documentation
Revert the change to the orphan dates of Windows 95, DOS, compression.
Add a new orphan date for OS/2.

Signed-off-by: Jody McIntyre <scjody@sun.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-23 14:21:26 -07:00
Linus Torvalds
d56ffd38a9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
  ucc_geth: Fix oops when using fixed-link support
  dm9000: locking bugfix
  net: update dnet.c for bus_id removal
  dnet: DNET should depend on HAS_IOMEM
  dca: add missing copyright/license headers
  nl80211: Check that function pointer != NULL before using it
  sungem: missing net_device_ops
  be2net: fix to restore vlan ids into BE2 during a IF DOWN->UP cycle
  be2net: replenish when posting to rx-queue is starved in out of mem conditions
  bas_gigaset: correctly allocate USB interrupt transfer buffer
  smsc911x: reset last known duplex and carrier on open
  sh_eth: Fix mistake of the address of SH7763
  sh_eth: Change handling of IRQ
  netns: oops in ip[6]_frag_reasm incrementing stats
  net: kfree(napi->skb) => kfree_skb
  net: fix sctp breakage
  ipv6: fix display of local and remote sit endpoints
  net: Document /proc/sys/net/core/netdev_budget
  tulip: fix crash on iface up with shirq debug
  virtio_net: Make virtio_net support carrier detection
  ...
2009-03-23 09:25:58 -07:00
Alex Chiang
77c27c7b49 PCI: Introduce /sys/bus/pci/devices/.../remove
This patch adds an attribute named "remove" to a PCI device's sysfs
directory.  Writing a non-zero value to this attribute will remove the PCI
device and any children of it.

Trent Piepho wrote the original implementation and documentation.

Thanks to Vegard Nossum for testing under kmemcheck and finding locking
issues with the sysfs interface.

Cc: Trent Piepho <xyzzy@speakeasy.org>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-20 14:58:48 -07:00
Stanislaw Gruszka
e9c6a586f5 net: Document /proc/sys/net/core/netdev_budget
The NAPI poll parameter netdev_budget is not documented in
kernel-docs. Since it may have a substantial effect on at least some
network loads, it should be.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> 
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 18:51:06 -07:00
Jonathan Corbet
76398425bb Move FASYNC bit handling to f_op->fasync()
Removing the BKL from FASYNC handling ran into the challenge of keeping the
setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
the underlying fasync() function.  Andi Kleen suggested moving the handling
of that bit into fasync(); this patch does exactly that.  As a result, we
have a couple of internal API changes: fasync() must now manage the FASYNC
bit, and it will be called without the BKL held.

As it happens, every fasync() implementation in the kernel with one
exception calls fasync_helper().  So, if we make fasync_helper() set the
FASYNC bit, we can avoid making any changes to the other fasync()
functions - as long as those functions, themselves, have proper locking.
Most fasync() implementations do nothing but call fasync_helper() - which
has its own lock - so they are easily verified as correct.  The BKL had
already been pushed down into the rest.

The networking code has its own version of fasync_helper(), so that code
has been augmented with explicit FASYNC bit handling.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2009-03-16 08:32:27 -06:00
Jody McIntyre
ab03eca8d4 trivial: fix bad links in the ext2 and ext3 documentation
Trivial patch to fix bad links in the ext2 and ext3 documentation.

Signed-off-by: Jody McIntyre <scjody@sun.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-12 16:24:25 -07:00
Phillip Lougher
edf2e2811e Squashfs: fix documentation typo, Cramfs filesystem limit is 256 MiB
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2009-03-05 00:40:13 +00:00
Theodore Ts'o
b713a5ec55 ext4: remove /proc tuning knobs
Remove tuning knobs in /proc/fs/ext4/<dev/* since they have been
replaced by knobs in sysfs at /sys/fs/ext4/<dev>/*.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-03-31 09:11:14 -04:00
Theodore Ts'o
722bde6875 ext4: Add fine print for the 32000 subdirectory limit
Some poeple are reading the ext4 feature list too literally and create
dubious test cases involving very long filenames and 1k blocksize and
then complain when they run into an htree-imposed limit.  So add fine
print to the "fix 32000 subdirectory limit" ext4 feature.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-02-23 00:51:57 -05:00
Mike Murphy
f8a1af6bbc PATCH [2/2] Documentation/filesystems/sysfs.txt: fix descriptions of device attributes
Fix descriptions of device attributes to be consistent with the actual
implementations in include/linux/device.h

Signed-off-by: Mike Murphy <mamurph[at]cs.clemson.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 09:28:15 -08:00
Timothy S. Nelson
97c44836cd PCI: return error on failure to read PCI ROMs
This patch makes the ROM reading code return an error to user space if
the size of the ROM read is equal to 0.

The patch also emits a warnings if the contents of the ROM are invalid,
and documents the effects of the "enable" file on ROM reading.

Signed-off-by: Timothy S. Nelson <wayland@wayland.id.au>
Acked-by: Alex Villacis-Lasso <a_villacis@palosanto.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-02-04 16:58:41 -08:00
Linus Torvalds
f96c08e8c5 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: remove fast unmounting
  UBIFS: return sensible error codes
  UBIFS: remount ro fixes
  UBIFS: spelling fix 'date' -> 'data'
  UBIFS: sync wbufs after syncing inodes and pages
  UBIFS: fix LPT out-of-space bug (again)
  UBIFS: fix no_chk_data_crc
  UBIFS: fix assertions
  UBIFS: ensure orphan area head is initialized
  UBIFS: always clean up GC LEB space
  UBIFS: add re-mount debugging checks
  UBIFS: fix LEB list freeing
  UBIFS: simplify locking
  UBIFS: document dark_wm and dead_wm better
  UBIFS: do not treat all data as short term
  UBIFS: constify operations
  UBIFS: do not commit twice
2009-02-03 16:52:44 -08:00
Evgeniy Polyakov
9e9e3cbc62 mm: OOM documentation update
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-29 18:04:43 -08:00
Artem Bityutskiy
27ad279933 UBIFS: remove fast unmounting
This UBIFS feature has never worked properly, and it was a mistake
to add it because we simply have no use-cases. So, lets still accept
the fast_unmount mount option, but ignore it. This does not change
much, because UBIFS commit in sync_fs anyway, and sync_fs is called
while unmounting.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-01-29 16:34:30 +02:00
James Lentini
096abd7703 update port number in NFS/RDMA documentation
Update the NFS/RDMA documentation to use the new port number assigned
by IANA.

Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-27 17:20:14 -05:00
Peter W Morreale
db0fb1848a Update of Documentation: vm.txt and proc.txt
Update Documentation/sysctl/vm.txt and Documentation/filesystems/proc.txt.
 More specifically, the section on /proc/sys/vm in
Documentation/filesystems/proc.txt was removed and a link to
Documentation/sysctl/vm.txt added.

Most of the verbiage from proc.txt was simply moved in vm.txt, with new
addtional text for "swappiness" and "stat_interval".

Signed-off-by: Peter W Morreale <pmorreale@novell.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-15 16:39:35 -08:00
Takashi Sato
c4be0c1dc4 filesystem freeze: add error handling of write_super_lockfs/unlockfs
Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests.  So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
   or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.

Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-09 16:54:42 -08:00
Linus Torvalds
31aeb6c815 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  MAINTAINERS: squashfs entry
  Squashfs: documentation
  Squashfs: initrd support
  Squashfs: Kconfig entry
  Squashfs: Makefiles
  Squashfs: header files
  Squashfs: block operations
  Squashfs: cache operations
  Squashfs: uid/gid lookup operations
  Squashfs: fragment block operations
  Squashfs: export operations
  Squashfs: super block operations
  Squashfs: symlink operations
  Squashfs: regular file operations
  Squashfs: directory readdir operations
  Squashfs: directory lookup operations
  Squashfs: inode operations
2009-01-09 15:18:49 -08:00
Linus Torvalds
73d59314e6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (864 commits)
  Btrfs: explicitly mark the tree log root for writeback
  Btrfs: Drop the hardware crc32c asm code
  Btrfs: Add Documentation/filesystem/btrfs.txt, remove old COPYING
  Btrfs: kmap_atomic(KM_USER0) is safe for btrfs_readpage_end_io_hook
  Btrfs: Don't use kmap_atomic(..., KM_IRQ0) during checksum verifies
  Btrfs: tree logging checksum fixes
  Btrfs: don't change file extent's ram_bytes in btrfs_drop_extents
  Btrfs: Use btrfs_join_transaction to avoid deadlocks during snapshot creation
  Btrfs: drop remaining LINUX_KERNEL_VERSION checks and compat code
  Btrfs: drop EXPORT symbols from extent_io.c
  Btrfs: Fix checkpatch.pl warnings
  Btrfs: Fix free block discard calls down to the block layer
  Btrfs: avoid orphan inode caused by log replay
  Btrfs: avoid potential super block corruption
  Btrfs: do not call kfree if kmalloc failed in btrfs_sysfs_add_super
  Btrfs: fix a memory leak in btrfs_get_sb
  Btrfs: Fix typo in clear_state_cb
  Btrfs: Fix memset length in btrfs_file_write
  Btrfs: update directory's size when creating subvol/snapshot
  Btrfs: add permission checks to the ioctls
  ...
2009-01-09 13:01:38 -08:00
Linus Torvalds
2150edc6c5 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
  jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
  ext4: Remove "extents" mount option
  block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
  ext4: Make printk's consistently prefixed with "EXT4-fs: "
  ext4: Add sanity checks for the superblock before mounting the filesystem
  ext4: Add mount option to set kjournald's I/O priority
  jbd2: Submit writes to the journal using WRITE_SYNC
  jbd2: Add pid and journal device name to the "kjournald2 starting" message
  ext4: Add markers for better debuggability
  ext4: Remove code to create the journal inode
  ext4: provide function to release metadata pages under memory pressure
  ext3: provide function to release metadata pages under memory pressure
  add releasepage hooks to block devices which can be used by file systems
  ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
  ext4: Init the complete page while building buddy cache
  ext4: Don't allow new groups to be added during block allocation
  ext4: mark the blocks/inode bitmap beyond end of group as used
  ext4: Use new buffer_head flag to check uninit group bitmaps initialization
  ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
  ext4: code cleanup
  ...
2009-01-08 17:14:59 -08:00
Linus Torvalds
a0c9f240a9 Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc
* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
  proc: remove write-only variable in proc_pident_lookup()
  proc: fix sparse warning
  proc: add /proc/*/stack
  proc: remove '##' usage
  proc: remove useless WARN_ONs
  proc: stop using BKL
2009-01-07 12:01:06 -08:00
David Woodhouse
709ac06a14 Btrfs: Add Documentation/filesystem/btrfs.txt, remove old COPYING
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-01-07 09:54:24 -05:00
Tejun Heo
5f820f648c poll: allow f_op->poll to sleep
f_op->poll is the only vfs operation which is not allowed to sleep.  It's
because poll and select implementation used task state to synchronize
against wake ups, which doesn't have to be the case anymore as wait/wake
interface can now use custom wake up functions.  The non-sleep restriction
can be a bit tricky because ->poll is not called from an atomic context
and the result of accidentally sleeping in ->poll only shows up as
temporary busy looping when the timing is right or rather wrong.

This patch converts poll/select to use custom wake up function and use
separate triggered variable to synchronize against wake up events.  The
only added overhead is an extra function call during wake up and
negligible.

This patch removes the one non-sleep exception from vfs locking rules and
is beneficial to userland filesystem implementations like FUSE, 9p or
peculiar fs like spufs as it's very difficult for those to implement
non-sleeping poll method.

While at it, make the following cosmetic changes to make poll.h and
select.c checkpatch friendly.

* s/type * symbol/type *symbol/		   : three places in poll.h
* remove blank line before EXPORT_SYMBOL() : two places in select.c

Oleg: spotted missing barrier in poll_schedule_timeout()
Davide: spotted missing write barrier in pollwake()

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Brad Boyer <flar@allandria.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roland McGrath <roland@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:12 -08:00
David Rientjes
2da02997e0 mm: add dirty_background_bytes and dirty_bytes sysctls
This change introduces two new sysctls to /proc/sys/vm:
dirty_background_bytes and dirty_bytes.

dirty_background_bytes is the counterpart to dirty_background_ratio and
dirty_bytes is the counterpart to dirty_ratio.

With growing memory capacities of individual machines, it's no longer
sufficient to specify dirty thresholds as a percentage of the amount of
dirtyable memory over the entire system.

dirty_background_bytes and dirty_bytes specify quantities of memory, in
bytes, that represent the dirty limits for the entire system.  If either
of these values is set, its value represents the amount of dirty memory
that is needed to commence either background or direct writeback.

When a `bytes' or `ratio' file is written, its counterpart becomes a
function of the written value.  For example, if dirty_bytes is written to
be 8096, 8K of memory is required to commence direct writeback.
dirty_ratio is then functionally equivalent to 8K / the amount of
dirtyable memory:

	dirtyable_memory = free pages + mapped pages + file cache

	dirty_background_bytes = dirty_background_ratio * dirtyable_memory
		-or-
	dirty_background_ratio = dirty_background_bytes / dirtyable_memory

		AND

	dirty_bytes = dirty_ratio * dirtyable_memory
		-or-
	dirty_ratio = dirty_bytes / dirtyable_memory

Only one of dirty_background_bytes and dirty_background_ratio may be
specified at a time, and only one of dirty_bytes and dirty_ratio may be
specified.  When one sysctl is written, the other appears as 0 when read.

The `bytes' files operate on a page size granularity since dirty limits
are compared with ZVC values, which are in page units.

Prior to this change, the minimum dirty_ratio was 5 as implemented by
get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
written value between 0 and 100.  This restriction is maintained, but
dirty_bytes has a lower limit of only one page.

Also prior to this change, the dirty_background_ratio could not equal or
exceed dirty_ratio.  This restriction is maintained in addition to
restricting dirty_background_bytes.  If either background threshold equals
or exceeds that of the dirty threshold, it is implicitly set to half the
dirty threshold.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:03 -08:00
Theodore Ts'o
83982b6f47 ext4: Remove "extents" mount option
This mount option is largely superfluous, and in fact the way it was
implemented was buggy; if a filesystem which did not have the extents
feature flag was mounted -o extents, the filesystem would attempt to
create and use extents-based file even though the extents feature flag
was not eabled.  The simplest thing to do is to nuke the mount option
entirely.  It's not all that useful to force the non-creation of new
extent-based files if the filesystem can support it.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-06 14:53:16 -05:00
Theodore Ts'o
b3881f74b3 ext4: Add mount option to set kjournald's I/O priority
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jens Axboe <jens.axboe@oracle.com>
2009-01-05 22:46:26 -05:00
Tiger Yang
a68979b857 ocfs2: add mount option and Kconfig option for acl
This patch adds the Kconfig option "CONFIG_OCFS2_FS_POSIX_ACL"
and mount options "acl" to enable acls in Ocfs2.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 08:36:52 -08:00
Ken Chen
2ec220e27f proc: add /proc/*/stack
/proc/*/stack adds the ability to query a task's stack trace. It is more
useful than /proc/*/wchan as it provides full stack trace instead of single
depth. Example output:

	$ cat /proc/self/stack
	[<c010a271>] save_stack_trace_tsk+0x17/0x35
	[<c01827b4>] proc_pid_stack+0x4a/0x76
	[<c018312d>] proc_single_show+0x4a/0x5e
	[<c016bdec>] seq_read+0xf3/0x29f
	[<c015a004>] vfs_read+0x6d/0x91
	[<c015a0c1>] sys_read+0x3b/0x60
	[<c0102eda>] syscall_call+0x7/0xb
	[<ffffffff>] 0xffffffff

[add save_stack_trace_tsk() on mips, ACK Ralf --adobriyan]
Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-05 12:27:44 +03:00
Phillip Lougher
9eb425c046 Squashfs: documentation
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
2009-01-05 08:46:29 +00:00
Theodore Ts'o
c319106723 ext4: Remove code to create the journal inode
This code has been obsolete in quite some time, since the supported
method for adding a journal inode is to use tune2fs (or to creating
new filesystem with a journal via mke2fs or mkfs.ext4).

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-06 11:14:25 -05:00
Linus Torvalds
8e3bda0863 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (33 commits)
  UBIFS: add more useful debugging prints
  UBIFS: print debugging messages properly
  UBIFS: fix numerous spelling mistakes
  UBIFS: allow mounting when short of space
  UBIFS: fix writing uncompressed files
  UBIFS: fix checkpatch.pl warnings
  UBIFS: fix sparse warnings
  UBIFS: simplify make_free_space
  UBIFS: do not lie about used blocks
  UBIFS: restore budg_uncommitted_idx
  UBIFS: always commit on unmount
  UBIFS: use ubi_sync
  UBIFS: always commit in sync_fs
  UBIFS: fix file-system synchronization
  UBIFS: fix constants initialization
  UBIFS: avoid unnecessary calculations
  UBIFS: re-calculate min_idx_size after the commit
  UBIFS: use nicer 64-bit math
  UBIFS: fix available blocks count
  UBIFS: various comment improvements and fixes
  ...
2009-01-02 15:57:47 -08:00
Sukadev Bhattiprolu
784c4d8b1b Document usage of multiple-instances of devpts
Changelog [v2]:
	- Add note indicating strict isolation is not possible unless all
	  mounts of devpts use the 'newinstance' mount option.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-02 10:19:36 -08:00
Al Viro
6badd79bd0 kill ->dir_notify()
Remove the hopelessly misguided ->dir_notify().  The only instance (cifs)
has been broken by design from the very beginning; the objects it creates
are never destroyed, keep references to struct file they can outlive, nothing
that could possibly evict them exists on close(2) path *and* no locking
whatsoever is done to prevent races with close(), should the previous, er,
deficiencies someday be dealt with.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:43 -05:00
Eric Dumazet
fd659fd627 fix f_count description in Documentation/filesystems/files.txt
Documentation/filesystems/files.txt was not updated when
f_count became an atomic_long_t.
atomic_long_inc_not_zero() is now used instead of atomic_inc_not_zero()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:42 -05:00
Zhaolei
be42c4c433 correct wrong function name of d_put in kernel document and source comment
no function named d_put(), it should be dput().

Impact: fix document and comment, no functionality changed

Signed-off-by: Zhao Lei <zhaolei@cn.fuijtsu.com>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:40 -05:00
Artem Bityutskiy
80736d41f8 UBIFS: fix numerous spelling mistakes
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-31 14:13:25 +02:00
Lachlan McIlroy
0a8c5395f9 [XFS] Fix merge failures
Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

Conflicts:

	fs/xfs/linux-2.6/xfs_cred.h
	fs/xfs/linux-2.6/xfs_globals.h
	fs/xfs/linux-2.6/xfs_ioctl.c
	fs/xfs/xfs_vnodeops.h

Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-29 16:47:18 +11:00
Ingo Molnar
fa623d1b02 Merge branches 'x86/apic', 'x86/cleanups', 'x86/cpufeature', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/detect-hyper', 'x86/doc', 'x86/dumpstack', 'x86/early-printk', 'x86/fpu', 'x86/idle', 'x86/io', 'x86/memory-corruption-check', 'x86/microcode', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/pat2', 'x86/pci-ioapic-boot-irq-quirks', 'x86/ptrace', 'x86/quirks', 'x86/reboot', 'x86/setup-memory', 'x86/signal', 'x86/sparse-fixes', 'x86/time', 'x86/uv' and 'x86/xen' into x86/core 2008-12-23 16:27:23 +01:00
Lachlan McIlroy
14d676f56f Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-12-05 15:27:43 +11:00
Artem Bityutskiy
553dea4dd5 UBIFS: introduce compression mount options
It is very handy to be able to change default UBIFS compressor
via mount options. Introduce -o compr=<name> mount option support.
Currently only "none", "lzo" and "zlib" compressors are supported.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-12-03 13:14:05 +02:00
Davide Libenzi
7ef9964e6d epoll: introduce resource usage limits
It has been thought that the per-user file descriptors limit would also
limit the resources that a normal user can request via the epoll
interface.  Vegard Nossum reported a very simple program (a modified
version attached) that can make a normal user to request a pretty large
amount of kernel memory, well within the its maximum number of fds.  To
solve such problem, default limits are now imposed, and /proc based
configuration has been introduced.  A new directory has been created,
named /proc/sys/fs/epoll/ and inside there, there are two configuration
points:

  max_user_instances = Maximum number of devices - per user

  max_user_watches   = Maximum number of "watched" fds - per user

The current default for "max_user_watches" limits the memory used by epoll
to store "watches", to 1/32 of the amount of the low RAM.  As example, a
256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
That should be enough to not break existing heavy epoll users.  The
default value for "max_user_instances" is set to 128, that should be
enough too.

This also changes the userspace, because a new error code can now come out
from EPOLL_CTL_ADD (-ENOSPC).  The EMFILE from epoll_create() was already
listed, so that should be ok.

[akpm@linux-foundation.org: use get_current_user()]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01 19:55:24 -08:00
Mark Fasheh
a2eee69b81 ocfs2: Small documentation update
Remove some features from the "not-supported" list that are actually
supported now.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01 14:46:49 -08:00
frans
1838e39214 Trivial Documentation/filesystems/ramfs-rootfs-initramfs.txt fix
A very minor patch on ramfs-rootfs-initramfs.txt: update the location
where CONFIG_INITRAMFS_SOURCE lives in menuconfig

Signed-off-by: Frans Meulenbroeks <fransmeulenbroeks@gmail.com>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 11:40:56 -08:00
Lachlan McIlroy
b5a20aa265 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-11-28 15:23:52 +11:00
Marco Stornelli
084c304980 DOC: update xip method info
xip documentation updated:
- change "get_xip_page" to "get_xip_mem";
- explain changed function parameters

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-12 17:17:17 -08:00
Niv Sardi
dcd7b4e5c0 Merge branch 'master' of git://oss.sgi.com:8090/xfs/linux-2.6 2008-11-07 15:07:12 +11:00
OGAWA Hirofumi
dfc209c006 fat: Fix ATTR_RO for directory
FAT has the ATTR_RO (read-only) attribute. But on Windows, the ATTR_RO
of the directory will be just ignored actually, and is used by only
applications as flag. E.g. it's setted for the customized folder by
Explorer.

http://msdn2.microsoft.com/en-us/library/aa969337.aspx

This adds "rodir" option. If user specified it, ATTR_RO is used as
read-only flag even if it's the directory. Otherwise, inode->i_mode
is not used to hold ATTR_RO (i.e. fat_mode_can_save_ro() returns 0).

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:21 -08:00
Bart Trojanowski
8986ab5963 fat: document additional vfat mount options
While debugging a sync mount regression on vfat I noticed that there were
mount options parsed by the driver that were not documented.

[hirofumi@mail.parknet.co.jp: fix some parts]
Signed-off-by: Bart Trojanowski <bart@jukie.net>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06 15:41:20 -08:00
Theodore Ts'o
30773840c1 ext4: add fsync batch tuning knobs
Add new mount options, min_batch_time and max_batch_time, which
controls how long the jbd2 layer should wait for additional filesystem
operations to get batched with a synchronous write transaction.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-03 20:27:38 -05:00
Theodore Ts'o
8e1a4857cd Update Documentation/filesystems/ext4.txt
Fix paragraph with recommendations on how to tune ext4 for benchmarks.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-01-06 14:53:06 -05:00
Nick Piggin
4e02ed4b4a fs: remove prepare_write/commit_write
Nothing uses prepare_write or commit_write. Remove them from the tree
completely.

[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:45 -07:00
Aristeu Rozanski
8a1c8eb75b x86, nmi-watchdog: update procfs nmi_watchdog file documentation v2
Impact: improve documentation

This patch updates the /proc/sys/kernel/nmi_watchdog documentation.
Updated: included Randy Dunlap's corrections.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-30 19:07:04 +01:00
Tim Shimmin
f3f0d7b026 [XFS] remove restricted chown parameter from xfs linux
On Linux all filesystems are supposed to be operating under Posix'
restricted chown. Restricted chown means it restricts chown to the owner
unless you have CAP_FOWNER.

NOTE: that 2 files outside of fs/xfs have been modified too for this
change.

Reviewed-by: Dave Chinner <david@fromorbit.com>

SGI-PV: 988919

SGI-Modid: 2.6.x-xfs-melb:linux:32413b

Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-10-30 18:30:09 +11:00
Linus Torvalds
396b122f6a Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (25 commits)
  UBIFS: fix ubifs_compress commentary
  UBIFS: amend printk
  UBIFS: do not read unnecessary bytes when unpacking bits
  UBIFS: check buffer length when scanning for LPT nodes
  UBIFS: correct condition to eliminate unecessary assignment
  UBIFS: add more debugging messages for LPT
  UBIFS: fix bulk-read handling uptodate pages
  UBIFS: improve garbage collection
  UBIFS: allow for sync_fs when read-only
  UBIFS: commit on sync_fs
  UBIFS: correct comment for commit_on_unmount
  UBIFS: update dbg_dump_inode
  UBIFS: fix commentary
  UBIFS: fix races in bit-fields
  UBIFS: ensure data read beyond i_size is zeroed out correctly
  UBIFS: correct key comparison
  UBIFS: use bit-fields when possible
  UBIFS: check data CRC when in error state
  UBIFS: improve znode splitting rules
  UBIFS: add no_chk_data_crc mount option
  ...
2008-10-20 09:19:03 -07:00
Hidehiro Kawai
0e4fb5e283 ext3: add an option to control error handling on file data
If the journal doesn't abort when it gets an IO error in file data blocks,
the file data corruption will spread silently.  Because most of
applications and commands do buffered writes without fsync(), they don't
notice the IO error.  It's scary for mission critical systems.  On the
other hand, if the journal aborts whenever it gets an IO error in file
data blocks, the system will easily become inoperable.  So this patch
introduces a filesystem option to determine whether it aborts the journal
or just call printk() when it gets an IO error in file data.

If you mount a ext3 fs with data_err=abort option, it aborts on file data
write error.  If you mount it with data_err=ignore, it doesn't abort, just
call printk().  data_err=ignore is the default.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:37 -07:00
Andrea Righi
7a6560e025 documentation: clarify dirty_ratio and dirty_background_ratio description
The current documentation of dirty_ratio and dirty_background_ratio is a
bit misleading.

In the documentation we say that they are "a percentage of total system
memory", but the current page writeback policy, intead, is to apply the
percentages to the dirtyable memory, that means free pages + reclaimable
pages.

Better to be more explicit to clarify this concept.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:32 -07:00
KOSAKI Motohiro
e575f111dc coredump_filter: add hugepage dumping
Presently hugepage's vma has a VM_RESERVED flag in order not to be
swapped.  But a VM_RESERVED vma isn't core dumped because this flag is
often used for some kernel vmas (e.g.  vmalloc, sound related).

Thus hugepages are never dumped and it can't be debugged easily.  Many
developers want hugepages to be included into core-dump.

However, We can't read generic VM_RESERVED area because this area is often
IO mapping area.  then these area reading may change device state.  it is
definitly undesiable side-effect.

So adding a hugepage specific bit to the coredump filter is better.  It
will be able to hugepage core dumping and doesn't cause any side-effect to
any i/o devices.

In additional, libhugetlb use hugetlb private mapping pages as anonymous
page.  Then, hugepage private mapping pages should be core dumped by
default.

Then, /proc/[pid]/core_dump_filter has two new bits.

 - bit 5 mean hugetlb private mapping pages are dumped or not. (default: yes)
 - bit 6 mean hugetlb shared mapping pages are dumped or not.  (default: no)

I tested by following method.

% ulimit -c unlimited
% ./crash_hugepage  50
% ./crash_hugepage  50  -p
% ls -lh
% gdb ./crash_hugepage core
%
% echo 0x43 > /proc/self/coredump_filter
% ./crash_hugepage  50
% ./crash_hugepage  50  -p
% ls -lh
% gdb ./crash_hugepage core

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>

#include "hugetlbfs.h"

int main(int argc, char** argv){
	char* p;
	int ch;
	int mmap_flags = MAP_SHARED;
	int fd;
	int nr_pages;

	while((ch = getopt(argc, argv, "p")) != -1) {
		switch (ch) {
		case 'p':
			mmap_flags &= ~MAP_SHARED;
			mmap_flags |= MAP_PRIVATE;
			break;
		default:
			/* nothing*/
			break;
		}
	}
	argc -= optind;
	argv += optind;

	if (argc == 0){
		printf("need # of pages\n");
		exit(1);
	}

	nr_pages = atoi(argv[0]);
	if (nr_pages < 2) {
		printf("nr_pages must >2\n");
		exit(1);
	}

	fd = hugetlbfs_unlinked_fd();
	p = mmap(NULL, nr_pages * gethugepagesize(),
		 PROT_READ|PROT_WRITE, mmap_flags, fd, 0);

	sleep(2);

	*(p + gethugepagesize()) = 1; /* COW */
	sleep(2);

	/* crash! */
	*(int*)0 = 1;

	return 0;
}

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: William Irwin <wli@holomorphy.com>
Cc: Adam Litke <agl@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:32 -07:00
Diego Calleja
22359f5745 ext4: Update Documentation/filesystems/ext4.txt
Since Ext4 is supposed to be stable in 2.6.28-rc, ext4's documentation
file should be updated.

[ More updates also added by Theodore Ts'o. ]

Signed-off-by: Diego Calleja <diegocg@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-17 09:15:14 -04:00
Ian Kent
4b22ff1341 autofs4: device node ioctl documentation
Add documentation for the miscellaneous device module of autofs4.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:39 -07:00
Bernhard Walle
22b8ab66de Document panic_on_unrecovered_nmi sysctl
This adds "panic_on_unrecovered_nmi" sysctl to
Documentation/filesystems/proc.txt. The text is mainly taken from
http://readlist.com/lists/vger.kernel.org/linux-kernel/43/217998.html.

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:32 -07:00
FD Cami
2223c65103 Remove Andrew Morton's http://www.zip.com.au/~akpm/
Remove Andrew Morton's http://www.zip.com.au/~akpm/ urls, update to new
ones when necessary, delete references otherwise.

There are still instances of that living in:
Documentation/zh_CN/HOWTO
Documentation/zh_CN/SubmittingPatches
Documentation/ko_KR/HOWTO
Documentation/ja_JP/SubmittingPatches

Signed-off-by: Francois Cami <francois.cami@free.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:32 -07:00
Shane McDonald
1c828320d2 doc: typo in Documentation/filesystems/nfsroot.txt
Add a missing word to the explanation of the purpose of the zdisk and
bzdisk make targets.

Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:31 -07:00
frans
dd1c53a64a Fix Documentation/filesystems/ramfs-rootfs-initramfs.txt
First a file hello.c is created, then the file hello2.c is compiled.
Change this to hello.c

Signed-off-by: Frans Meulenbroeks <fransmeulenbroeks@gmail.com>
Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:30 -07:00
Mark Fasheh
696b55d768 ocfs2: Documentation update for user_xattr / nouser_xattr mount options
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 17:02:44 -07:00
Joel Becker
12462f1d9f ocfs2: Add the 'inode64' mount option.
Now that ocfs2 limits inode numbers to 32bits, add a mount option to
disable the limit.  This parallels XFS.  64bit systems can handle the
larger inode numbers.

[ Added description of inode64 mount option in ocfs2.txt. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-13 16:57:08 -07:00
Linus Torvalds
20272c8994 Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc
* 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
  proc: remove kernel.maps_protect
  proc: remove now unneeded ADDBUF macro
  [PATCH] proc: show personality via /proc/pid/personality
  [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
  proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig
  proc: make grab_header() static
  proc: remove unused get_dma_list()
  proc: remove dummy vmcore_open()
  proc: proc_sys_root tweak
  proc: fix return value of proc_reg_open() in "too late" case

Fixed up trivial conflict in removed file arch/sparc/include/asm/dma_32.h
2008-10-13 10:04:04 -07:00
Hidehiro Kawai
5bf5683a33 ext4: add an option to control error handling on file data
If the journal doesn't abort when it gets an IO error in file data
blocks, the file data corruption will spread silently.  Because
most of applications and commands do buffered writes without fsync(),
they don't notice the IO error.  It's scary for mission critical
systems.  On the other hand, if the journal aborts whenever it gets
an IO error in file data blocks, the system will easily become
inoperable.  So this patch introduces a filesystem option to
determine whether it aborts the journal or just call printk() when
it gets an IO error in file data.

If you mount an ext4 fs with data_err=abort option, it aborts on file
data write error.  If you mount it with data_err=ignore, it doesn't
abort, just call printk().  data_err=ignore is the default.

Here is the corresponding patch of the ext3 version:
http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-10-10 22:12:43 -04:00
Theodore Ts'o
03010a3350 ext4: Rename ext4dev to ext4
The ext4 filesystem is getting stable enough that it's time to drop
the "dev" prefix.  Also remove the requirement for the TEST_FILESYS
flag.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-10 20:02:48 -04:00
Alexey Dobriyan
3bbfe05967 proc: remove kernel.maps_protect
After commit 831830b5a2 aka
"restrict reading from /proc/<pid>/maps to those who share ->mm or can ptrace"
sysctl stopped being relevant because commit moved security checks from ->show
time to ->start time (mm_for_maps()).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2008-10-10 04:24:51 +04:00
Mark Fasheh
c4b929b85b vfs: vfs-level fiemap interface
Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap
inode operation.

Userspace can get extent information on a file via fiemap ioctl. As input,
the fiemap ioctl takes a struct fiemap which includes an array of struct
fiemap_extent (fm_extents). Size of the extent array is passed as
fm_extent_count and number of extents returned will be written into
fm_mapped_extents. Offset and length fields on the fiemap structure
(fm_start, fm_length) describe a logical range which will be searched for
extents. All extents returned will at least partially contain this range.
The actual extent offsets and ranges returned will be unmodified from their
offset and range on-disk.

The fiemap ioctl returns '0' on success. On error, -1 is returned and errno
is set. If errno is equal to EBADR, then fm_flags will contain those flags
which were passed in which the kernel did not understand. On all other
errors, the contents of fm_extents is undefined.

As fiemap evolved, there have been many authors of the vfs patch. As far as
I can tell, the list includes:
Kalpak Shah <kalpak.shah@sun.com>
Andreas Dilger <adilger@sun.com>
Eric Sandeen <sandeen@redhat.com>
Mark Fasheh <mfasheh@suse.com>

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: linux-api@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
2008-10-08 19:44:18 -04:00
Theodore Ts'o
240799cdf2 ext4: Use readahead when reading an inode from the inode table
With modern hard drives, reading 64k takes roughly the same time as
reading a 4k block.  So request readahead for adjacent inode table
blocks to reduce the time it takes when iterating over directories
(especially when doing this in htree sort order) in a cold cache case.
With this patch, the time it takes to run "git status" on a kernel
tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches"
is reduced by 21%.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-09 23:53:47 -04:00
Theodore Ts'o
37515facd0 ext4: Improve the documentation for ext4's /proc tunables
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alex Tomas <bzzz@sun.com>
Cc: Andreas Dilger <adilger@sun.com>
2008-10-09 23:21:54 -04:00
Adrian Hunter
2953e73f1c UBIFS: add no_chk_data_crc mount option
UBIFS read performance can be improved by skipping the CRC
check when data nodes are read.  This option can be used if
the underlying media is considered to be highly reliable.
Note that CRCs are always checked for metadata.

Read speed on Arm platform with OneNAND goes from 19 MiB/s
to 27 MiB/s with data CRC checking disabled.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30 11:12:56 +03:00
Adrian Hunter
4793e7c5e1 UBIFS: add bulk-read facility
Some flash media are capable of reading sequentially at faster rates.
UBIFS bulk-read facility is designed to take advantage of that, by
reading in one go consecutive data nodes that are also located
consecutively in the same LEB.

Read speed on Arm platform with OneNAND goes from 17 MiB/s to
19 MiB/s.

Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-09-30 11:12:56 +03:00
Hidehiro Kawai
b261dfea48 coredump_filter: add description of bit 4
There is no description of bit 4 of coredump_filter in the
documentation.  This patch adds it.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-13 14:41:51 -07:00
Christoph Hellwig
adaae7215e update Documentation/filesystems/Locking for 2.6.27 changes
In the 2.6.27 circle ->fasync lost the BKL, and the last remaining
->open variant that takes the BKL is also gone.  ->get_sb and ->kill_sb
didn't have BKL forever, so updated the entries while we're at that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-09 11:51:15 -07:00
Nadia Derbey
61e55d0576 ipc: document the new auto_msgmni proc file
Update Documentation/filesystems/proc.txt: it describes the file
auto_msgmni intoduced to enable/disable msgmni automatic recomputing upon
memory add/remove (see thread http://lkml.org/lkml/2008/7/4/27).  Also
added a description for msgmni (this filex is only listed in
Documentation/sysctl/kernel.txt).

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 19:21:39 -07:00
Adrian Bunk
169ccbd44e NTFS: update homepage
Update the location of the NTFS homepage in several files.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-02 19:21:37 -07:00
Linus Torvalds
ee26562772 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Update documentation to remind users to update mke2fs.conf
  ext4: Fix small file fragmentation
  ext4: Initialize writeback_index to 0 when allocating a new inode
  ext4: make sure ext4_has_free_blocks returns 0 for ENOSPC
  ext4: journal credit fix for the delayed allocation's writepages() function
  ext4: Rework the ext4_da_writepages() function
  ext4: journal credits reservation fixes for DIO, fallocate
  ext4: journal credits reservation fixes for extent file writepage
  ext4: journal credits calulation cleanup and fix for non-extent writepage
  ext4: Fix bug where we return ENOSPC even though we have plenty of inodes
  ext4: don't try to resize if there are no reserved gdt blocks left
  ext4: Use ext4_discard_reservations instead of mballoc-specific call
  ext4: Fix ext4_dx_readdir hash collision handling
  ext4: Fix delalloc release block reservation for truncate
  ext4: Fix potential truncate BUG due to i_prealloc_list being non-empty
  ext4: Handle unwritten extent properly with delayed allocation
2008-08-22 08:37:07 -07:00
Sebastian Siewior
2e244d0836 Documentation: fix typo in ubifs.txt
Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-08-13 11:14:54 +03:00
Randy Dunlap
3794f3e812 docsrc: build Documentation/ sources
Currently source files in the Documentation/ sub-dir can easily bit-rot
since they are not generally buildable, either because they are hidden in
text files or because there are no Makefile rules for them.  This needs to
be fixed so that the source files remain usable and good examples of code
instead of bad examples.

Add the ability to build source files that are in the Documentation/ dir.
Add to Kconfig as "BUILD_DOCSRC" config symbol.

Use "CONFIG_BUILD_DOCSRC=1 make ..." to build objects from the
Documentation/ sources.  Or enable BUILD_DOCSRC in the *config system.
However, this symbol depends on HEADERS_CHECK since the header files need
to be installed (for userspace builds).

Built (using cross-tools) for x86-64, i386, alpha, ia64, sparc32,
sparc64, powerpc, sh, m68k, & mips.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-12 16:07:30 -07:00
Jan Kara
866c36637f quota: documentation for sending "below quota" messages via netlink and tiny doc update
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-12 16:07:27 -07:00
Joel Becker
ecb3d28c7e [PATCH] configfs: Convenience macros for attribute definition.
Sysfs has the _ATTR() and _ATTR_RO() macros to make defining extended
form attributes easier.  configfs should have something similiar.

- _CONFIGFS_ATTR() and _CONFIGFS_ATTR_RO() are the counterparts to the
  sysfs macros.
- CONFIGFS_ATTR_STRUCT() creates the extended form attribute structure.
- CONFIGFS_ATTR_OPS() defines the show_attribute()/store_attribute()
  operations that call the show()/store() operations of the extended
  form configfs_attributes.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-31 16:21:13 -07:00
Theodore Ts'o
4537398d91 ext4: Update documentation to remind users to update mke2fs.conf
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-27 19:59:21 -04:00
Matt LaPlante
d91958815d Documentation cleanup: trivial misspelling, punctuation, and grammar corrections.
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:06 -07:00
Bob Copeland
a14e4b572b omfs: add filesystem documentation
These patches add the Optimized MPEG Filesystem, a proprietary filesystem used
by the embedded devices Rio Karma and ReplayTV, which are no longer
manufactured.  This filesystem module enables people to access files on these
devices.

This patch:

OMFS is a proprietary filesystem created for the ReplayTV and also used by the
Rio Karma.  It uses hash tables with unordered, unbounded lists in each bucket
for directories, extents for data blocks, 64-bit addressing for blocks, with
up to 8K blocks (only 2K of a given block is ever used for metadata, so the FS
still works with 4K pages).

Document the filesystem usage and structures.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:05 -07:00
Eduard - Gabriel Munteanu
20d8b67c06 relay: add buffer-only channels; useful for early logging
Allows one to create and use a channel with no associated files.  Files
can be initialized later.  This is useful in scenarios such as logging in
early code, before VFS is up.  Therefore, such channels can be created and
used as soon as kmem_cache_init() completed.

This is needed by kmemtrace to do tracing in early kernel code.

[kosaki.motohiro@jp.fujitsu.com: build fix]
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:04 -07:00
Joe Peterson
41003cde95 UTC timestamp option for FAT filesystems fix
Signed-off-by: Joe Peterson <joe@skyrush.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:34 -07:00
Eric Dumazet
a47a126ad5 vmallocinfo: add NUMA information
Christoph recently added /proc/vmallocinfo file to get information about
vmalloc allocations.

This patch adds NUMA specific information, giving number of pages
allocated on each memory node.

This should help to check that vmalloc() is able to respect NUMA policies.

Example of output on a four nodes machine (one cpu per node)

1) network hash tables are evenly spreaded on four nodes (OK) (Same
   point for inodes and dentries hash tables)

2) iptables tables (x_tables) are correctly allocated on each cpu node
   (OK).

3) sys_swapon() allocates its memory from one node only.

4) each loaded module is using memory on one node.

Sysadmins could tune their setup to change points 3) and 4) if necessary.

grep "pages="  /proc/vmallocinfo
0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
0xffffc2000031a000-0xffffc2000031d000   12288 alloc_large_system_hash+0x204/0x2c0 pages=2 vmalloc N1=1 N2=1
0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
0xffffc2000033e000-0xffffc20000341000   12288 sys_swapon+0x640/0xac0 pages=2 vmalloc N0=2
0xffffc20000341000-0xffffc20000344000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N0=2
0xffffc20000344000-0xffffc20000347000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N1=2
0xffffc20000347000-0xffffc2000034a000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N2=2
0xffffc2000034a000-0xffffc2000034d000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N3=2
0xffffc20004381000-0xffffc20004402000  528384 alloc_large_system_hash+0x204/0x2c0 pages=128 vmalloc N0=32 N1=32 N2=32 N3=32
0xffffc20004402000-0xffffc20004803000 4198400 alloc_large_system_hash+0x204/0x2c0 pages=1024 vmalloc vpages N0=256 N1=256 N2=256 N3=256
0xffffc20004803000-0xffffc20004904000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
0xffffc20004904000-0xffffc20004bec000 3047424 sys_swapon+0x640/0xac0 pages=743 vmalloc vpages N0=743
0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14
0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4
0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2
0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10
0xffffffffa0022000-0xffffffffa0028000   24576 sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5
0xffffffffa0028000-0xffffffffa0050000  163840 sys_init_module+0xc27/0x1d00 pages=39 vmalloc N1=39
0xffffffffa0050000-0xffffffffa0052000    8192 sys_init_module+0xc27/0x1d00 pages=1 vmalloc N1=1
0xffffffffa0052000-0xffffffffa0056000   16384 sys_init_module+0xc27/0x1d00 pages=3 vmalloc N1=3
0xffffffffa0056000-0xffffffffa0081000  176128 sys_init_module+0xc27/0x1d00 pages=42 vmalloc N3=42
0xffffffffa0081000-0xffffffffa00ae000  184320 sys_init_module+0xc27/0x1d00 pages=44 vmalloc N3=44
0xffffffffa00ae000-0xffffffffa00b1000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
0xffffffffa00b1000-0xffffffffa00b9000   32768 sys_init_module+0xc27/0x1d00 pages=7 vmalloc N0=7
0xffffffffa00b9000-0xffffffffa00c4000   45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N3=10
0xffffffffa00c6000-0xffffffffa00e0000  106496 sys_init_module+0xc27/0x1d00 pages=25 vmalloc N2=25
0xffffffffa00e0000-0xffffffffa00f1000   69632 sys_init_module+0xc27/0x1d00 pages=16 vmalloc N2=16
0xffffffffa00f1000-0xffffffffa00f4000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
0xffffffffa00f4000-0xffffffffa00f7000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2

[akpm@linux-foundation.org: fix comment]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Rik van Riel
28b2ee20c7 access_process_vm device memory infrastructure
In order to be able to debug things like the X server and programs using
the PPC Cell SPUs, the debugger needs to be able to access device memory
through ptrace and /proc/pid/mem.

This patch:

Add the generic_access_phys access function and put the hooks in place
to allow access_process_vm to access device or PPC Cell SPU memory.

[riel@redhat.com: Add documentation for the vm_ops->access function]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00
Linus Torvalds
6eaaaac974 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  remove CONFIG_KMOD from core kernel code
  remove CONFIG_KMOD from lib
  remove CONFIG_KMOD from sparc64
  rework try_then_request_module to do less in non-modular kernels
  remove mention of CONFIG_KMOD from documentation
  make CONFIG_KMOD invisible
  modules: Take a shortcut for checking if an address is in a module
  module: turn longs into ints for module sizes
  Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
  module: reorder struct module to save space on 64 bit builds
  module: generic each_symbol iterator function
  module: don't use stop_machine for waiting rmmod
2008-07-22 13:17:15 -07:00
Johannes Berg
a81792f668 remove mention of CONFIG_KMOD from documentation
Also includes a few Kconfig files (xtensa, blackfin)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
2008-07-22 19:24:29 +10:00
Dan Williams
e105b8bfc7 sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minor
Why?:
There are occasions where userspace would like to access sysfs
attributes for a device but it may not know how sysfs has named the
device or the path.  For example what is the sysfs path for
/dev/disk/by-id/ata-ST3160827AS_5MT004CK?  With this change a call to
stat(2) returns the major:minor then userspace can see that
/sys/dev/block/8:32 links to /sys/block/sdc.

What are the alternatives?:
1/ Add an ioctl to return the path: Doable, but sysfs is meant to reduce
   the need to proliferate ioctl interfaces into the kernel, so this
   seems counter productive.

2/ Use udev to create these symlinks: Also doable, but it adds a
   udev dependency to utilities that might be running in a limited
   environment like an initramfs.

3/ Do a full-tree search of sysfs.

[kay.sievers@vrfy.org: fix duplicate registrations]
[kay.sievers@vrfy.org: cleanup suggestions]

Cc: Neil Brown <neilb@suse.de>
Cc: Tejun Heo <htejun@gmail.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Reviewed-by: SL Baur <steve@xemacs.org>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Mark Lord <lkml@rtr.ca>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-21 21:54:40 -07:00
Linus Torvalds
14b395e35d Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits)
  nfsd: nfs4xdr.c do-while is not a compound statement
  nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c
  lockd: Pass "struct sockaddr *" to new failover-by-IP function
  lockd: get host reference in nlmsvc_create_block() instead of callers
  lockd: minor svclock.c style fixes
  lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock
  lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock
  lockd: nlm_release_host() checks for NULL, caller needn't
  file lock: reorder struct file_lock to save space on 64 bit builds
  nfsd: take file and mnt write in nfs4_upgrade_open
  nfsd: document open share bit tracking
  nfsd: tabulate nfs4 xdr encoding functions
  nfsd: dprint operation names
  svcrdma: Change WR context get/put to use the kmem cache
  svcrdma: Create a kmem cache for the WR contexts
  svcrdma: Add flush_scheduled_work to module exit function
  svcrdma: Limit ORD based on client's advertised IRD
  svcrdma: Remove unused wait q from svcrdma_xprt structure
  svcrdma: Remove unneeded spin locks from __svc_rdma_free
  svcrdma: Add dma map count and WARN_ON
  ...
2008-07-20 21:21:46 -07:00
Joel Becker
a6795e9ebb configfs: Allow ->make_item() and ->make_group() to return detailed errors.
The configfs operations ->make_item() and ->make_group() currently
return a new item/group.  A return of NULL signifies an error.  Because
of this, -ENOMEM is the only return code bubbled up the stack.

Multiple folks have requested the ability to return specific error codes
when these operations fail.  This patch adds that ability by changing the
->make_item/group() ops to return ERR_PTR() values.  These errors are
bubbled up appropriately.  NULL returns are changed to -ENOMEM for
compatibility.

Also updated are the in-kernel users of configfs.

This is a rework of reverted commit 11c3b79218.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-07-17 15:21:29 -07:00
Joel Becker
f89ab8619e Revert "configfs: Allow ->make_item() and ->make_group() to return detailed errors."
This reverts commit 11c3b79218.  The code
will move to PTR_ERR().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-07-17 14:53:48 -07:00
Linus Torvalds
5b664cb235 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  [PATCH] ocfs2: fix oops in mmap_truncate testing
  configfs: call drop_link() to cleanup after create_link() failure
  configfs: Allow ->make_item() and ->make_group() to return detailed errors.
  configfs: Fix failing mkdir() making racing rmdir() fail
  configfs: Fix deadlock with racing rmdir() and rename()
  configfs: Make configfs_new_dirent() return error code instead of NULL
  configfs: Protect configfs_dirent s_links list mutations
  configfs: Introduce configfs_dirent_lock
  ocfs2: Don't snprintf() without a format.
  ocfs2: Fix CONFIG_OCFS2_DEBUG_FS #ifdefs
  ocfs2/net: Silence build warnings on sparc64
  ocfs2: Handle error during journal load
  ocfs2: Silence an error message in ocfs2_file_aio_read()
  ocfs2: use simple_read_from_buffer()
  ocfs2: fix printk format warnings with OCFS2_FS_STATS=n
  [PATCH 2/2] ocfs2: Instrument fs cluster locks
  [PATCH 1/2] ocfs2: Add CONFIG_OCFS2_FS_STATS config option
2008-07-17 10:55:51 -07:00
Linus Torvalds
9c1be0c471 Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6
* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6:
  UBIFS: include to compilation
  UBIFS: add new flash file system
  UBIFS: add brief documentation
  MAINTAINERS: add UBIFS section
  do_mounts: allow UBI root device name
  VFS: export sync_sb_inodes
  VFS: move inode_lock into sync_sb_inodes
2008-07-16 15:02:57 -07:00
Linus Torvalds
61d97f4fcf Merge branch 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'genirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: remove extraneous checks in manage.c
  genirq: Expose default irq affinity mask (take 3)
2008-07-15 10:39:22 -07:00
Linus Torvalds
38c46578ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  [GFS2] Fix GFS2's use of do_div() in its quota calculations
  [GFS2] Remove unused declaration
  [GFS2] Remove support for unused and pointless flag
  [GFS2] Replace rgrp "recent list" with mru list
  [GFS2] Allow local DF locks when holding a cached EX glock
  [GFS2] Fix delayed demote race
  [GFS2] don't call permission()
  [GFS2] Fix module building
  [GFS2] Glock documentation
  [GFS2] Remove all_list from lock_dlm
  [GFS2] Remove obsolete conversion deadlock avoidance code
  [GFS2] Remove remote lock dropping code
  [GFS2] kernel panic mounting volume
  [GFS2] Revise readpage locking
  [GFS2] Fix ordering of args for list_add
  [GFS2] trivial sparse lock annotations
  [GFS2] No lock_nolock
  [GFS2] Fix ordering bug in lock_dlm
  [GFS2] Clean up the glock core
2008-07-15 10:38:46 -07:00
Artem Bityutskiy
e56a99d5a4 UBIFS: add brief documentation
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-07-15 09:22:12 +03:00
Joel Becker
11c3b79218 configfs: Allow ->make_item() and ->make_group() to return detailed errors.
The configfs operations ->make_item() and ->make_group() currently
return a new item/group.  A return of NULL signifies an error.  Because
of this, -ENOMEM is the only return code bubbled up the stack.

Multiple folks have requested the ability to return specific error codes
when these operations fail.  This patch adds that ability by changing the
->make_item/group() ops to return an int.

Also updated are the in-kernel users of configfs.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-07-14 13:57:16 -07:00
Mingming Cao
49f1487b2e ext4: Documention update for new ordered mode and delayed allocation
Adding some documentations for delayed allocation and new ordered mode.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 19:27:31 -04:00
Jose R. Santos
93e3270c87 ext4: Documentation updates.
Some of the information in Documentation/filesystems/ext4.txt is out
of date and in need of an update.

Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 19:27:31 -04:00
J. Bruce Fields
e86322f611 Merge branch 'for-bfields' of git://linux-nfs.org/~tomtucker/xprt-switch-2.6 into for-2.6.27 2008-07-03 16:24:06 -04:00
J. Bruce Fields
3cd2cfeae1 nfs: rewrap NFS/RDMA documentation to 80 lines
Wrap long lines.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-30 15:24:43 -04:00
James Lentini
007de8b4fd update NFS/RDMA documentation
Update the NFS/RDMA documentation to clarify how to run mount.nfs.

Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-30 15:24:43 -04:00
Steven Whitehouse
9f1585cb03 [GFS2] Glock documentation
This patch adds a file describing the internals of GFS2's glock
abstraction.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27 09:39:53 +01:00
Jesse Barnes
883eed1b3e Merge branch 'pci-for-jesse' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip into for-linus 2008-06-12 13:51:05 -07:00
venkatesh.pallipadi@intel.com
45aec1ae72 x86: PAT export resource_wc in pci sysfs
For the ranges with IORESOURCE_PREFETCH, export a new resource_wc interface in
pci /sysfs along with resource (which is uncached).

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-12 10:12:42 +02:00
Max Krasnyansky
1840475676 genirq: Expose default irq affinity mask (take 3)
Current IRQ affinity interface does not provide a way to set affinity
for the IRQs that will be allocated/activated in the future.
This patch creates /proc/irq/default_smp_affinity that lets users set
default affinity mask for the newly allocated IRQs. Changing the default
does not affect affinity masks for the currently active IRQs, they
have to be changed explicitly.

Updated based on Paul J's comments and added some more documentation.

Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Cc: pj@sgi.com
Cc: a.p.zijlstra@chello.nl
Cc: tglx@linutronix.de
Cc: rdunlap@xenotime.net
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-05 15:18:30 +02:00
Eric Sandeen
571640cad3 ext4: enable barriers by default
I can't think of any valid reason for ext4 to not use barriers when
they are available;  I believe this is necessary for filesystem
integrity in the face of a volatile write cache on storage.

An administrator who trusts that the cache is sufficiently battery-
backed (and power supplies are sufficiently redundant, etc...)
can always turn it back off again.

SuSE has carried such a patch for ext3 for quite some time now.

Also document the mount option while we're at it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-05-26 12:29:46 -04:00
Christoph Hellwig
33dcdac2df [PATCH] kill ->put_inode
And with that last patch to affs killing the last put_inode instance we
can finally, after many years of transition kill this racy and awkward
interface.

(It's kinda funny that even the description in
Documentation/filesystems/vfs.txt was entirely wrong..)

Also remove a very misleading comment above the defintion of
struct super_operations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-06 13:45:34 -04:00
Miklos Szeredi
b88473f73e mm: document missing fields for /proc/meminfo
A few fields in /proc/meminfo were not documented.  Fix.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
OGAWA Hirofumi
1ae43f826b fat: Add allow_utime option
Normally utime(2) checks current process is owner of the file, or it
has CAP_FOWNER capability.  But FAT filesystem doesn't have uid/gid as
on disk info, so normal check is too unflexible.

With this option you can relax it.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:47 -07:00
David Rientjes
65d66fc02e mempolicy: update NUMA memory policy documentation
Updates Documentation/vm/numa_memory_policy.txt and
Documentation/filesystems/tmpfs.txt to describe optional mempolicy mode flags.

Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:19 -07:00
Nick Piggin
3c18ddd160 mm: remove nopage
Nothing in the tree uses nopage any more.  Remove support for it in the
core mm code and documentation (and a few stray references to it in
comments).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 08:58:18 -07:00
Linus Torvalds
cf2ec150fc Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
  nfsd: don't allow setting ctime over v4
  Update to NFS/RDMA documentation
  locks: don't call ->copy_lock methods on return of conflicting locks
  lockd: unlock lockd locks held for a certain filesystem
  lockd: unlock lockd locks associated with a given server ip
  leases: remove unneeded variable from fcntl_setlease().
  leases: move lock allocation earlier in generic_setlease()
  leases: when unlocking, skip locking-related steps
  leases: fix a return-value mixup
2008-04-25 11:45:40 -07:00
Jonathan Corbet
9f4def9ae4 Document seq_path_root()
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-04-25 11:56:37 -06:00
James Lentini
c272cca625 Update to NFS/RDMA documentation
Update to the NFS/RDMA documentation to clarify how to configure the
exports file.

Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-25 13:00:11 -04:00
Jonathan Corbet
22c36d18c6 Document SEQ_SKIP
2.6.26 adds a SEQ_SKIP return value for the seq_file show() function;
update the documentation to match.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-04-24 15:57:32 -06:00
James Lentini
a3fa73bd0e Documentation: NFS/RDMA instructions for 2.6.25-rc1
Add some instructions for using the new NFS/RDMA features.

Signed-off-by: James Lentini <jlentini@netapp.com>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:40 -04:00
Miklos Szeredi
97e7e0f71d [patch 7/7] vfs: mountinfo: show dominating group id
Show peer group ID of nearest dominating group that has intersection
with the mount's namespace.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-23 00:05:09 -04:00
Ram Pai
2d4d4864ac [patch 6/7] vfs: mountinfo: add /proc/<pid>/mountinfo
[mszeredi@suse.cz] rewrite and split big patch into managable chunks

/proc/mounts in its current form lacks important information:

 - propagation state
 - root of mount for bind mounts
 - the st_dev value used within the filesystem
 - identifier for each mount and it's parent

It also suffers from the following problems:

 - not easily extendable
 - ambiguity of mountpoints within a chrooted environment
 - doesn't distinguish between filesystem dependent and independent options
 - doesn't distinguish between per mount and per super block options

This patch introduces /proc/<pid>/mountinfo which attempts to address
all these deficiencies.

Code shared between /proc/<pid>/mounts and /proc/<pid>/mountinfo is
extracted into separate functions.

Thanks to Al Viro for the help in getting the design right.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-23 00:05:03 -04:00
Dan Williams
2424b5dd06 sysfs: refill attribute buffer when reading from offset 0
Requiring userspace to close and re-open sysfs attributes has been the
policy since before 2.6.12.  It allows userspace to get a consistent
snapshot of kernel state and consume it with incremental reads and seeks.

Now, if the file position is zero the kernel assumes userspace wants to see
the new value.  The application for this change is to allow a userspace
RAID metadata handler to check the state of an array without causing any
memory allocations.  Thus not causing writeback to a raid array that might
be blocked waiting for userspace to take action.

Cc: Neil Brown <neilb@suse.de>
Acked-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-19 19:10:29 -07:00
Josef Sipek
f6e9f28865 [XFS] Update XFS documentation for noikeep/ikeep.
Mention how DMAPI affects default for noikeep.
Slightly modified since Josef's patch was based on
an old xfs.txt prior to Dave's (dgc) checkin which
missed going to oss.

Signed-off-by: Josef Sipek <jeffpc@josefsipek.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-04-18 12:18:42 +10:00
David Chinner
033bfb1a65 [XFS] Update XFS Documentation for ikeep and ihashsize
Update xfs docs for:
* In memory inode hashes has been removed.
* noikeep is now the default.

SGI-PV: 969561
SGI-Modid: 2.6.x-xfs-melb:linux:29481b

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2008-04-18 12:18:25 +10:00
Dmitri Vorobiev
b82d4043b3 Fix typos in Documentation/filesystems/seq_file.txt
A couple of typos crept into the newly added document about the seq_file
interface.  This patch corrects those typos and simultaneously deletes
unnecessary trailing spaces.

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15 19:35:40 -07:00
J. Bruce Fields
8bcd1cc293 Documentation: move rpc-cache.txt to filesystems/
This file is nfs-related.  (Maybe Documentation/filesystems/ would
benefit from a separate nfs/ directory at some point.)

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-04-11 13:20:52 -06:00
J. Bruce Fields
6ded55da6b Documentation: move nfsroot.txt to filesystems/
Documentation/ is a little large, and filesystems/ seems an obvious
place for this file.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-04-11 13:18:01 -06:00
Jan Engelhardt
f3271f6564 Fixes to the seq_file document
On Friday 2008-03-28 19:20, Jonathan Corbet wrote:
>commit 9756ccfda31b4c4544aa010aacf71b6672d668e8
>Date:   Fri Mar 28 11:19:56 2008 -0600
>
>    Add the seq_file documentation

patch on top:

  - add const qualifiers
  - remove void* casts
  - use proper specifier (%Ld is not valid)

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
2008-03-30 11:05:04 -06:00
Jonathan Corbet
ded4926aa2 Add the seq_file documentation
This is an updated version of the document describing the seq_file
interface.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-03-30 11:05:04 -06:00
Randy Dunlap
a09a20b526 laptops: move laptop-mode.txt to Documentation/laptops/
Move laptop-mode.txt into the laptops/ sub-directory to consolidate
laptop doc files there.

Update references to the file's location.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-03-12 02:37:21 -04:00
Miklos Szeredi
f84e3f521e mount options: add documentation
This series addresses the problem of showing mount options in
/proc/mounts.

Several filesystems which use mount options, have not implemented a
.show_options superblock operation.  Several others have implemented
this callback, but have not kept it fully up to date with the parsed
options.

Q: Why do we need correct option showing in /proc/mounts?
A: We want /proc/mounts to fully replace /etc/mtab.  The reasons for
   this are:
    - unprivileged mounters won't be able to update /etc/mtab
    - /etc/mtab doesn't work with private mount namespaces
    - /etc/mtab can become out-of-sync with reality

Q: Can't this be done, so that filesystems need not bother with
   implementing a .show_mounts callback, and keeping it up to date?
A: Only in some cases.  Certain filesystems allow modification of a
   subset of options in their remount_fs method.  It is not possible
   to take this into account without knowing exactly how the
   filesystem handles options.

For the simple case (no remount or remount resets all options) the
patchset introduces two helpers:

  generic_show_options()
  save_mount_options()

These can also be used to emulate the old /etc/mtab behavior, until
proper support is added.  Even if this is not 100% correct, it's still
better than showing no options at all.

The following patches fix up most in-tree filesystems, some have been
compile tested only, some have been reviewed and acked by the
maintainer.

Table displaying status of all in-kernel filesystems:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
legend:

  none - fs has options, but doesn't define ->show_options()
  some - fs defines ->show_options(), but some only options are shown
  good - fs shows all options
  noopt - fs does not have options
  patch - a patch will be posted
  merged - a patch has been merged by subsystem maintainer

9p          good
adfs        patch
affs        patch
afs         patch
autofs      patch
autofs4     patch
befs        patch
bfs         noopt
cifs        some
coda        noopt
configfs    noopt
cramfs      noopt
debugfs     noopt
devpts      patch
ecryptfs    good
efs         noopt
ext2        patch
ext3        good
ext4        merged
fat         patch
freevxfs    noopt
fuse        patch
fusectl	    noopt
gfs2        good
gfs2meta    noopt
hfs         good
hfsplus     good
hostfs      patch
hpfs        patch
hppfs       noopt
hugetlbfs   patch
isofs       patch
jffs2       noopt
jfs         merged
minix       noopt
msdos       ->fat
ncpfs       patch
nfs         some
nfsd        noopt
ntfs        good
ocfs2       good
ocfs2/dlmfs noopt
openpromfs  noopt
proc        noopt
qnx4        noopt
ramfs       noopt
reiserfs    patch
romfs       noopt
smbfs       good
sysfs       noopt
sysv        noopt
udf         patch
ufs         good
vfat        ->fat
xfs         good

mm/shmem.c                                    patch
drivers/oprofile/oprofilefs.c                 noopt
drivers/infiniband/hw/ipath/ipath_fs.c        noopt
drivers/misc/ibmasm/ibmasmfs.c                noopt
drivers/usb/core (usbfs)                      merged
drivers/usb/gadget (gadgetfs)                 noopt
drivers/isdn/capi/capifs.c                    patch
kernel/cpuset.c                               noopt
fs/binfmt_misc.c                              noopt
net/sunrpc/rpc_pipe.c                         noopt
arch/powerpc/platforms/cell/spufs             patch
arch/s390/hypfs                               good
ipc/mqueue.c                                  noopt
security (securityfs)                         noopt
security/selinux/selinuxfs.c                  noopt
kernel/cgroup.c                               good
security/smack/smackfs.c                      noopt

in -mm:

reiser4     some
unionfs     good
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

This patch:

Document the rules for handling mount options in the .show_options
super operation.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:39 -08:00
Jan Kara
9b7880e7bb isofs: implement dmode option
Implement dmode option for iso9660 filesystem to allow setting of access
rights for directories on the filesystem.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: "Ilya N. Golubev" <gin@mo.msk.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:38 -08:00
David Howells
12debc4248 iget: remove iget() and the read_inode() super op as being obsolete
Remove the old iget() call and the read_inode() superblock operation it uses
as these are really obsolete, and the use of read_inode() does not produce
proper error handling (no distinction between ENOMEM and EIO when marking an
inode bad).

Furthermore, this removes the temptation to use iget() to find an inode by
number in a filesystem from code outside that filesystem.

iget_locked() should be used instead.  A new function is added in an earlier
patch (iget_failed) that is to be called to mark an inode as bad, unlock it
and release it should the get routine fail.  Mark iget() and read_inode() as
being obsolete and remove references to them from the documentation.

Typically a filesystem will be modified such that the read_inode function
becomes an internal iget function, for example the following:

	void thingyfs_read_inode(struct inode *inode)
	{
		...
	}

would be changed into something like:

	struct inode *thingyfs_iget(struct super_block *sp, unsigned long ino)
	{
		struct inode *inode;
		int ret;

		inode = iget_locked(sb, ino);
		if (!inode)
			return ERR_PTR(-ENOMEM);
		if (!(inode->i_state & I_NEW))
			return inode;

		...
		unlock_new_inode(inode);
		return inode;
	error:
		iget_failed(inode);
		return ERR_PTR(ret);
	}

and then thingyfs_iget() would be called rather than iget(), for example:

	ret = -EINVAL;
	inode = iget(sb, ino);
	if (!inode || is_bad_inode(inode))
		goto error;

becomes:

	inode = thingyfs_iget(sb, ino);
	if (IS_ERR(inode)) {
		ret = PTR_ERR(inode);
		goto error;
	}

Note that is_bad_inode() does not need to be called.  The error returned by
thingyfs_iget() should render it unnecessary.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:29 -08:00
David Howells
b46980feed iget: introduce a function to register iget failure
Introduce a function to register failure in an inode construction path.  This
includes marking the inode under construction as bad, unlocking it and
releasing it.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:26 -08:00
J. Bruce Fields
d3cf91d0e2 Documentation: move sharedsubtrees.txt to filesystems/
This documentation is also vfs-related.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:17 -08:00
J. Bruce Fields
e9b1a4d160 Documentation: move dnotify.txt to filesystems/
I'm inclined to think dnotify belongs in filesystems/.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:17 -08:00
Eric Dumazet
9cfe015aa4 get rid of NR_OPEN and introduce a sysctl_nr_open
NR_OPEN (historically set to 1024*1024) actually forbids processes to open
more than 1024*1024 handles.

Unfortunatly some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.

Changing NR_OPEN is not considered safe because of vmalloc space potential
exhaust.

This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.

[akpm@linux-foundation.org: export it for sparc64]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:06 -08:00
Yasunori Goto
7786fa9ac5 Document lowmem_reserve_ratio
Though the lower_zone_protection was changed to lowmem_reserve_ratio, the
document has been not changed.  The lowmem_reserve_ratio seems quite hard
to estimate, but there is no guidance.  This patch is to change document
for it.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:19 -08:00
Bron Gondwana
195cf453d2 mm/page-writeback: highmem_is_dirtyable option
Add vm.highmem_is_dirtyable toggle

A 32 bit machine with HIGHMEM64 enabled running DCC has an MMAPed file of
approximately 2Gb size which contains a hash format that is written
randomly by the dbclean process.  On 2.6.16 this process took a few
minutes.  With lowmem only accounting of dirty ratios, this takes about 12
hours of 100% disk IO, all random writes.

Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set to 1 to
add the highmem back to the total available memory count.

[akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build]
Signed-off-by: Bron Gondwana <brong@fastmail.fm>
Cc: Ethan Solomita <solo@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:18 -08:00
Oliver Pinter
3eb43f689c Documentation/filesystems/porting fixes
typo fix and whitespace cleanup

Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:59:17 +02:00
Randy Dunlap
254012fd51 doc: use correct debugfs mountpoint
Use the normal, expected mountpoint in the relay(fs) example
for debugfs.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 16:30:51 +02:00
Leonardo Chiquitto
2e01e00e76 Documentation: missing proc/$PID/stat field
There's a missing field in the /proc/$PID/stat output documented in
Documentation/filesystems/proc.txt.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 16:17:16 +02:00
Masatake YAMATO
48cc7ec93f correct missing a double quote in configfs.txt
Signed-off-by: Masatake YAMATO <jet@gyve.org>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 16:10:08 +02:00
Denis Cheng
33b1302545 Documentation: fix type error
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 14:50:11 +02:00
Eric Paris
de6bbd1d30 [AUDIT] break large execve argument logging into smaller messages
execve arguments can be quite large.  There is no limit on the number of
arguments and a 4G limit on the size of an argument.

this patch prints those aruguments in bite sized pieces.  a userspace size
limitation of 8k was discovered so this keeps messages around 7.5k

single arguments larger than 7.5k in length are split into multiple records
and can be identified as aX[Y]=

Signed-off-by: Eric Paris <eparis@redhat.com>
2008-02-01 14:23:55 -05:00
Eric Dumazet
29e75252da [IPV4] route cache: Introduce rt_genid for smooth cache invalidation
Current ip route cache implementation is not suited to large caches.

We can consume a lot of CPU when cache must be invalidated, since we
currently need to evict all cache entries, and this eviction is
sometimes asynchronous. min_delay & max_delay can somewhat control this
asynchronism behavior, but whole thing is a kludge, regularly triggering
infamous soft lockup messages. When entries are still in use, this also
consumes a lot of ram, filling dst_garbage.list.

A better scheme is to use a generation identifier on each entry,
so that cache invalidation can be performed by changing the table
identifier, without having to scan all entries.
No more delayed flushing, no more stalling when secret_interval expires.

Invalidated entries will then be freed at GC time (controled by
ip_rt_gc_timeout or stress), or when an invalidated entry is found
in a chain when an insert is done.
Thus we keep a normal equilibrium.

This patch :
- renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
- Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
- Checks entry->rt_genid at appropriate places :
2008-01-31 19:28:27 -08:00
Alex Tomas
c9de560ded ext4: Add multi block allocator for ext4
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-29 00:19:52 -05:00
Girish Shilamkar
818d276ceb ext4: Add the journal checksum feature
The journal checksum feature adds two new flags i.e
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.

JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the
checksum for the blocks described by the descriptor blocks.
Due to checksums, writing of the commit record no longer needs to be
synchronous. Now commit record can be sent to disk without waiting for
descriptor blocks to be written to disk. This behavior is controlled
using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be
able to recover the journal with _ASYNC_COMMIT hence it is made
incompat.
The commit header has been extended to hold the checksum along with the
type of the checksum.

For recovery in pass scan checksums are verified to ensure the sanity
and completeness(in case of _ASYNC_COMMIT) of every transaction.

Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Girish Shilamkar <girish@clusterfs.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2008-01-28 23:58:27 -05:00
Mark Fasheh
53fc622b9e [PATCH 2/2] ocfs2: cluster aware flock()
Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new
mount option, "localflocks" is added so that users can revert to old
functionality as need be.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:43 -08:00
Sunil Mushran
2fbe8d1ebe ocfs2: Local alloc window size changeable via mount option
Local alloc is a performance optimization in ocfs2 in which a node
takes a window of bits from the global bitmap and then uses that for
all small local allocations. This window size is fixed to 8MB currently.
This patch allows users to specify the window size in MB including
disabling it by passing in 0. If the number specified is too large,
the fs will use the default value of 8MB.

mount -o localalloc=X /dev/sdX /mntpoint

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:43 -08:00
Mark Fasheh
d147b3d630 ocfs2: Support commit= mount option
Mostly taken from ext3. This allows the user to set the jbd commit interval,
in seconds. The default of 5 seconds stays the same, but now users can
easily increase the commit interval. Typically, this would be increased in
order to benefit performance at the expense of data-safety.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 15:05:42 -08:00
Mark Fasheh
1252c434e3 ocfs2: Documentation update
Remove 'readpages' from the list in ocfs2.txt. Instead of having two
identical lists, I just removed the list in the OCFS2 section of fs/Kconfig
and added a pointer to Documentation/filesystems/ocfs2.txt.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25 14:48:42 -08:00
Eric Van Hensbergen
b530cc7940 9p: add virtio transport
This adds a transport to 9p for communicating between guests and a host
using a virtio based transport.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-10-23 13:47:31 -05:00
Christoph Hellwig
e38f981758 exportfs: update documentation
Update documentation to the current state of affairs.  Remove duplicated
method descruptions in exportfs.h and point to Documentation/filesystems/
Exporting instead.  Add a little file header comment in expfs.c describing
what's going on and mentioning Neils and my copyright [1].

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:21 -07:00
Leonardo Chiquitto
b68f2c3a98 proc.txt: Add /proc/stat field
This patch updates the "cat /proc/stat" output found
in Documentation/filesystems/proc.txt.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 03:03:38 +02:00
Shaun Zinck
f8c34f9816 docs/sysfs: add missing word to sysfs attribute explanation
Add the obvious missing word.

Signed-off-by: Shaun Zinck <shaun.zinck@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 02:39:43 +02:00
Shaun Zinck
7356337bd2 documentation/ext3: grammar fixes
Fix some grammar in the explanation of the Journal Block Device layer.

Signed-off-by: Shaun Zinck <shaun.zinck@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 02:38:36 +02:00
Shaun Zinck
bc5b1d55cc Documentation/filesystems/vfs.txt: typo fix
Signed-off-by: Shaun Zinck <shaun.zinck@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 02:35:36 +02:00
Eric Dumazet
94547426db Documentation/filesystems/files.txt: remove rcuref_inc_lf() reverences
rcuref_inc_lf() is not used anymore. Replace it by atomic_inc_not_zero()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 01:47:49 +02:00
Matt LaPlante
01dd2fbf0d typo fixes
Most of these fixes were already submitted for old kernel versions, and were
approved, but for some reason they never made it into the releases.

Because this is a consolidation of a couple old missed patches, it touches both
Kconfigs and documentation texts.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20 01:34:40 +02:00
Robert P. J. Day
3a4fa0a25d Fix misspellings of "system", "controller", "interrupt" and "necessary".
Fix the various misspellings of "system", controller", "interrupt" and
"[un]necessary".

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19 23:10:43 +02:00
Linus Torvalds
9d8190f87b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: remove sysctl
  9p: fix bad kconfig cross-dependency
  9p: soften invalidation in loose_mode
  9p: attach-per-user
  9p: rename uid and gid parameters
  9p: define session flags
  9p: Make transports dynamic
2007-10-17 15:05:58 -07:00
Latchesar Ionkov
ba17674fe0 9p: attach-per-user
The 9P2000 protocol requires the authentication and permission checks to be
done in the file server. For that reason every user that accesses the file
server tree has to authenticate and attach to the server separately.
Multiple users can share the same connection to the server.

Currently v9fs does a single attach and executes all I/O operations as a
single user. This makes using v9fs in multiuser environment unsafe as it
depends on the client doing the permission checking.

This patch improves the 9P2000 support by allowing every user to attach
separately. The patch defines three modes of access (new mount option
'access'):

- attach-per-user (access=user) (default mode for 9P2000.u)
 If a user tries to access a file served by v9fs for the first time, v9fs
 sends an attach command to the server (Tattach) specifying the user. If
 the attach succeeds, the user can access the v9fs tree.
 As there is no uname->uid (string->integer) mapping yet, this mode works
 only with the 9P2000.u dialect.

- allow only one user to access the tree (access=<uid>)
 Only the user with uid can access the v9fs tree. Other users that attempt
 to access it will get EPERM error.

- do all operations as a single user (access=any) (default for 9P2000)
 V9fs does a single attach and all operations are done as a single user.
 If this mode is selected, the v9fs behavior is identical with the current
 one.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-10-17 14:31:07 -05:00
Latchesar Ionkov
bd32b82df9 9p: rename uid and gid parameters
Change the names of 'uid' and 'gid' parameters to the more appropriate
'dfltuid' and 'dfltgid'.  This also sets the default uid/gid to -2
(aka nfsnobody)

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-10-17 14:31:07 -05:00
Eric Van Hensbergen
a80d923e13 9p: Make transports dynamic
This patch abstracts out the interfaces to underlying transports so that
new transports can be added as modules.  This should also allow kernel
configuration of transports without ifdef-hell.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-10-17 14:31:07 -05:00
Joe Korty
38e760a133 x86: expand /proc/interrupts to include missing vectors, v2
Add missing IRQs and IRQ descriptions to /proc/interrupts.

/proc/interrupts is most useful when it displays every IRQ vector in use by
the system, not just those somebody thought would be interesting.

This patch inserts the following vector displays to the i386 and x86_64
platforms, as appropriate:

	rescheduling interrupts
	TLB flush interrupts
	function call interrupts
	thermal event interrupts
	threshold interrupts
	spurious interrupts

A threshold interrupt occurs when ECC memory correction is occuring at too
high a frequency.  Thresholds are used by the ECC hardware as occasional
ECC failures are part of normal operation, but long sequences of ECC
failures usually indicate a memory chip that is about to fail.

Thermal event interrupts occur when a temperature threshold has been
exceeded for some CPU chip.  IIRC, a thermal interrupt is also generated
when the temperature drops back to a normal level.

A spurious interrupt is an interrupt that was raised then lowered by the
device before it could be fully processed by the APIC.  Hence the apic sees
the interrupt but does not know what device it came from.  For this case
the APIC hardware will assume a vector of 0xff.

Rescheduling, call, and TLB flush interrupts are sent from one CPU to
another per the needs of the OS.  Typically, their statistics would be used
to discover if an interrupt flood of the given type has been occuring.

AK: merged v2 and v4 which had some more tweaks
AK: replace Local interrupts with Local timer interrupts
AK: Fixed description of interrupt types.

[ tglx: arch/x86 adaptation ]
[ mingo: small cleanup ]

Signed-off-by: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Tim Hockin <thockin@hockin.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-17 20:16:53 +02:00
Denis Cheng
5d4b355933 Documentation: add entries to filesystems/00-INDEX for several untracked files
Signed-off-by: Denis Cheng <crquan@gmail.com>
Cc: Rob Landley <rob@landley.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:05 -07:00
Jan Kara
8e8934695d quota: send messages via netlink
Implement sending of quota messages via netlink interface.  The advantage
is that in userspace we can better decide what to do with the message - for
example display a dialogue in your X session or just write the message to
the console.  As a bonus, we can get rid of problems with console locking
deep inside filesystem code once we remove the old printing mechanism.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:56 -07:00
Randy Dunlap
1810732e94 docs: ramdisk/initrd/initramfs corrections
initrd/initramfs/ramdisk docs:
- fix typos/spellos/grammar
- clarify RAM disk config location
- correct cpio option

Acked-by: Bryan O'Sullivan <bos@serpentine.com>
Acked-by: Rob Landley <rob@landley.net>
Cc: Werner Almesberger <werner@almesberger.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:56 -07:00
Nick Piggin
55144768e1 fs: remove some AOP_TRUNCATED_PAGE
prepare/commit_write no longer returns AOP_TRUNCATED_PAGE since OCFS2 and
GFS2 were converted to the new aops, so we can make some simplifications
for that.

[michal.k.k.piotrowski@gmail.com: fix warning]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:58 -07:00
Nick Piggin
afddba49d1 fs: introduce write_begin, write_end, and perform_write aops
These are intended to replace prepare_write and commit_write with more
flexible alternatives that are also able to avoid the buffered write
deadlock problems efficiently (which prepare_write is unable to do).

[mark.fasheh@oracle.com: API design contributions, code review and fixes]
[akpm@linux-foundation.org: various fixes]
[dmonakhov@sw.ru: new aop block_write_begin fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:55 -07:00
Linus Torvalds
541010e4b8 Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove IS_ISMNDLCK macro
  Rework /proc/locks via seq_files and seq_list helpers
  fs/locks.c: use list_for_each_entry() instead of list_for_each()
  NFS: clean up explicit check for mandatory locks
  AFS: clean up explicit check for mandatory locks
  9PFS: clean up explicit check for mandatory locks
  GFS2: clean up explicit check for mandatory locks
  Cleanup macros for distinguishing mandatory locks
  Documentation: move locks.txt in filesystems/
  locks: add warning about mandatory locking races
  Documentation: move mandatory locking documentation to filesystems/
  locks: Fix potential OOPS in generic_setlease()
  Use list_first_entry in locks_wake_up_blocks
  locks: fix flock_lock_file() comment
  Memory shortage can result in inconsistent flocks state
  locks: kill redundant local variable
  locks: reverse order of posix_locks_conflict() arguments
2007-10-15 16:07:40 -07:00
Anton Altaparmakov
bfab36e816 NTFS: Fix a mount time deadlock.
Big thanks go to Mathias Kolehmainen for reporting the bug, providing
debug output and testing the patches I sent him to get it working.

The fix was to stop calling ntfs_attr_set() at mount time as that causes
balance_dirty_pages_ratelimited() to be called which on systems with
little memory actually tries to go and balance the dirty pages which tries
to take the s_umount semaphore but because we are still in fill_super()
across which the VFS holds s_umount for writing this results in a
deadlock.

We now do the dirty work by hand by submitting individual buffers.  This
has the annoying "feature" that mounting can take a few seconds if the
journal is large as we have clear it all.  One day someone should improve
on this by deferring the journal clearing to a helper kernel thread so it
can be done in the background but I don't have time for this at the moment
and the current solution works fine so I am leaving it like this for now.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-12 09:16:30 -07:00
J. Bruce Fields
98257af5a2 Documentation: move locks.txt in filesystems/
This documentation (about file locking) belongs in filesystems/.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09 18:32:45 -04:00
J. Bruce Fields
9efa68ed07 locks: add warning about mandatory locking races
The mandatory file locking implementation has long-standing races that
probably render it useless.  I know of no plans to fix them.  Till we
do, we should at least warn people.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-10-09 18:32:45 -04:00
J. Bruce Fields
4f3b19ca41 Documentation: move mandatory locking documentation to filesystems/
Shouldn't this mandatory-locking documentation be in the
Documentation/filesystems directory?

Give it a more descriptive name while we're at it, and update 00-INDEX
with a more inclusive description of Documentation/filesystems (which
has already talked about more than just individual filesystems).

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
2007-10-09 18:32:45 -04:00
Linus Torvalds
577107e8e4 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Fix calculation of i_blocks during truncate
  [PATCH] ocfs2: Fix a wrong cluster calculation.
  [PATCH] ocfs2: fix mount option parsing
  ocfs2: update docs for new features
2007-09-11 17:23:16 -07:00
Rob Landley
c811ac5366 Documentation/00-INDEX: notice ecryptfs.txt moved
ecryptfs.txt moved into filesystems, make 00-INDEX follow.

Signed-off-by: Rob Landley <rob@landley.net>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-11 17:21:19 -07:00
Mark Fasheh
10b0845bed ocfs2: update docs for new features
Update documentation listing ocfs2 features to reflect the current state of
the file system. Add missing descriptions for some mount options which ocfs2
supports.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-11 11:38:25 -07:00
Eric Van Hensbergen
27a2a5ff41 9p: update maintainers and documentation
Updates to the MAINTAINERS file and documentation for 9p to point to the
swik wiki versus the outdated sf.net page.  Also updated some email addresses
and added pointers to papers which better describe the implementation and
application of the Linux 9p client.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-08-23 10:12:48 -05:00
Wyatt Banks
60fd4d6a19 Documentation: document HFSPlus
Documentation: document HFSPlus filesystem and its mount options.

Signed-off-by: Wyatt Banks <wyatt@banksresearch.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:38 -07:00
Yoann Padioleau
dd00cc486a some kmalloc/memset ->kzalloc (tree wide)
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).

Here is a short excerpt of the semantic patch performing
this transformation:

@@
type T2;
expression x;
identifier f,fld;
expression E;
expression E1,E2;
expression e1,e2,e3,y;
statement S;
@@

 x =
- kmalloc
+ kzalloc
  (E1,E2)
  ...  when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
- memset((T2)x,0,E1);

@@
expression E1,E2,E3;
@@

- kzalloc(E1 * E2,E3)
+ kcalloc(E1,E2,E3)

[akpm@linux-foundation.org: get kcalloc args the right way around]
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Acked-by: Pierre Ossman <drzeus-list@drzeus.cx>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:50 -07:00
Kawai, Hidehiro
bb90110dcb coredump masking: documentation for /proc/pid/coredump_filter
This patch adds the documentation for /proc/<pid>/coredump_filter.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:47 -07:00
Peter Zijlstra
bdf4c48af2 audit: rework execve audit
The purpose of audit_bprm() is to log the argv array to a userspace daemon at
the end of the execve system call.  Since user-space hasn't had time to run,
this array is still in pristine state on the process' stack; so no need to
copy it, we can just grab it from there.

In order to minimize the damage to audit_log_*() copy each string into a
temporary kernel buffer first.

Currently the audit code requires that the full argument vector fits in a
single packet.  So currently it does clip the argv size to a (sysctl) limit,
but only when execve auditing is enabled.

If the audit protocol gets extended to allow for multiple packets this check
can be removed.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ollie Wild <aaw@google.com>
Cc: <linux-audit@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:45 -07:00
Nick Piggin
d0217ac04c mm: fault feedback #1
Change ->fault prototype.  We now return an int, which contains
VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
 FAULT_RET_ code tells the VM whether a page was found, whether it has been
locked, and potentially other things.  This is not quite the way he wanted
it yet, but that's changed in the next patch (which requires changes to
arch code).

This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
that a page is locked which requires filemap_nopage to go away (because we
can no longer remain backward compatible without that flag), but we were
going to do that anyway.

struct fault_data is renamed to struct vm_fault as Linus asked. address
is now a void __user * that we should firmly encourage drivers not to use
without really good reason.

The page is now returned via a page pointer in the vm_fault struct.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Mark Fasheh
ed2f2f9b3f Document ->page_mkwrite() locking
There seems to be very little documentation about this callback in general.
The locking in particular is a bit tricky, so it's worth having this in
writing.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Nick Piggin
54cb8821de mm: merge populate and nopage into fault (fixes nonlinear)
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
the virtual address -> file offset differently from linear mappings.

->populate is a layering violation because the filesystem/pagecache code
should need to know anything about the virtual memory mapping.  The hitch here
is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
 But it is more logical to pass pgoff rather than have the ->nopage function
calculate it itself anyway (because that's a similar layering violation).

Having the populate handler install the pte itself is likewise a nasty thing
to be doing.

This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn.  Most of the old mechanism is still in place
so there is a lot of duplication and nice cleanups that can be removed if
everyone switches over.

The rationale for doing this in the first place is that nonlinear mappings are
subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
to duplicate the synchronisation logic rather than just consolidate the two.

After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache.  Seems like a fringe functionality anyway.

NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
users have hit mainline yet.

[akpm@linux-foundation.org: cleanup]
[randy.dunlap@oracle.com: doc. fixes for readahead]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Josef 'Jeff' Sipek
5d91192e66 eCryptfs: Move ecryptfs docs into Documentation/filesystems/
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Acked-by: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:08 -07:00
Mel Gorman
ed7ed36517 handle kernelcore=: generic
This patch adds the kernelcore= parameter for x86.

Once all patches are applied, a new command-line parameter exist and a new
sysctl.  This patch adds the necessary documentation.

From: Yasunori Goto <y-goto@jp.fujitsu.com>

  When "kernelcore" boot option is specified, kernel can't boot up on ia64
  because of an infinite loop.  In addition, the parsing code can be handled
  in an architecture-independent manner.

  This patch uses common code to handle the kernelcore= parameter.  It is
  only available to architectures that support arch-independent zone-sizing
  (i.e.  define CONFIG_ARCH_POPULATES_NODE_MAP).  Other architectures will
  ignore the boot parameter.

[bunk@stusta.de: make cmdline_parse_kernelcore() static]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:22:59 -07:00
Linus Torvalds
add096909d Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits)
  [PATCH] ocfs2: zero_user_page conversion
  ocfs2: Support xfs style space reservation ioctls
  ocfs2: support for removing file regions
  ocfs2: update truncate handling of partial clusters
  ocfs2: btree support for removal of arbirtrary extents
  ocfs2: Support creation of unwritten extents
  ocfs2: support writing of unwritten extents
  ocfs2: small cleanup of ocfs2_write_begin_nolock()
  ocfs2: btree changes for unwritten extents
  ocfs2: abstract btree growing calls
  ocfs2: use all extent block suballocators
  ocfs2: plug truncate into cached dealloc routines
  ocfs2: simplify deallocation locking
  ocfs2: harden buffer check during mapping of page blocks
  ocfs2: shared writeable mmap
  ocfs2: factor out write aops into nolock variants
  ocfs2: rework ocfs2_buffered_write_cluster()
  ocfs2: take ip_alloc_sem during entire truncate
  ocfs2: Add "preferred slot" mount option
  [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c
  ...
2007-07-16 10:52:55 -07:00
Borislav Petkov
422b14c2e2 update Documentation/filesystems/vfs.txt
Update Documentation/filesystems/vfs.txt

Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:51 -07:00
Borislav Petkov
0746aec3c7 update description in Documentation/filesystems/vfs.txt
Update the description of struct file_system_type and get_sb() in
Documentation/filesystems/vfs.txt to match the current code.

Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:48 -07:00
Kees Cook
18d96779d9 Documentation: /proc/$pid/stat files
Documentation for the /proc/$pid/stat file.

Signed-off-by: Kees Cook <kees@outflux.net>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:45 -07:00
Joel Becker
631d1febab configfs: config item dependancies.
Sometimes other drivers depend on particular configfs items.  For
example, ocfs2 mounts depend on a heartbeat region item.  If that
region item is removed with rmdir(2), the ocfs2 mount must BUG or go
readonly.  Not happy.

This provides two additional API calls: configfs_depend_item() and
configfs_undepend_item().  A client driver can call
configfs_depend_item() on an existing item to tell configfs that it is
depended on.  configfs will then return -EBUSY from rmdir(2) for that
item.  When the item is no longer depended on, the client driver calls
configfs_undepend_item() on it.

These API cannot be called underneath any configfs callbacks, as
they will conflict.  They can block and allocate.  A client driver
probably shouldn't calling them of its own gumption.  Rather it should
be providing an API that external subsystems call.

How does this work?  Imagine the ocfs2 mount process.  When it mounts,
it asks for a heart region item.  This is done via a call into the
heartbeat code.  Inside the heartbeat code, the region item is looked
up.  Here, the heartbeat code calls configfs_depend_item().  If it
succeeds, then heartbeat knows the region is safe to give to ocfs2.
If it fails, it was being torn down anyway, and heartbeat can gracefully
pass up an error.

[ Fixed some bad whitespace in configfs.txt. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:18:59 -07:00
Joel Becker
299894cc90 configfs: accessing item hierarchy during rmdir(2)
Add a notification callback, ops->disconnect_notify(). It has the same
prototype as ->drop_item(), but it will be called just before the item
linkage is broken. This way, configfs users who want to do work while
the object is still in the heirarchy have a chance.

Client drivers will still need to config_item_put() in their
->drop_item(), if they implement it.  They need do nothing in
->disconnect_notify().  They don't have to provide it if they don't
care.  But someone who wants to be notified before ci_parent is set to
NULL can now be notified.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:11:01 -07:00
Joel Becker
e6bd07aee7 configfs: Convert subsystem semaphore to mutex
Convert the su_sem member of struct configfs_subsystem to a struct
mutex, as that's what it is. Also convert all the users and update
Documentation/configfs.txt and Documentation/configfs_example.c
accordingly.

[ Conflict in fs/dlm/config.c with commit
  3168b0780d manually resolved. --Mark ]

Inspired-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:10:56 -07:00
Hugh Dickins
a210906c1b mount -t tmpfs -o mpol=: check nodes online
Randy Dunlap reports that a tmpfs, mounted with NUMA mpol= specifying an
offline node, crashes as soon as data is allocated upon it.  Now restrict it
to online nodes, where before it restricted to MAX_NUMNODES.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Tested-and-acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-08 17:23:32 -07:00
Josef 'Jeff' Sipek
c2b38989cf Documentation: Fix up docs still talking about i_sem
.. it got changed to 'i_mutex' some time ago.

Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-24 10:16:17 -07:00
Artem Bityutskiy
a7bc02f4f4 trivial: s/i_sem /i_mutex/
This patch substitutes i_sem by i_mutex in
Documentation/filesystems/Locking.
The patch also removes a couple of trailing white-spaces.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:58:17 +02:00
Matt LaPlante
a982ac06b0 misc doc and kconfig typos
Fix various typos in kernel docs and Kconfigs, 2.6.21-rc4.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:58:15 +02:00
Randy Dunlap
8b60756a62 Fix more "deprecated" spellos.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:19:14 +02:00
Linus Torvalds
18062a91d2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  JFS: Fix race waking up jfsIO kernel thread
  JFS: use __set_current_state()
  Copy i_flags to jfs inode flags on write
  JFS: document uid, gid, and umask mount options in jfs.txt
2007-05-08 11:32:30 -07:00
OGAWA Hirofumi
28ec039c21 fat: don't use free_clusters for fat32
It seems that the recent Windows changed specification, and it's
undocumented.  Windows doesn't update ->free_clusters correctly.

This patch doesn't use ->free_clusters by default.  (instead, add "usefree"
for forcing to use it)

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Juergen Beisert <juergen127@kreuzholzen.de>
Cc: Andreas Schwab <schwab@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:13 -07:00
Eric Dumazet
c23fbb6bcb VFS: delay the dentry name generation on sockets and pipes
1) Introduces a new method in 'struct dentry_operations'.  This method
   called d_dname() might be called from d_path() to build a pathname for
   special filesystems.  It is called without locks.

   Future patches (if we succeed in having one common dentry for all
   pipes/sockets) may need to change prototype of this method, but we now
   use : char *d_dname(struct dentry *dentry, char *buffer, int buflen);

2) Adds a dynamic_dname() helper function that eases d_dname() implementations

3) Defines d_dname method for sockets : No more sprintf() at socket
   creation.  This is delayed up to the moment someone does an access to
   /proc/pid/fd/...

4) Defines d_dname method for pipes : No more sprintf() at pipe
   creation.  This is delayed up to the moment someone does an access to
   /proc/pid/fd/...

A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
*nice* speedup on my Pentium(M) 1.6 Ghz :

3.090 s instead of 3.450 s

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:03 -07:00
Kees Cook
5096add84b proc: maps protection
The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
information about the memory location and usage of processes.  Issues:

- maps should not be world-readable, especially if programs expect any
  kind of ASLR protection from local attackers.
- maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
  check the maps when %n is in a *printf call, and a setuid(getuid())
  process wouldn't be able to read its own maps file.  (For reference
  see http://lkml.org/lkml/2006/1/22/150)
- a system-wide toggle is needed to allow prior behavior in the case of
  non-root applications that depend on access to the maps contents.

This change implements a check using "ptrace_may_attach" before allowing
access to read the maps contents.  To control this protection, the new knob
/proc/sys/kernel/maps_protect has been added, with corresponding updates to
the procfs documentation.

[akpm@linux-foundation.org: build fixes]
[akpm@linux-foundation.org: New sysctl numbers are old hat]
Signed-off-by: Kees Cook <kees@outflux.net>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:02 -07:00
David Rientjes
b813e931b4 smaps: add clear_refs file to clear reference
Adds /proc/pid/clear_refs.  When any non-zero number is written to this file,
pte_mkold() and ClearPageReferenced() is called for each pte and its
corresponding page, respectively, in that task's VMAs.  This file is only
writable by the user who owns the task.

It is now possible to measure _approximately_ how much memory a task is using
by clearing the reference bits with

	echo 1 > /proc/pid/clear_refs

and checking the reference count for each VMA from the /proc/pid/smaps output
at a measured time interval.  For example, to observe the approximate change
in memory footprint for a task, write a script that clears the references
(echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and
extracts the size in kB.  Add the sizes for each VMA together for the total
referenced footprint.  Moments later, repeat the process and observe the
difference.

For example, using an efficient Mozilla:

	accumulated time		referenced memory
	----------------		-----------------
		 0 s				 408 kB
		 1 s				 408 kB
		 2 s				 556 kB
		 3 s				1028 kB
		 4 s				 872 kB
		 5 s				1956 kB
		 6 s				 416 kB
		 7 s				1560 kB
		 8 s				2336 kB
		 9 s				1044 kB
		10 s				 416 kB

This is a valuable tool to get an approximate measurement of the memory
footprint for a task.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
[akpm@linux-foundation.org: build fixes]
[mpm@selenic.com: rename for_each_pmd]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:52 -07:00
David Howells
0795e7c031 [AFS]: Update the AFS fs documentation.
Update the AFS fs documentation.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:57:43 -07:00
Stephen Hemminger
a2a316fd06 [NET]: Replace CONFIG_NET_DEBUG with sysctl.
Covert network warning messages from a compile time to runtime choice.
Removes kernel config option and replaces it with new /proc/sys/net/core/warnings.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:05 -07:00
Dave Kleikamp
ba863a0016 JFS: document uid, gid, and umask mount options in jfs.txt
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
2007-03-09 10:27:31 -06:00
Roland Kletzing
f9c99463b0 [PATCH] Documentation for io-accounting / reporting via procfs
Add some documentation for the new and very useful io-accounting feature.
It's being added to Documentation/filesystems/proc.txt

Signed-off-by: Roland Kletzing <devzero@web.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-05 07:57:54 -08:00
Nick Piggin
955eff5acc [PATCH] fs: fix libfs data leak
simple_prepare_write leaks uninitialised kernel data.  This happens because
the it leaves an uninitialised "hole" over the part of the page that the
write is expected to go to.  This is fine, but it then marks the page
uptodate, which means a concurrent read can come in and copy the
uninitialised memory into userspace before it written to.

Fix it by simply marking it uptodate in simple_commit_write instead, after
the hole has been filled in.  This could theoretically break an fs that
uses simple_prepare_write and not simple_commit_write, and that relies on
the incorrect simple_prepare_write behaviour.  Luckily, none of those
exists in the tree.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-20 17:10:15 -08:00
Linus Torvalds
2874b391bd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: implement optional loose read cache
  9p: Use kthread_stop instead of sending a SIGKILL.
2007-02-19 13:33:01 -08:00
Eric Van Hensbergen
e03abc0c96 9p: implement optional loose read cache
While cacheing is generally frowned upon in the 9p world, it has its
place -- particularly in situations where the remote file system is
exclusive and/or read-only.  The vacfs views of venti content addressable
store are a real-world instance of such a situation.  To facilitate higher
performance for these workloads (and eventually use the fscache patches),
we have enabled a "loose" cache mode which does not attempt to maintain
any form of consistency on the page-cache or dcache.  This results in over
two orders of magnitude performance improvement for cacheable block reads
in the Bonnie benchmark.  The more aggressive use of the dcache also seems
to improve metadata operational performance.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2007-02-18 10:16:10 -06:00
Uwe Kleine-König
1b3c3714cb Fix typos concerning hierarchy
heirarchical, hierachical -> hierarchical
        heirarchy, hierachy -> hierarchy

Signed-off-by: Uwe Kleine-König <zeisberg@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-17 19:23:03 +01:00
Evgeniy Dushistov
cbcae39fa1 [PATCH] ufs2 write: mount as rw
These series of patches add UFS2 write-support.  UFS2 - is default file system
for recent versions of FreeBSD.

The main differences from UFS1 from write support point of view
are:
1)Not all inodes are allocated during formatation of disk.
2)All meta-data(pointer to data blocks) are 64bit(in UFS1 they
are 32bit).

So patch series consist of
1)make possible mount UFS2 in read-write mode
2)code to write ufs2 inodes and code to initialize inodes chunks.
3)work with 64bit meta-data

I made simple testing like create/deleting/writing/reading/truncating, also I
ran fsx-linux and untar and build kernel on UFS1 and UFS2, after that FreeBSD
fsck do not find any errors in fs.

This patch makes possible to mount ufs2 "rw", and updates UFS2 documentation:
remove note about bug(it fixed by reallocate blocks on the fly patch) and add
me in the list of people who want receive bug reports.

Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:40 -08:00
Mathieu Desnoyers
23c887522e [PATCH] Relay: add CPU hotplug support
Mathieu originally needed to add this for tracing Xen, but it's something
that's needed for any application that can be tracing while cpus are added.

unplug isn't supported by this patch.  The thought was that at minumum a new
buffer needs to be added when a cpu comes up, but it wasn't worth the effort
to remove buffers on cpu down since they'd be freed soon anyway when the
channel was closed.

[zanussi@us.ibm.com: avoid lock_cpu_hotplug deadlock]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:28 -08:00
Eric Van Hensbergen
ff76e1dfc8 [PATCH] 9p: update documentation regarding server applications
Update the documentation to cover using Inferno as a server for 9p and to
include information about spfs (a stable single-threaded stand-alone 9p
server).

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26 13:50:59 -08:00
Anton Altaparmakov
8331191e56 NTFS: 2.1.28 - Fix deadlock reported by Sergey Vlasov due to ntfs_put_inode().
- Fix deadlock in fs/ntfs/inode.c::ntfs_put_inode().  Thanks to Sergey
  Vlasov for the report and detailed analysis of the deadlock.  The fix
  involved getting rid of ntfs_put_inode() altogether and hence NTFS no
  longer has a ->put_inode super operation.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2007-01-18 09:42:48 +00:00
Trond Myklebust
e3db7691e9 [PATCH] NFS: Fix race in nfs_release_page()
NFS: Fix race in nfs_release_page()

    invalidate_inode_pages2() may find the dirty bit has been set on a page
    owing to the fact that the page may still be mapped after it was locked.
    Only after the call to unmap_mapping_range() are we sure that the page
    can no longer be dirtied.
    In order to fix this, NFS has hooked the releasepage() method and tries
    to write the page out between the call to unmap_mapping_range() and the
    call to remove_mapping(). This, however leads to deadlocks in the page
    reclaim code, where the page may be locked without holding a reference
    to the inode or dentry.

    Fix is to add a new address_space_operation, launder_page(), which will
    attempt to write out a dirty page without releasing the page lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

    Also, the bare SetPageDirty() can skew all sort of accounting leading to
    other nasties.

[akpm@osdl.org: cleanup]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-11 18:18:21 -08:00
Alexey Dobriyan
91f6e54b6e [PATCH] fuse: fix typo
Signed-off-by: Thomas Hisch <t.hisch@gmail.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:45 -08:00
Tigran Aivazian
69688262fb [PATCH] update Tigran's email addresses
As Adrian pointed out recently, there were still a couple of places where
I should have fixed my email address.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:53 -08:00
Tiger Yang
bcd5625bff ocfs2: update mount option documentation
We forgot to document the atime_quantum mount option in ocfs2.txt. This adds
a proper description of how it works.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07 17:48:41 -08:00
Adrian Bunk
3982cd99c3 [PATCH] fs/sysv/: doc cleanup
Remove two different changelog files from fs/sysv/ and merges the INTRO
file into Documentation/filesystems/sysv-fs.txt

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:44 -08:00
Vasily Averin
70888bd5b7 [PATCH] Documentation: remount_fs() needs lock_kernel
Fixed long-lived typo: remount_fs() needs BKL

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:36 -08:00
Miklos Szeredi
d809161402 [PATCH] fuse: add blksize option
Add 'blksize' option for block device based filesystems.  During
initialization this is used to set the block size on the device and the super
block.  The default block size is 512bytes.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:31 -08:00
Miklos Szeredi
d6392f873f [PATCH] fuse: add support for block device based filesystems
I never intended this, but people started using fuse to implement block device
based "real" filesystems (ntfs-3g, zfs).

The following four patches add better support for these kinds of filesystems.
Unlike "normal" fuse filesystems, using this feature should require superuser
privileges (enforced by the fusermount utility).

Thanks to Szabolcs Szakacsits for the input and testing.

This patch adds a 'fuseblk' filesystem type, which is only different from the
'fuse' filesystem type in how the 'dev_name' mount argument is interpreted.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:31 -08:00
Matt LaPlante
5d3f083d8f Fix typos in /Documentation : Misc
This patch fixes typos in various Documentation txts. The patch addresses some
misc words.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-30 05:21:10 +01:00
Matt LaPlante
4ae0edc21b Fix typos in /Documentation : 'U-Z'
This patch fixes typos in various Documentation txts. The patch addresses some
+words starting with the letters 'U-Z'.

Looks like I made it through the alphabet...just in time to start over again
+too!  Maybe I can fit more profound fixes into the next round...?  Time will
+tell. :)

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-30 04:58:40 +01:00
Matt LaPlante
fa00e7e152 Fix typos in /Documentation : 'T''
This patch fixes typos in various Documentation txts. The patch addresses some
+words starting with the letter 'T'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-11-30 04:55:36 +01:00
Phillip Susi
55aa601e14 [PATCH] Update udf documentation to reflect current state of read/write support
Change Documentation/filesystems/udf.txt from saying that read/write mounts
on cd media are not supported to instead state the current level of
support.  Specifically that it works fine on dvd+rw media and can be made
to work on cd-rw media via the pktcdvd device.

Cc: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-16 11:43:38 -08:00
Dave Kleikamp
fc513a333b [PATCH] Documentation/filesystems/ext4.txt
This file, ext4.txt, was put together with information from Andrew Morton,
Andreas Dilger, Suparna Bhattacharya, and Ted Ts'o.

I copied the mount options, with the exception of "extents", from ext3.txt,
so if anyone is aware of anything out-of-date, please let me know.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-11 11:14:19 -07:00
Linus Torvalds
4a61f17378 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6: (292 commits)
  [GFS2] Fix endian bug for de_type
  [GFS2] Initialize SELinux extended attributes at inode creation time.
  [GFS2] Move logging code into log.c (mostly)
  [GFS2] Mark nlink cleared so VFS sees it happen
  [GFS2] Two redundant casts removed
  [GFS2] Remove uneeded endian conversion
  [GFS2] Remove duplicate sb reading code
  [GFS2] Mark metadata reads for blktrace
  [GFS2] Remove iflags.h, use FS_
  [GFS2] Fix code style/indent in ops_file.c
  [GFS2] streamline-generic_file_-interfaces-and-filemap gfs fix
  [GFS2] Remove readv/writev methods and use aio_read/aio_write instead (gfs bits)
  [GFS2] inode-diet: Eliminate i_blksize from the inode structure
  [GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs)
  [GFS2] Fix typo in last patch
  [GFS2] Fix direct i/o logic in filemap.c
  [GFS2] Fix bug in Makefiles for lock modules
  [GFS2] Remove (extra) fs_subsys declaration
  [GFS2/DLM] Fix trailing whitespace
  [GFS2] Tidy up meta_io code
  ...
2006-10-04 09:06:16 -07:00
Paolo Ornati
670e9f34ee Documentation: remove duplicated words
Remove many duplicated words under Documentation/ and do other small
cleanups.

Examples:
        "and and" --> "and"
        "in in" --> "in"
        "the the" --> "the"
        "the the" --> "to the"
        ...

Signed-off-by: Paolo Ornati <ornati@fastwebnet.it>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:57:56 +02:00
Matt LaPlante
53cb47268e Fix typos in Documentation/: 'S'
This patch fixes typos in various Documentation txts. The patch addresses
some words starting with the letter 'S'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Alan Cox <alan@redhat.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:55:17 +02:00
Matt LaPlante
d6bc8ac9e1 Fix typos in Documentation/: 'Q'-'R'
This patch fixes typos in various Documentation txts. The patch addresses
some words starting with the letters 'Q'-'R'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:54:15 +02:00
Matt LaPlante
84eb8d0608 Fix "can not" in Documentation and Kconfig
Randy brought it to my attention that in proper english "can not" should always
be written "cannot". I donot see any reason to argue, even if I mightnot
understand why this rule exists.  This patch fixes "can not" in several
Documentation files as well as three Kconfigs.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:53:09 +02:00
Matt LaPlante
992caacf11 Fix typos in Documentation/: 'N'-'P'
This patch fixes typos in various Documentation txts. The patch addresses
some words starting with the letters 'N'-'P'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:52:05 +02:00
Matt LaPlante
2fe0ae78c6 Fix typos in Documentation/: 'H'-'M'
This patch fixes typos in various Documentation txts. The patch addresses
some words starting with the letters 'H'-'M'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:50:39 +02:00
Matt LaPlante
a2ffd27516 Fix typos in Documentation/: 'F'-'G'
This patch fixes typos in various Documentation txts. The patch addresses
some words starting with the letters 'F'-'G'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:49:15 +02:00
Matt LaPlante
fff9289b21 Fix typos in Documentation/: 'D'-'E'
This patch fixes typos in various Documentation txts. This patch addresses
some words starting with the letters 'D'-'E'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:47:42 +02:00
Matt LaPlante
6c28f2c0f2 Fix typos in Documentation/: 'B'-'C'
This patch fixes typos in various Documentation txts. This patch addresses some
words starting with the letters 'B'-'C'.  There are also a few grammar fixes
thrown in for Randy. ;)

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:46:31 +02:00
Matt LaPlante
3f6dee9b2a Fix some typos in Documentation/: 'A'
This patch fixes typos in various Documentation txts.
This patch addresses some words starting with the letter 'A'.

Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:45:33 +02:00
Adrian Bunk
bf6ee0ae49 remove mentionings of devfs in documentation
Now that devfs is removed, there's no longer any need to document how to
do this or that with devfs.

This patch includes some improvements by Joe Perches.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-10-03 22:17:48 +02:00
Steven Whitehouse
59458f40e2 Merge branch 'master' into gfs2 2006-10-02 08:45:08 -04:00
Badari Pulavarty
027445c372 [PATCH] Vectorize aio_read/aio_write fileop methods
This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:28 -07:00
Jan-Frode Myklebust
d7ff0dbf45 [PATCH] oom_adj/oom_score documentation
I was looking for the a way around an OOM-problem, and found a couple of
undocumented new features for tuning the OOM-score of individual processes.
 Here's a small documentation patch for /proc/<pid>/oom_adj and
/proc/<pid>/oom_score.

Signed-off-by: Jan-Frode Myklebust <mykleb@no.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-29 09:18:10 -07:00
Steven Whitehouse
185a257f2f Merge branch 'master' into gfs2 2006-09-28 08:29:59 -04:00
Don Zickus
e33e89ab1a [PATCH] x86: Add abilty to enable/disable nmi watchdog from procfs (update)
Adds a new /proc/sys/kernel/nmi_watchdog call that will enable/disable the
nmi watchdog.

By entering a non-zero value here, a user can enable the nmi watchdog to
monitor the online cpus in the system.  By entering a zero value here, a
user can disable the nmi watchdog and free up a performance counter which
could then be utilized by the oprofile subsystem, otherwise oprofile may be
short a counter when in use.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:27 +02:00
Steven Whitehouse
83b7a664a0 Merge branch 'master' into gfs2 2006-08-29 11:39:34 -04:00
Tom Zanussi
e88d78f6ba [PATCH] Documentation update for relay interface
Here's updated documentation for the relay interface, rewritten to match
the relayfs->relay changes.  It also moves relayfs.txt to relay.txt in the
process.

It includes the changes to relayfs.txt previously posted by Randy Dunlap,
thanks for those.

The relay-apps examples have also been updated to match, and can be found
on the sourceforge relayfs website.

Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-27 11:01:31 -07:00
Steven Whitehouse
4bf311ddfb Merge branch 'master' 2006-07-17 09:25:26 -04:00
Jonathan Corbet
5d8b2ebfa2 [PATCH] VFS documentation tweak
As I was looking over the get_sb() changes, I stumbled across a little
mistake in the documentation updates.  Unless we're getting into an
interesting new object-oriented realm, I doubt that get_sb() should really
return "struct int"...

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10 13:24:15 -07:00
Steven Whitehouse
0a1340c185 Merge rsync://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	include/linux/kernel.h
2006-07-03 10:25:08 -04:00
Linus Torvalds
501b7c77de Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks()
  ocfs2: clean up some osb fields
  ocfs2: fix init of uuid_net_key
  ocfs2: silence a debug print
  ocfs2: silence ENOENT during lookup of broken links
  ocfs2: Cleanup message prints
  ocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup
  [PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static
  ocfs2: warn the user on a dead timeout mismatch
  ocfs2: OCFS2_FS must depend on SYSFS
  ocfs2: Compile-time disabling of ocfs2 debugging output.
  configfs: Clear up a few extra spaces where there should be TABs.
  configfs: Release memory in configfs_example.
2006-06-29 17:44:21 -07:00
Joel Becker
22dd0e88b7 configfs: Release memory in configfs_example.
The configfs_example module was missing a ->release().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29 14:38:53 -07:00
Greg Kroah-Hartman
5c3927dc34 [PATCH] devfs: Remove devfs documentation from the kernel tree
Removes the Documentaiton/filesystems/devfs/ directory

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-26 12:25:05 -07:00
Badari Pulavarty
ade1a29e16 [PATCH] ext3: Add "-o bh" option
This patch adds "-o bh" option to force use of buffer_heads.  This option
is needed when we make "nobh" as default - and if we run into problems.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26 09:58:20 -07:00
Linus Torvalds
1d77062b14 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (51 commits)
  nfs: remove nfs_put_link()
  nfs-build-fix-99
  git-nfs-build-fixes
  Merge branch 'odirect'
  NFS: alloc nfs_read/write_data as direct I/O is scheduled
  NFS: Eliminate nfs_get_user_pages()
  NFS: refactor nfs_direct_free_user_pages
  NFS: remove user_addr, user_count, and pos from nfs_direct_req
  NFS: "open code" the NFS direct write rescheduler
  NFS: Separate functions for counting outstanding NFS direct I/Os
  NLM: Fix reclaim races
  NLM: sem to mutex conversion
  locks.c: add the fl_owner to nlm_compare_locks
  NFS: Display the chosen RPCSEC_GSS security flavour in /proc/mounts
  NFS: Split fs/nfs/inode.c
  NFS: Fix typo in nfs_do_clone_mount()
  NFS: Fix compile errors introduced by referrals patches
  NFSv4: Ensure that referral mounts bind to a reserved port
  NFSv4: A root pathname is sent as a zero component4
  NFSv4: Follow a referral
  ...
2006-06-25 10:54:14 -07:00
Rob Landley
e7b6905582 [PATCH] Initramfs docs update
New section on creating an external initramfs image using cpio (with
script), a warning about bad advice in the cpio man page, a bit of
debugging advice (hello world and rdinit=/bin/sh), and a few minor tweaks
to other parts of it.

Signed-off-by: Rob Landley <rob@landley.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:21 -07:00
Miklos Szeredi
a4d27e75ff [PATCH] fuse: add request interruption
Add synchronous request interruption.  This is needed for file locking
operations which have to be interruptible.  However filesystem may implement
interruptibility of other operations (e.g.  like NFS 'intr' mount option).

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi
bafa96541b [PATCH] fuse: add control filesystem
Add a control filesystem to fuse, replacing the attributes currently exported
through sysfs.  An empty directory '/sys/fs/fuse/connections' is still created
in sysfs, and mounting the control filesystem here provides backward
compatibility.

Advantages of the control filesystem over the previous solution:

  - allows the object directory and the attributes to be owned by the
    filesystem owner, hence letting unpriviled users abort the
    filesystem connection

  - does not suffer from module unload race

[akpm@osdl.org: fix this fs for recent dhowells depredations]
[akpm@osdl.org: fix 64-bit printk warnings]
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Miklos Szeredi
51eb01e735 [PATCH] fuse: no backgrounding on interrupt
Don't put requests into the background when a fatal interrupt occurs while the
request is in userspace.  This removes a major wart from the implementation.

Backgrounding of requests was introduced to allow breaking of deadlocks.
However now the same can be achieved by aborting the filesystem through the
'abort' sysfs attribute.

This is a change in the interface, but should not cause problems, since these
kinds of deadlocks never happen during normal operation.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25 10:01:19 -07:00
Trond Myklebust
816724e65c Merge branch 'master' of /home/trondmy/kernel/linux-2.6/
Conflicts:

	fs/nfs/inode.c
	fs/super.c

Fix conflicts between patch 'NFS: Split fs/nfs/inode.c' and patch
'VFS: Permit filesystem to override root dentry on mount'
2006-06-24 13:07:53 -04:00
David Howells
726c334223 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch.  That reduced the significance of
sb->s_root, allowing NFS to place a fake root there.  However, NFS does
require a dentry to use as a target for the statfs operation.  This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
David Howells
454e2398be [PATCH] VFS: Permit filesystem to override root dentry on mount
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.

The filesystem is then required to manually set the superblock and root dentry
pointers.  For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).

The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.

This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing.  In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.

The patch also makes the following changes:

 (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
     pointer argument and return an integer, so most filesystems have to change
     very little.

 (*) If one of the convenience function is not used, then get_sb() should
     normally call simple_set_mnt() to instantiate the vfsmount. This will
     always return 0, and so can be tail-called from get_sb().

 (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
     dcache upon superblock destruction rather than shrink_dcache_anon().

     This is required because the superblock may now have multiple trees that
     aren't actually bound to s_root, but that still need to be cleaned up. The
     currently called functions assume that the whole tree is rooted at s_root,
     and that anonymous dentries are not the roots of trees which results in
     dentries being left unculled.

     However, with the way NFS superblock sharing are currently set to be
     implemented, these assumptions are violated: the root of the filesystem is
     simply a dummy dentry and inode (the real inode for '/' may well be
     inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
     with child trees.

     [*] Anonymous until discovered from another tree.

 (*) The documentation has been adjusted, including the additional bit of
     changing ext2_* into foo_* in the documentation.

[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:45 -07:00
Trond Myklebust
70ac4385a1 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/
Conflicts:

	include/linux/nfs_fs.h

Fixed up conflict with kernel header updates.
2006-06-20 20:46:21 -04:00
Amy Griffis
0edce197db [PATCH] inotify (5/5): update kernel documentation
Update kernel documentation to include a description of the inotify
kernel API.

Signed-off-by: Amy Griffis <amy.griffis@hp.com>
Acked-by: Robert Love <rml@novell.com>
Acked-by: John McCutchan <john@johnmccutchan.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-06-20 05:25:19 -04:00
Trond Myklebust
1f5ce9e93a VFS: Unexport do_kern_mount() and clean up simple_pin_fs()
Replace all module uses with the new vfs_kern_mount() interface, and fix up
simple_pin_fs().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-09 09:34:16 -04:00
David Teigland
d26046bb0a Merge branch 'master' 2006-04-27 11:49:55 -04:00
Miklos Szeredi
c86d90df26 [doc] add paragraph about 'fs' subsystem to sysfs.txt
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
2006-04-26 10:49:26 +02:00
David Teigland
373b5a4532 [GFS2] Update documentation
Updated information on the GFS2 tools

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-04-25 15:44:04 -04:00
Steven Whitehouse
a748422ee4 Merge branch 'master' 2006-04-21 12:52:36 -04:00
Pekka J Enberg
d1195c516a [PATCH] vfs: add splice_write and splice_read to documentation
This patch adds the new splice_write and splice_read file operations to
Documentation/filesystems/vfs.txt.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 14:21:59 +02:00
Steven Whitehouse
86579dd06d Merge branch 'master' 2006-03-31 15:34:58 -05:00
Jesper Juhl
50fc9999ec [PATCH] Docs update: missing files and descriptions for filesystems/00-INDEX
Add missing files and descriptions to
 Documentation/filesystems/00-INDEX

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 09:16:06 -08:00
Linus Torvalds
1e8c573933 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits)
  BUG_ON() Conversion in drivers/video/
  BUG_ON() Conversion in drivers/parisc/
  BUG_ON() Conversion in drivers/block/
  BUG_ON() Conversion in sound/sparc/cs4231.c
  BUG_ON() Conversion in drivers/s390/block/dasd.c
  BUG_ON() Conversion in lib/swiotlb.c
  BUG_ON() Conversion in kernel/cpu.c
  BUG_ON() Conversion in ipc/msg.c
  BUG_ON() Conversion in block/elevator.c
  BUG_ON() Conversion in fs/coda/
  BUG_ON() Conversion in fs/binfmt_elf_fdpic.c
  BUG_ON() Conversion in input/serio/hil_mlc.c
  BUG_ON() Conversion in md/dm-hw-handler.c
  BUG_ON() Conversion in md/bitmap.c
  The comment describing how MS_ASYNC works in msync.c is confusing
  rcu: undeclared variable used in documentation
  fix typos "wich" -> "which"
  typo patch for fs/ufs/super.c
  Fix simple typos
  tabify drivers/char/Makefile
  ...
2006-03-25 08:41:09 -08:00
NeilBrown
a9e102b60c [PATCH] More corrections to vfs.txt update
Thanks "Randy.Dunlap" <rdunlap@xenotime.net>

Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:02 -08:00
Phillip Susi
0e6b3e5e97 [PATCH] udf: fix uid/gid options and add uid/gid=ignore and forget options
As Pekka Enberg pointed out, with the if still following the else, you can
still get a null uid written to the disk if you specify a default uid= without
uid=forget.  In other words, if the desktop user is uid=1000 and the mount
option uid=1000 is given ( which is done on ubuntu automatically and probably
other distributions that use hal ), then if any other user besides uid 1000
owns a file then a 0 will be written to the media as the owning uid instead.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:00 -08:00
NeilBrown
341546f5ad [PATCH] Update some VFS documentation
Flesh out the description of the address_space operations.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Avishay Traeger <atraeger@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:57 -08:00
Eric Van Hensbergen
ed1f559b9b [PATCH] 9p: update documentation
Fix documentation to match current implementation.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Eric Van Hensbergen
67543e508d [PATCH] 9p: fix name consistency problems
There were a number of conflicting naming schemes used in the v9fs project.
The directory was fs/9p, but MAINTAINERS and Documentation referred to
v9fs.  The module name itself was 9p2000, and the file system type was 9P.
This patch attempts to clean that up, changing all references to 9p in
order to match the directory name.  We'll also start using 9p instead of
v9fs as our patch prefix.

There is also a minor consistency cleanup in the options changing the name
option to uname in order to more closely match the Plan 9 options.

Signed-off-by: Eric Van Hensbergevan <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Uwe Zeisberger
c30fe7f731 fix typos "wich" -> "which"
Signed-off-by: Uwe Zeisberger <zeisberg@informatik.uni-freiburg.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-24 18:23:14 +01:00
Anton Altaparmakov
e750d1c7cc NTFS: 2.1.27 - Various bug fixes and cleanups.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-03-23 17:04:12 +00:00
Alexey Dobriyan
4de151d8cd It's UTF-8
Fix some comments to "UTF-8".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-22 00:13:35 +01:00
Steven Whitehouse
f3b270a478 Merge branch 'master' 2006-02-27 13:27:34 -05:00
Linus Torvalds
2cb5b6beef Merge git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs-2.6 2006-02-24 14:36:42 -08:00
Hugh Dickins
ad329b1519 [PATCH] tmpfs: recommend remount for mpol
akpm points out that switching to a non-NUMA kernel could be irritating
if mounting tmpfs fails on an mpol option: tmpfs.txt recommend remount.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-24 14:31:39 -08:00
Steven Whitehouse
2fcb4a1278 [GFS2] Update documentation
Change from gfs2_mkfs to mkfs -t gfs2 in the documentation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-02-24 10:42:20 -05:00
Anton Altaparmakov
1cf3109ffb NTFS: Do more detailed reporting of why we cannot mount read-write by
special casing the VOLUME_MODIFIED_BY_CHKDSK flag.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2006-02-24 10:48:14 +00:00
Steven Whitehouse
d35462b4bb Merge branch 'master' 2006-02-23 09:49:43 +00:00
Hugh Dickins
b00dc3ad74 [PATCH] tmpfs: fix mount mpol nodelist parsing
I've been dissatisfied with the mpol_nodelist mount option which was
added to tmpfs earlier in -rc.  Replace it by mpol=policy:nodelist.

And it was broken: a nodelist is a comma-separated list of numbers and
ranges; the mount options are a comma-separated list of token=values.
Whoops, blindly strsep'ing on commas doesn't work so well: since we've
no numeric tokens, and unlikely to add them, use that to distinguish.

Move the mpol= parsing to shmem_parse_mpol under CONFIG_NUMA, reject
all its options as invalid if not NUMA.  /proc shows MPOL_PREFERRED
as "prefer", so use that name for the policy instead of "preferred".

Enforce that mpol=default has no nodelist; that mpol=prefer has one
node only; that mpol=bind has a nodelist; but let mpol=interleave use
node_online_map if no nodelist given.  Describe this in tmpfs.txt.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Robin Holt <holt@sgi.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-21 17:10:15 -08:00
Eric Van Hensbergen
e1c9211755 [PATCH] v9fs: update documentation and fix debug flag
Minor updates to the documentation to bring them into sync with current
websites and available features.  The debug flag was switched back to hex
to match the documentation.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20 20:00:11 -08:00
Joel Becker
3d0f89bb16 configfs: Add permission and ownership to configfs objects.
configfs always made item and attribute ownership root.root and
permissions based on a umask of 022.  Add ->setattr() to allow
chown(2)/chmod(2), and persist the changes for the lifetime of the
items and attributes.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-02-03 14:01:05 -08:00
J. Bruce Fields
0d419a6a95 [OCFS2] Documentation Fix
Update ocfs2.txt to add "cluster aware lockf" under missing features.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-02-03 13:47:19 -08:00
David Teigland
e473142070 [GFS2] Add documentation for GFS2
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>
2006-01-18 09:21:38 +00:00
Miklos Szeredi
bacac382fb [PATCH] fuse: update documentation for sysfs
Add documentation for new attributes in sysfs.  Also describe the filesystem.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-16 23:15:31 -08:00
Robin Holt
7339ff8302 [PATCH] Add tmpfs options for memory placement policies
Anything that writes into a tmpfs filesystem is liable to disproportionately
decrease the available memory on a particular node.  Since there's no telling
what sort of application (e.g.  dd/cp/cat) might be dropping large files
there, this lets the admin choose the appropriate default behavior for their
site's situation.

Introduce a tmpfs mount option which allows specifying a memory policy and
a second option to specify the nodelist for that policy.  With the default
policy, tmpfs will behave as it does today.  This patch adds support for
preferred, bind, and interleave policies.

The default policy will cause pages to be added to tmpfs files on the node
which is doing the writing.  Some jobs expect a single process to create
and manage the tmpfs files.  This results in a node which has a
significantly reduced number of free pages.

With this patch, the administrator can specify the policy and nodes for
that policy where they would prefer allocations.

This patch was originally written by Brent Casavant and Hugh Dickins.  I
added support for the bind and preferred policies and the mpol_nodelist
mount option.

Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14 18:27:07 -08:00
Tore Anderson
e56d5ae305 [PATCH] ext3: fix documentation of online resizing
Undocument the non-working resize= mount option in ext3, and add some
references to the ext2resize package instead, which appears to be the only
proper way of doing online resizing of ext3 filesystems.

Signed-off-by: Tore Anderson <tore@fud.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:10 -08:00
Jesper Juhl
c63ca3c8b0 [PATCH] Docs update: small spelling, formating etc fixes for filesystems/ext3.txt
Spelling fixes, formating changes and corrections for
 Documentation/filesystems/ext3.txt

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:54 -08:00
Linus Torvalds
977127174a Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 2006-01-09 18:41:42 -08:00
Adrian Bunk
e82443c092 Documentation/filesystems/proc.txt: indentation fix
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-10 00:20:30 +01:00
Jesse Barnes
5d135dff53 [PATCH] PCI: document sysfs rom file interface
idr gently pointed out today that not only is the sysfs rom file
interface somewhat unintuitive (despite my efforts and initial
implementation), but it's also undocumented!  This patch to
Documentation/filesystems/sysfs-pci.txt corrects the latter problem; the
former is a userland ABI now though, so we're stuck with it for awhile
at least.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-09 12:13:19 -08:00
Linus Torvalds
6150c32589 Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge 2006-01-09 10:03:44 -08:00
Rob Landley
99aef427e2 [PATCH] update to the initramfs docs
Based on questions people have asked me.  Repeatedly.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:00 -08:00
Johann Lombardi
71b9625744 [PATCH] ext3: external journal device as a mount option
The patch below adds a new mount option to allow the external journal
device to be specified.

The syntax is as follows:
# mount -t ext3 -o journal_dev=0x0820 ...
where 0x0820 means major=8 and minor=32.

Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:56 -08:00
Tom Zanussi
6b34350f49 [PATCH] relayfs: Documentation cleanup, remove obsolete info
librelay and relay-app.h have been retired - update Documentation to reflect
that.

Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:51 -08:00
Tom Zanussi
df49af8f33 [PATCH] relayfs: add Documentation on global relay buffers
Documentation update for creating global buffers.

Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:50 -08:00
Tom Zanussi
03d78d11d9 [PATCH] relayfs: add Documentation on relay files in other filesystems
Documentation update for creating relay files in other filesystems.

Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:50 -08:00
Tom Zanussi
925ac8a2b6 [PATCH] relayfs: add Documention for non-relay files
Documentation update for non-relay files.

Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:50 -08:00
Andrew Morton
9d0243bca3 [PATCH] drop-pagecache
Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can.  THis
operation requires root permissions.

It won't drop dirty data, so the user should run `sync' first.

Caveats:

a) Holds inode_lock for exorbitant amounts of time.

b) Needs to be taught about NUMA nodes: propagate these all the way through
   so the discarding can be controlled on a per-node basis.

This is a debugging feature: useful for getting consistent results between
filesystem benchmarks.  We could possibly put it under a config option, but
it's less than 300 bytes.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:12:40 -08:00
Arnd Bergmann
67207b9664 [PATCH] spufs: The SPU file system, base
This is the current version of the spu file system, used
for driving SPEs on the Cell Broadband Engine.

This release is almost identical to the version for the
2.6.14 kernel posted earlier, which is available as part
of the Cell BE Linux distribution from
http://www.bsc.es/projects/deepcomputing/linuxoncell/.

The first patch provides all the interfaces for running
spu application, but does not have any support for
debugging SPU tasks or for scheduling. Both these
functionalities are added in the subsequent patches.

See Documentation/filesystems/spufs.txt on how to use
spufs.

Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-01-09 14:49:12 +11:00
Linus Torvalds
29552b1462 Merge http://oss.oracle.com/git/ocfs2 2006-01-05 20:43:11 -08:00
Mark Fasheh
ccd979bdbc [PATCH] OCFS2: The Second Oracle Cluster Filesystem
The OCFS2 file system module.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:47 -08:00
Mark Fasheh
8df08c89c6 [PATCH] OCFS2: The Second Oracle Cluster Filesystem
dlmfs: A minimal dlm userspace interface implemented via a virtual
file system.
Most of the OCFS2 tools make use of this to take cluster locks when
doing operations on the file system.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
2006-01-03 11:45:47 -08:00
Joel Becker
7063fbf226 [PATCH] configfs: User-driven configuration filesystem
Configfs, a file system for userspace-driven kernel object configuration.
The OCFS2 stack makes extensive use of this for propagation of cluster
configuration information into kernel.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2006-01-03 11:45:28 -08:00
Paolo 'Blaisorblade' Giarrusso
48d727a9f9 Documentation/filesystems/00-INDEX: remove entry for fat_cvf.txt
Remove non-existing entry for fat_cvf.txt (was it ever supported?).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-03 13:44:23 +01:00
Jim Cromie
e3e1bfe4f2 Documentation/filesystems/vfs.txt: typo fix
This patch removes an extra occurrence of 'generic'.

Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-03 13:35:41 +01:00
Andreas Gruenbacher
85b8724249 [PATCH] ext3: fix mount options documentation
Reported by Jacques de Mer and Daniel Drake <dsd@gentoo.org>.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 08:57:42 -08:00
Randy Dunlap
98766fbe60 [PATCH] kernel Doc/ URL corrections
Correct lots of URLs in Documentation/ Also a few minor whitespace cleanups
and typo/spello fixes.  Sadly there are still a lot of bad URLs remaining.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:30 -08:00
Adrian Bunk
2860b733f1 [PATCH] remove CONFIG_EXT{2,3}_CHECK
The CONFIG_EXT{2,3}_CHECK options where were never available, and all they
did was to implement a subset of e2fsck in the kernel.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:55:58 -08:00
Rob Landley
7f46a240b0 [PATCH] ramfs, rootfs, and initramfs docs
Docs for ramfs, rootfs, and initramfs.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:56 -08:00
Pekka Enberg
cbf8f0f36a [PATCH] VFS: split dentry locking documentation
This patch splits dentry locking documentation from
Documentation/filesystems/vfs.txt to a separate file.  The dentry locking
bits are useful but do not fit into the VFS overview document as is.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:56 -08:00
Pekka Enberg
cc7d1f8f96 [PATCH] VFS: update overview document
This patch updates the Documentation/filesystems/vfs.txt document.  I
rearranged and rewrote parts of the introduction chapter and added better
headings for each section.  I also added a description for the inode
rename() operation which was missing and added links to some useful
external VFS documentation.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:55 -08:00
Jesper Juhl
62a07e6e9e [PATCH] ksymoops related docs update
Update ksymoops related documentation to reflect current 2.6 reality.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:54 -08:00
Nathan Scott
fc97bbf35d [XFS] Update XFS documentation.
Signed-off-by: Nathan Scott <nathans@sgi.com>
2005-11-03 13:46:43 +11:00
Anton Altaparmakov
98b270362b NTFS: The big ntfs write(2) rewrite has arrived. We now implement our own
file operations ->write(), ->aio_write(), and ->writev() for regular
      files.  This replaces the old use of generic_file_write(), et al and
      the address space operations ->prepare_write and ->commit_write.
      This means that both sparse and non-sparse (unencrypted and
      uncompressed) files can now be extended using the normal write(2)
      code path.  There are two limitations at present and these are that
      we never create sparse files and that we only have limited support
      for highly fragmented files, i.e. ones whose data attribute is split
      across multiple extents.   When such a case is encountered,
      EOPNOTSUPP is returned.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-10-11 15:40:40 +01:00
Marcelo Tosatti
afeda2c24e [PATCH] relayfs documentation typo
Small typo in relayfs documentation.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:01 -07:00
Miklos Szeredi
45323fb764 [PATCH] fuse: more flexible caching
Make data caching behavior selectable on a per-open basis instead of
per-mount.  Compatibility for the old mount options 'kernel_cache' and
'direct_io' is retained in the userspace library (version 2.4.0-pre1 or
later).

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:47 -07:00
Miklos Szeredi
334f485df8 [PATCH] FUSE - device functions
This adds the FUSE device handling functions.

This contains the following files:

 o dev.c
    - fuse device operations (read, write, release, poll)
    - registers misc device
    - support for sending requests to userspace

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:44 -07:00
Pekka J Enberg
5ea626aac1 [PATCH] VFS: update documentation
This patch brings the now out-of-date Documentation/filesystems/vfs.txt
back to life.  Thanks to Carsten Otte, Trond Myklebust, and Anton
Altaparmakov for their help on updating this documentation.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:43 -07:00
Chuck Ebbert
af97c7220a [PATCH] docs: fix misinformation about overcommit_memory
Someone complained about the docs for vm_overcommit_memory being wrong.
This patch copies the text from the vm documentation into procfs.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 14:03:43 -07:00
Eric Van Hensbergen
93fa58cb83 [PATCH] v9fs: Documentation, Makefiles, Configuration
OVERVIEW

V9FS is a distributed file system for Linux which provides an
implementation of the Plan 9 resource sharing protocol 9P.  It can be
used to share all sorts of resources: static files, synthetic file servers
(such as /proc or /sys), devices, and application file servers (such as
FUSE).

BACKGROUND

Plan 9 (http://plan9.bell-labs.com/plan9) is a research operating
system and associated applications suite developed by the Computing
Science Research Center of AT&T Bell Laboratories (now a part of
Lucent Technologies), the same group that developed UNIX , C, and C++.
Plan 9 was initially released in 1993 to universities, and then made
generally available in 1995. Its core operating systems code laid the
foundation for the Inferno Operating System released as a product by
Lucent Bell-Labs in 1997. The Inferno venture was the only commercial
embodiment of Plan 9 and is currently maintained as a product by Vita
Nuova (http://www.vitanuova.com). After updated releases in 2000 and
2002, Plan 9 was open-sourced under the OSI approved Lucent Public
License in 2003.

The Plan 9 project was started by Ken Thompson and Rob Pike in 1985.
Their intent was to explore potential solutions to some of the
shortcomings of UNIX in the face of the widespread use of high-speed
networks to connect machines. In UNIX, networking was an afterthought
and UNIX clusters became little more than a network of stand-alone
systems. Plan 9 was designed from first principles as a seamless
distributed system with integrated secure network resource sharing.
Applications and services were architected in such a way as to allow
for implicit distribution across a cluster of systems. Configuring an
environment to use remote application components or services in place
of their local equivalent could be achieved with a few simple command
line instructions. For the most part, application implementations
operated independent of the location of their actual resources.

Commercial operating systems haven't changed much in the 20 years
since Plan 9 was conceived. Network and distributed systems support is
provided by a patchwork of middle-ware, with an endless number of
packages supplying pieces of the puzzle. Matters are complicated by
the use of different complicated protocols for individual services,
and separate implementations for kernel and application resources.
The V9FS project (http://v9fs.sourceforge.net) is an attempt to bring
Plan 9's unified approach to resource sharing to Linux and other
operating systems via support for the 9P2000 resource sharing
protocol.

V9FS HISTORY

V9FS was originally developed by Ron Minnich and Maya Gokhale at Los
Alamos National Labs (LANL) in 1997.  In November of 2001, Greg Watson
setup a SourceForge project as a public repository for the code which
supported the Linux 2.4 kernel.

About a year ago, I picked up the initial attempt Ron Minnich had
made to provide 2.6 support and got the code integrated into a 2.6.5
kernel.   I then went through a line-for-line re-write attempting to
clean-up the code while more closely following the Linux Kernel style
guidelines.  I co-authored a paper with Ron Minnich on the V9FS Linux
support including performance comparisons to NFSv3 using Bonnie and
PostMark - this paper appeared at the USENIX/FREENIX 2005
conference in April 2005:
( http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html ).

CALL FOR PARTICIPATION/REQUEST FOR COMMENTS

Our 2.6 kernel support is stabilizing and we'd like to begin pursuing
its integration into the official kernel tree.  We would appreciate any
review, comments, critiques, and additions from this community and are
actively seeking people to join our project and help us produce
something that would be acceptable and useful to the Linux community.

STATUS

The code is reasonably stable, although there are no doubt corner cases
our regression tests haven't discovered yet.  It is in regular use by several
of the developers and has been tested on x86 and PowerPC
(32-bit and 64-bit) in both small and large (LANL cluster) deployments.
Our current regression tests include fsx, bonnie, and postmark.

It was our intention to keep things as simple as possible for this
release -- trying to focus on correctness within the core of the
protocol support versus a rich set of features.  For example: a more
complete security model and cache layer are in the road map, but
excluded from this release.   Additionally, we have removed support for
mmap operations at Al Viro's request.

PERFORMANCE

Detailed performance numbers and analysis are included in the FREENIX
paper, but we show comparable performance to NFSv3 for large file
operations based on the Bonnie benchmark, and superior performance for
many small file operations based on the PostMark benchmark.   Somewhat
preliminary graphs (from the FREENIX paper) are available
(http://v9fs.sourceforge.net/perf/index.html).

RESOURCES

The source code is available in a few different forms:

tarballs: http://v9fs.sf.net
CVSweb: http://cvs.sourceforge.net/viewcvs.py/v9fs/linux-9p/
CVS: :pserver:anonymous@cvs.sourceforge.net:/cvsroot/v9fs/linux-9p
Git: rsync://v9fs.graverobber.org/v9fs (webgit: http://v9fs.graverobber.org)
9P: tcp!v9fs.graverobber.org!6564

The user-level server is available from either the Plan 9 distribution
or from http://v9fs.sf.net
Other support applications are still being developed, but preliminary
version can be downloaded from sourceforge.

Documentation on the protocol has historically been the Plan 9 Man
pages (http://plan9.bell-labs.com/sys/man/5/INDEX.html), but there is
an effort under way to write a more complete Internet-Draft style
specification (http://v9fs.sf.net/rfc).

There are a couple of mailing lists supporting v9fs, but the most used
is v9fs-developer@lists.sourceforge.net -- please direct/cc your
comments there so the other v9fs contibutors can participate in the
conversation.  There is also an IRC channel: irc://freenode.net/#v9fs

This part of the patch contains Documentation, Makefiles, and configuration
file changes.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:56 -07:00
Dipankar Sarma
2822541893 [PATCH] files: files locking doc
Add documentation describing the new locking scheme for file descriptor table.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 13:57:55 -07:00
Anton Altaparmakov
7d333d6c73 NTFS: 2.1.24 release and some minor final fixes.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-09-08 23:01:16 +01:00
Tom Zanussi
e82894f84d [PATCH] relayfs
Here's the latest version of relayfs, against linux-2.6.11-mm2.  I'm hoping
you'll consider putting this version back into your tree - the previous
rounds of comment seem to have shaken out all the API issues and the number
of comments on the code itself have also steadily dwindled.

This patch is essentially the same as the relayfs redux part 5 patch, with
some minor changes based on reviewer comments.  Thanks again to Pekka
Enberg for those.  The patch size without documentation is now a little
smaller at just over 40k.  Here's a detailed list of the changes:

- removed the attribute_flags in relay open and changed it to a
  boolean specifying either overwrite or no-overwrite mode, and removed
  everything referencing the attribute flags.
- added a check for NULL names in relayfs_create_entry()
- got rid of the unnecessary multiple labels in relay_create_buf()
- some minor simplification of relay_alloc_buf() which got rid of a
  couple params
- updated the Documentation

In addition, this version (through code contained in the relay-apps tarball
linked to below, not as part of the relayfs patch) tries to make it as easy
as possible to create the cooperating kernel/user pieces of a typical and
common type of logging application, one where kernel logging is kicked off
when a user space data collection app starts and stops when the collection
app exits, with the data being automatically logged to disk in between.  To
create this type of application, you basically just include a header file
(relay-app.h, included in the relay-apps tarball) in your kernel module,
define a couple of callbacks and call an initialization function, and on
the user side call a single function that sets up and continuously monitors
the buffers, and writes data to files as it becomes available.  Channels
are created when the collection app is started and destroyed when it exits,
not when the kernel module is inserted, so different channel buffer sizes
can be specified for each separate run via command-line options.  See the
README in the relay-apps tarball for details.

Also included in the relay-apps tarball are a couple examples
demonstrating how you can use this to create quick and dirty kernel
logging/debugging applications.  They are:

- tprintk, short for 'tee printk', which temporarily puts a kprobe on
  printk() and writes a duplicate stream of printk output to a relayfs
  channel.  This could be used anywhere there's printk() debugging code
  in the kernel which you'd like to exercise, but would rather not have
  your system logs cluttered with debugging junk.  You'd probably want
  to kill klogd while you do this, otherwise there wouldn't be much
  point (since putting a kprobe on printk() doesn't change the output
  of printk()).  I've used this method to temporarily divert the packet
  logging output of the iptables LOG target from the system logs to
  relayfs files instead, for instance.

- klog, which just provides a printk-like formatted logging function
  on top of relayfs.  Again, you can use this to keep stuff out of your
  system logs if used in place of printk.

The example applications can be found here:

http://prdownloads.sourceforge.net/dprobes/relay-apps.tar.gz?download

From: Christoph Hellwig <hch@lst.de>

  avoid lookup_hash usage in relayfs

Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:18 -07:00
Jan Veldeman
91e49001b9 [PATCH] Driver core: Documentation: use S_IRUSR | ... in stead of 0644
Change filemode to use defines in stead of 0644,
based on suggestions by Walter Harms and Domen Puncer.

Signed-off-by: Jan Veldeman <Jan.Veldeman@advalvas.be>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-09-05 16:03:12 -07:00
Jan Veldeman
f8d825bfb8 [PATCH] Driver core: Documentation: fix whitespace between parameters
Fix whitespace after comma between parameters.

Signed-off-by: Jan Veldeman <Jan.Veldeman@advalvas.be>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-09-05 16:03:12 -07:00
Mauricio Lin
e070ad49f3 [PATCH] add /proc/pid/smaps
Add a "smaps" entry to /proc/pid: show howmuch memory is resident in each
mapping.

People that want to perform a memory consumption analysing can use it
mainly if someone needs to figure out which libraries can be reduced for
embedded systems.  So the new features are the physical size of shared and
clean [or dirty]; private and clean [or dirty].

Take a look the example below:

# cat /proc/4576/smaps

08048000-080dc000 r-xp /bin/bash
Size:               592 KB
Rss:                500 KB
Shared_Clean:       500 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:        0 KB
080dc000-080e2000 rw-p /bin/bash
Size:                24 KB
Rss:                 24 KB
Shared_Clean:         0 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:       24 KB
080e2000-08116000 rw-p
Size:               208 KB
Rss:                208 KB
Shared_Clean:         0 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:      208 KB
b7e2b000-b7e34000 r-xp /lib/tls/libnss_files-2.3.2.so
Size:                36 KB
Rss:                 12 KB
Shared_Clean:        12 KB
Shared_Dirty:         0 KB
Private_Clean:        0 KB
Private_Dirty:        0 KB
...

(Includes a cleanup from "Richard Purdie" <rpurdie@rpsys.net>)

From: Torsten Foertsch <torsten.foertsch@gmx.net>

show_smap calls first show_map and then prints its additional information to
the seq_file.  show_map checks if all it has to print fits into the buffer and
if yes marks the current vma as written.  While that is correct for show_map
it is not for show_smap.  Here the vma should be marked as written only after
the additional information is also written.

The attached patch cures the problem.  It moves the functionality of the
show_map function to a new function show_map_internal that is called with an
additional struct mem_size_stats* argument.  Then show_map calls
show_map_internal with NULL as struct mem_size_stats* whereas show_smap calls
it with a real pointer.  Now the final

	if (m->count < m->size)  /* vma is copied successfully */
		m->version = (vma != get_gate_vma(task))? vma->vm_start: 0;

is done only if the whole entry fits into the buffer.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05 00:05:49 -07:00
Linus Torvalds
af6ea9ca23 Merge master.kernel.org:/pub/scm/linux/kernel/git/aia21/ntfs-2.6 2005-07-16 11:47:51 -07:00
Robert Love
6f97933d0f [PATCH] inotify: documentation update
Clean up and expand some of the inotify documentation.

Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-15 09:54:51 -07:00
Anton Altaparmakov
c514720716 Automatic merge with /usr/src/ntfs-2.6.git. 2005-07-13 23:09:23 +01:00
Robert Love
0eeca28300 [PATCH] inotify
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:

        * dnotify requires the opening of one fd per each directory
          that you intend to watch. This quickly results in too many
          open files and pins removable media, preventing unmount.
        * dnotify is directory-based. You only learn about changes to
          directories. Sure, a change to a file in a directory affects
          the directory, but you are then forced to keep a cache of
          stat structures.
        * dnotify's interface to user-space is awful.  Signals?

inotify provides a more usable, simple, powerful solution to file change
notification:

        * inotify's interface is a system call that returns a fd, not SIGIO.
	  You get a single fd, which is select()-able.
        * inotify has an event that says "the filesystem that the item
          you were watching is on was unmounted."
        * inotify can watch directories or files.

Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.

See Documentation/filesystems/inotify.txt.

Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-12 20:38:38 -07:00
Anton Altaparmakov
ba6d2377c8 NTFS: Fix a nasty deadlock that appeared in recent kernels.
The situation: VFS inode X on a mounted ntfs volume is dirty.  For
      same inode X, the ntfs_inode is dirty and thus corresponding on-disk
      inode, i.e. mft record, which is in a dirty PAGE_CACHE_PAGE belonging
      to the table of inodes, i.e. $MFT, inode 0.
      What happens:
      Process 1: sys_sync()/umount()/whatever...  calls
      __sync_single_inode() for $MFT -> do_writepages() -> write_page for
      the dirty page containing the on-disk inode X, the page is now locked
      -> ntfs_write_mst_block() which clears PageUptodate() on the page to
      prevent anyone else getting hold of it whilst it does the write out.
      This is necessary as the on-disk inode needs "fixups" applied before
      the write to disk which are removed again after the write and
      PageUptodate is then set again.  It then analyses the page looking
      for dirty on-disk inodes and when it finds one it calls
      ntfs_may_write_mft_record() to see if it is safe to write this
      on-disk inode.  This then calls ilookup5() to check if the
      corresponding VFS inode is in icache().  This in turn calls ifind()
      which waits on the inode lock via wait_on_inode whilst holding the
      global inode_lock.
      Process 2: pdflush results in a call to __sync_single_inode for the
      same VFS inode X on the ntfs volume.  This locks the inode (I_LOCK)
      then calls write-inode -> ntfs_write_inode -> map_mft_record() ->
      read_cache_page() for the page (in page cache of table of inodes
      $MFT, inode 0) containing the on-disk inode.  This page has
      PageUptodate() clear because of Process 1 (see above) so
      read_cache_page() blocks when it tries to take the page lock for the
      page so it can call ntfs_read_page().
      Thus Process 1 is holding the page lock on the page containing the
      on-disk inode X and it is waiting on the inode X to be unlocked in
      ifind() so it can write the page out and then unlock the page.
      And Process 2 is holding the inode lock on inode X and is waiting for
      the page to be unlocked so it can call ntfs_readpage() or discover
      that Process 1 set PageUptodate() again and use the page.
      Thus we have a deadlock due to ifind() waiting on the inode lock.
      The solution: The fix is to use the newly introduced
      ilookup5_nowait() which does not wait on the inode's lock and hence
      avoids the deadlock.  This is safe as we do not care about the VFS
      inode and only use the fact that it is in the VFS inode cache and the
      fact that the vfs and ntfs inodes are one struct in memory to find
      the ntfs inode in memory if present.  Also, the ntfs inode has its
      own locking so it does not matter if the vfs inode is locked.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-06-26 22:12:02 +01:00
Anton Altaparmakov
af859a42d7 NTFS: Prepare for 2.1.23 release: Update documentation and bump version.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-06-25 21:07:27 +01:00
Anton Altaparmakov
38b22b6e9f Automerge with /usr/src/ntfs-2.6.git. 2005-06-25 14:27:27 +01:00
Carsten Otte
d763b7a473 [PATCH] xip: description
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24 00:06:42 -07:00
Anton Altaparmakov
3357d4c75f Automatic merge with /usr/src/ntfs-2.6.git. 2005-06-23 11:26:22 +01:00
Jeremy White
9769f4eb3f [PATCH] isofs: show hidden files, add granularity for assoc/hidden files flags
The current isofs treatment of hidden files is flawed in two ways.  First,
it does not provide sufficient granularity; it hides both 'hidden' files
and 'associated' files (resource fork for Mac files).  Second, the default
behavior to completely strip hidden files, while an admirable
implementation of the spec, is a poor choice given the real world use of
hidden files as a poor mans copy protection scheme for MSDOS and Windows
based systems.  A longer description of this is available here:

   http://www.uwsg.iu.edu/hypermail/linux/kernel/0205.3/0267.html

This patch was originally built after a few private conversations with Alan
Cox; I shamefully failed to persist in seeing it go forward, I hope to make
amends now.

This patch introduces granularity by allowing explicit control for both
hidden and associated files.  It also reverses the default so that by
default, hidden files are treated as regular files on the iso9660 file
system.

This allow Wine to process Windows CDs, including those that are hybrid
Mac/Windows CDs properly and completely, without our having to go muck up
peoples fstabs as we do now.  (I have tested this with such a hybrid +
hidden CD and have verified that this patch works as claimed).

Signed-off-by: Jeremy White <jwhite@codeweavers.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 19:07:38 -07:00
Hugh Dickins
0edd73b334 [PATCH] shmem: restore superblock info
To improve shmem scalability, we allowed tmpfs instances which don't need
their blocks or inodes limited not to count them, and not to allocate any
sbinfo.  Which was okay when the only use for the sbinfo was accounting
blocks and inodes; but since then a couple of unrelated projects extending
tmpfs want to store other data in the sbinfo.  Whether either extension
reaches mainline is beside the point: I'm guilty of a bad design decision,
and should restore sbinfo to make any such future extensions easier.

So, once again allocate a shmem_sb_info for every shmem/tmpfs instance, and
now let max_blocks 0 indicate unlimited blocks, and max_inodes 0 unlimited
inodes.  Brent Casavant verified (many months ago) that this does not
perceptibly impact the scalability (since the unlimited sbinfo cacheline is
repeatedly accessed but only once dirtied).

And merge shmem_set_size into its sole caller shmem_remount_fs.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21 18:46:18 -07:00
Yani Ioannou
3eb8c7836e [PATCH] Driver core: Documentation: update device attribute callbacks
Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20 15:15:32 -07:00
Anton Altaparmakov
67394f8f06 Merge with /usr/src/ntfs-2.6.git 2005-05-21 22:00:02 +01:00
David Brownell
0b405a0f7e [PATCH] Driver Core: remove driver model detach_state
The driver model has a "detach_state" mechanism that:

 - Has never been used by any in-kernel drive;
 - Is superfluous, since driver remove() methods can do the same thing;
 - Became buggy when the suspend() parameter changed semantics and type;
 - Could self-deadlock when called from certain suspend contexts;
 - Is effectively wasted documentation, object code, and headspace.

This removes that "detach_state" mechanism; net code shrink, as well
as a per-device saving in the driver model and sysfs.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-05-17 14:54:55 -07:00
Anton Altaparmakov
c002f42543 NTFS: - Add disable_sparse mount option together with a per volume sparse
enable bit which is set appropriately and a per inode sparse disable
	bit which is preset on some system file inodes as appropriate.
      - Enforce that sparse support is disabled on NTFS volumes pre 3.0.

Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
2005-05-05 10:53:01 +01:00
Cosmin Nicolaescu
c31403a1f5 [PATCH] Documentation: remove super-{nr, max} to reflect fs/super.c
The patch updates the documentation for /proc.  super-nr and super-max have
been dropped from the kernel since 2.4.9 due to minor numbering issues.
This change was not documented in the documentation.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:59:28 -07:00
Nikita Danilov
2054606ad6 [PATCH] doc: Locking update
Make the Locking document truer.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01 08:58:37 -07:00
Linus Torvalds
1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00