Commit Graph

60021 Commits

Author SHA1 Message Date
Nicholas Bellinger
cbf031f425 target: Add support for EXTENDED_COPY copy offload emulation
This patch adds support for EXTENDED_COPY emulation from SPC-3, that
enables full copy offload target support within both a single virtual
backend device, and across multiple virtual backend devices.  It also
functions independent of target fabric, and supports copy offload
across multiple target fabric ports.

This implemenation supports both EXTENDED_COPY PUSH and PULL models
of operation, so the actual CDB may be received on either source or
desination logical unit.

For Target Descriptors, it currently supports the NAA IEEE Registered
Extended designator (type 0xe4), which allows the reference of target
ports to occur independent of fabric type using EVPD 0x83 WWNs.

For Segment Descriptors, it currently supports copy from block to
block (0x02) mode.

It also honors any present SCSI reservations of the destination target
port.  Note that only Supports No List Identifier (SNLID=1) mode is
supported.

Also included is basic RECEIVE_COPY_RESULTS with service action type
OPERATING PARAMETERS (0x03) required for SNLID=1 operation.

v3 changes:
  - Fix incorrect return type in target_do_receive_copy_results()
    (Fengguang)

v2 changes:
  - Use target_alloc_sgl() instead of transport_generic_get_mem()
  - Convert debug output to use pr_debug()
  - Convert target_xcopy_parse_target_descriptors() NAA IEEN WWN
    dump to use 0x%16phN format specification
  - Drop unnecessary xcopy_pt_cmd->xpt_passthrough_wsem, and
    associated usage in xcopy_pt_write_pending() and
    target_xcopy_issue_pt_cmd()
  - Add check for unsupported EXTENDED_COPY(LID4) service action
    bits in target_do_xcopy()

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:48:43 -07:00
Nicholas Bellinger
d9ea32bff2 target: Add global device list for EXTENDED_COPY
EXTENDED_COPY needs to be able to search a global list of devices
based on NAA WWN device identifiers, so add a simple g_device_list
protected by g_device_mutex.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:48:40 -07:00
Nicholas Bellinger
c5ff8d6bc3 target: Make helpers non static for EXTENDED_COPY command setup
Both target_alloc_sgl() and transport_generic_map_mem_to_cmd() are
required by EXTENDED_COPY logic when setting up internally dispatched
command descriptors, so go ahead and make both of these non static.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Zach Brown <zab@redhat.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:48:39 -07:00
Nicholas Bellinger
b3faa2e87c target/tcm_qla2xxx: Add/use target_reverse_dma_direction() in target_core_fabric.h
Reversing the dma_data_direction for pci_map_sg() friends is useful
for other drivers, so move it from tcm_qla2xxx into inline code
within target_core_fabric.h.

Also drop internal usage of equivlient in tcm_qla2xxx fabric code.

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:48:35 -07:00
Nicholas Bellinger
68ff9b9b27 target: Add support for COMPARE_AND_WRITE emulation
This patch adds support for COMPARE_AND_WRITE emulation on a per block
basis.  This logic is used as an atomic test and set primative currently
used by VMWare ESX VAAI for performing array side locking of individual
VMFS extent ownership.

This includes the COMPARE_AND_WRITE CDB parsing within sbc_parse_cdb(),
and does the majority of the work within the compare_and_write_callback()
to perform the verify instance user data comparision, and subsequent
write instance user data I/O submission upon a successfull comparision.

The synchronization is enforced by se_device->caw_sem, that is obtained
before the initial READ I/O submission in sbc_compare_and_write().  The
mutex is then released upon MISCOMPARE in compare_and_write_callback(),
or upon WRITE instance user-data completion in compare_and_write_post().

The implementation currently assumes a single logical block (NoLB=1).

v4 changes:
 - Explicitly clear cmd->transport_complete_callback for two failure
   cases in sbc_compare_and_write() in order to avoid double unlock
   of ->caw_sem in compare_and_write_callback() (Dan Carpenter)

v3 changes:
 - Convert se_device->caw_mutex to ->caw_sem

v2 changes:
 - Set SCF_COMPARE_AND_WRITE and cmd->execute_cmd() to
   sbc_compare_and_write() during setup in sbc_parse_cdb()
 - Use sbc_compare_and_write() for initial READ submission with
   DMA_FROM_DEVICE
 - Reset cmd->execute_cmd() to sbc_execute_rw() for write instance
   user-data in compare_and_write_callback()
 - Drop SCF_BIDI command flag usage
 - Set TRANSPORT_PROCESSING + transport_state flags before write
   instance submission, and convert to __target_execute_cmd()
 - Prevent sbc_get_size() from being being called twice to
   generate incorrect size in sbc_parse_cdb()
 - Enforce se_device->caw_mutex synchronization between initial
   READ I/O submission, and final WRITE I/O completion.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-10 16:41:59 -07:00
Nicholas Bellinger
0123a9ec6a target: Add MAXIMUM COMPARE AND WRITE LENGTH in Block Limits VPD
This patch adds the MAXIMUM COMPARE AND WRITE LENGTH bit, currently
hardcoded to a single logical block (NoLB=1) within the Block Limits
VPD in spc_emulate_evpd_b0().

Also add emulate_caw device attribute in configfs (enabled by default)
to allow the exposure of this bit to be disabled, if necessary.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:35 -07:00
Nicholas Bellinger
76dde50ebe target: Make __target_execute_cmd() available as extern
Required by COMPARE_AND_WRITE for write instance user-data
submission, in order to bypass target_execute_cmd() checks.

Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:34 -07:00
Nicholas Bellinger
47e459e622 target: Add transport_reset_sgl_orig() for COMPARE_AND_WRITE
After COMPARE_AND_WRITE completes it's comparision, the WRITE
payload SGLs head expect to be updated to point from the verify
instance of user data, to the write instance of user data.

So for this special case, add transport_reset_sgl_orig() usage
within transport_free_pages() and add se_cmd->t_data_[sg,nents]_orig
members to save the original assignments.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:33 -07:00
Nicholas Bellinger
a82a9538dd target: Allow sbc_ops->execute_rw() to accept SGLs + data_direction
COMPARE_AND_WRITE expects to be able to send down a DMA_FROM_DEVICE
to obtain the necessary READ payload for comparision against the
first half of the WRITE payload containing the verify user data.

Currently virtual backends expect to internally reference SGLs,
SGL nents, and data_direction, so change IBLOCK, FILEIO and RD
sbc_ops->execute_rw() to accept this values as function parameters.

Also add default sbc_execute_rw() handler for the typical case for
cmd->execute_rw() submission using cmd->t_data_sg, cmd->t_data_nents,
and cmd->data_direction).

v2 Changes:
  - Add SCF_COMPARE_AND_WRITE command flag
  - Use sbc_execute_rw() for normal cmd->execute_rw() submission
    with expected se_cmd members.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:28 -07:00
Nicholas Bellinger
818b571ca0 target: Add TCM_MISCOMPARE_VERIFY sense handling
This patch adds TCM_MISCOMPARE_VERIFY (ASC=0x1d, ASCQ=0x00) sense
handling to transport_send_check_condition_and_sense(), which is
required for a COMPARE_AND_WRITE comparision failure.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:27 -07:00
Nicholas Bellinger
a6b0133c19 target: Add return for se_cmd->transport_complete_callback
This patch adds a sense_reason_t return to ->transport_complete_callback(),
and updates target_complete_ok_work() to invoke the call if necessary to
transport_send_check_condition_and_sense() during the failure case.

Also update xdreadwrite_callback() to use this return value.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:26 -07:00
Nicholas Bellinger
1c68cc1626 scsi: Add CDB definition for COMPARE_AND_WRITE
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin Petersen <martin.petersen@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:24 -07:00
Nicholas Bellinger
d703ce2f7f iscsi/iser-target: Convert to command priv_size usage
This command converts iscsi/isert-target to use allocations based on
iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of
using an embedded isert_cmd->iscsi_cmd.

This includes removing iscsit_transport->alloc_cmd() usage, along
with updating isert-target code to use iscsit_priv_cmd().

Also, remove left-over iscsit_transport->release_cmd() usage for
direct calls to iscsit_release_cmd(), and drop the now unused
lio_cmd_cache and isert_cmd_cache.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:21 -07:00
Nicholas Bellinger
c0add7fd05 target: Add transport_init_session_tags using per-cpu ida
This patch adds lib/idr.c based transport_init_session_tags() logic
that allows fabric drivers to setup a per-cpu se_sess->sess_tag_pool
and associated se_sess->sess_cmd_map for basic tagged pre-allocation
of fabric descriptor sized memory.

v5 changes:
  - Convert to percpu_ida.h include

v4 changes:
  - Add transport_alloc_session_tags() for fabrics that need early
    transport_init_session()

v3 changes:
  - Update to percpu-ida usage

Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Asias He <asias@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 14:29:16 -07:00
Kent Overstreet
798ab48eec idr: Percpu ida
Percpu frontend for allocating ids. With percpu allocation (that works),
it's impossible to guarantee it will always be possible to allocate all
nr_tags - typically, some will be stuck on a remote percpu freelist
where the current job can't get to them.

We do guarantee that it will always be possible to allocate at least
(nr_tags / 2) tags - this is done by keeping track of which and how many
cpus have tags on their percpu freelists. On allocation failure if
enough cpus have tags that there could potentially be (nr_tags / 2) tags
stuck on remote percpu freelists, we then pick a remote cpu at random to
steal from.

Note that there's no cpu hotplug notifier - we don't care, because
steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
more allocations if we had a notifier - but we'll still meet our
guarantees and it's absolutely not a correctness issue, so I don't think
it's worth the extra code.

From akpm:

    "It looks OK to me (that's as close as I get to an ack :))

v6 changes:
  - Add #include <linux/cpumask.h> to include/linux/percpu_ida.h to
    make alpha/arc builds happy (Fengguang)
  - Move second (cpu >= nr_cpu_ids) check inside of first check scope
    in steal_tags() (akpm + nab)

v5 changes:
  - Change percpu_ida->cpus_have_tags to cpumask_t (kmo + akpm)
  - Add comment for percpu_ida_cpu->lock + ->nr_free (kmo + akpm)
  - Convert steal_tags() to use cpumask_weight() + cpumask_next() +
    cpumask_first() + cpumask_clear_cpu() (kmo + akpm)
  - Add comment for alloc_global_tags() (kmo + akpm)
  - Convert percpu_ida_alloc() to use cpumask_set_cpu() (kmo + akpm)
  - Convert percpu_ida_free() to use cpumask_set_cpu() (kmo + akpm)
  - Drop percpu_ida->cpus_have_tags allocation in percpu_ida_init()
    (kmo + akpm)
  - Drop percpu_ida->cpus_have_tags kfree in percpu_ida_destroy()
    (kmo + akpm)
  - Add comment for percpu_ida_alloc @ gfp (kmo + akpm)
  - Move to percpu_ida.c + percpu_ida.h (kmo + akpm + nab)

v4 changes:

  - Fix tags.c reference in percpu_ida_init (akpm)

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 14:29:15 -07:00
Linus Torvalds
b8ea0d06ff Merge tag 'nfs-for-3.11-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:

 - Stable patch for lockd to fix Oopses due to inappropriate calls to
   utsname()->nodename

 - Stable patches for sunrpc to fix Oopses on shutdown when using
   AF_LOCAL sockets with rpcbind

 - Fix memory leak and error checking issues in nfs4_proc_lookup_mountpoint

 - Fix a regression with the sync mount option failing to work for nfs4
   mounts

 - Fix a writeback performance issue when doing cache invalidation

 - Remove an incorrect call to nfs_setsecurity in nfs_fhget

* tag 'nfs-for-3.11-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix up nfs4_proc_lookup_mountpoint
  NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget()
  NFSv4: Fix the sync mount option for nfs4 mounts
  NFS: Fix writeback performance issue on cache invalidation
  SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister
  SUNRPC: Don't auto-disconnect from the local rpcbind socket
  LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs
2013-08-10 15:20:37 -07:00
Linus Torvalds
8ae3f1d095 Merge tag 'staging-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
 "Here are 3 small fixes for staging/IIO drivers for 3.11-rc5.  Nothing
  huge, two IIO driver fixes, and a zcache fix.  All of these have been
  in linux-next for a while"

* tag 'staging-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: zcache: fix "zcache=" kernel parameter
  iio: ti_am335x_adc: Fix wrong samples received on 1st read
  iio:trigger: Fix use_count race condition
2013-08-10 09:00:51 -07:00
Linus Torvalds
14e94194d1 Merge tag 'pm+acpi-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:

 - ACPI-based memory hotplug stopped working after a recent change,
   because it's not possible to associate sufficiently many "physical"
   devices with one ACPI device object due to an artificial limit.  Fix
   from Rafael J Wysocki removes that limit and makes memory hotplug
   work again.

 - A change made in 3.9 uncovered a bug in the ACPI processor driver
   preventing NUMA nodes from being put offline due to an ordering
   issue.  Fix from Yasuaki Ishimatsu changes the ordering to make
   things work again.

 - One of the recent ACPI video commits (that hasn't been reverted so
   far) uncovered a bug in the code handling quirky BIOSes that caused
   some Asus machines to boot with backlight completely off which made
   it quite difficult to use them afterward.  Fix from Felipe Contreras
   improves the quirk to cover this particular case correctly.

 - A cpufreq user space interface change made in 3.10 inadvertently
   renamed the ignore_nice_load sysfs attribute to ignore_nice which
   resulted in some confusion.  Fix from Viresh Kumar changes the name
   back to ignore_nice_load.

 - An initialization ordering change made in 3.9 broke cpufreq on
   loongson2 boards.  Fix from Aaro Koskinen restores the correct
   initialization ordering there.

 - Fix breakage resulting from a mistake made in 3.9 and causing the
   detection of some graphics adapters (that were detected correctly
   before) to fail.  There are two objects representing the same PCIe
   port in the affected systems' ACPI tables and both appear as
   "enabled" and we are expected to guess which one to use.  We used to
   choose the right one before by pure luck, but when we tried to
   address another similar corner case, the luck went away.  This time
   we try to make our guessing a bit more educated which is reported to
   work on those systems.

 - The /proc/acpi/wakeup interface code is missing some locking which
   may lead to breakage if that file is written or read during hotplug
   of wakeup devices.  That should be rare but still possible, so it's
   better to start using the appropriate locking there.

* tag 'pm+acpi-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: Try harder to resolve _ADR collisions for bridges
  cpufreq: rename ignore_nice as ignore_nice_load
  cpufreq: loongson2: fix regression related to clock management
  ACPI / processor: move try_offline_node() after acpi_unmap_lsapic()
  ACPI: Drop physical_node_id_bitmap from struct acpi_device
  ACPI / PM: Walk physical_node_list under physical_node_lock
  ACPI / video: improve quirk check in acpi_video_bqc_quirk()
2013-08-09 15:07:19 -07:00
Linus Torvalds
79a6fb1ace Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
 "Some driver fixes (em28xx, coda, usbtv, s5p, hdpvr and ml86v7667) and
  a fix for media DocBook"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] em28xx: fix assignment of the eeprom data
  [media] hdpvr: fix iteration over uninitialized lists in hdpvr_probe()
  [media] usbtv: fix dependency
  [media] usbtv: Throw corrupted frames away
  [media] usbtv: Fix deinterlacing
  [media] v4l2: added missing mutex.h include to v4l2-ctrls.h
  [media] DocBook: upgrade media_api DocBook version to 4.2
  [media] ml86v7667: fix compile warning: 'ret' set but not used
  [media] s5p-g2d: Fix registration failure
  [media] media: coda: Fix DT driver data pointer for i.MX27
  [media] s5p-mfc: Fix input/output format reporting
2013-08-09 15:04:09 -07:00
Oleg Nesterov
8742f229b6 userns: limit the maximum depth of user_namespace->parent chain
Ensure that user_namespace->parent chain can't grow too much.
Currently we use the hardroded 32 as limit.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-08-08 13:11:39 -07:00
Linus Torvalds
d56290bbc1 Merge tag 'regmap-v3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
 "Two things here, one is a fix for a nasty issue where we were failing
  to sync the last register in a block when using raw writes and the
  other fixes a missing header for the !REGMAP stubs so that we don't
  rely on implicit includes in that case"

* tag 'regmap-v3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: Add missing header for !CONFIG_REGMAP stubs
  regmap: cache: Make sure to sync the last register in a block
2013-08-08 09:34:04 -07:00
Trond Myklebust
786615bc1c SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister
If rpcbind causes our connection to the AF_LOCAL socket to close after
we've registered a service, then we want to be careful about reconnecting
since the mount namespace may have changed.

By simply refusing to reconnect the AF_LOCAL socket in the case of
unregister, we avoid the need to somehow save the mount namespace. While
this may lead to some services not unregistering properly, it should
be safe.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # 3.9.x
2013-08-07 17:07:18 -04:00
Rafael J. Wysocki
60f75b8e97 ACPI: Try harder to resolve _ADR collisions for bridges
In theory, under a given ACPI namespace node there should be only
one child device object with _ADR whose value matches a given bus
address exactly.  In practice, however, there are systems in which
multiple child device objects under a given parent have _ADR matching
exactly the same address.  In those cases we use _STA to determine
which of the multiple matching devices is enabled, since some systems
are known to indicate which ACPI device object to associate with the
given physical (usually PCI) device this way.

Unfortunately, as it turns out, there are systems in which many
device objects under the same parent have _ADR matching exactly the
same bus address and none of them has _STA, in which case they all
should be regarded as enabled according to the spec.  Still, if
those device objects are supposed to represent bridges (e.g. this
is the case for device objects corresponding to PCIe ports), we can
try harder and skip the ones that have no child device objects in the
ACPI namespace.  With luck, we can avoid using device objects that we
are not expected to use this way.

Although this only works for bridges whose children also have ACPI
namespace representation, it is sufficient to address graphics
adapter detection issues on some systems, so rework the code finding
a matching device ACPI handle for a given bus address to implement
this idea.

Introduce a new function, acpi_find_child(), taking three arguments:
the ACPI handle of the device's parent, a bus address suitable for
the device's bus type and a bool indicating if the device is a
bridge and make it work as outlined above.  Reimplement the function
currently used for this purpose, acpi_get_child(), as a call to
acpi_find_child() with the last argument set to 'false' and make
the PCI subsystem use acpi_find_child() with the bridge information
passed as the last argument to it.  [Lan Tianyu notices that it is
not sufficient to use pci_is_bridge() for that, because the device's
subordinate pointer hasn't been set yet at this point, so use
hdr_type instead.]

This change fixes a regression introduced inadvertently by commit
33f767d (ACPI: Rework acpi_get_child() to be more efficient) which
overlooked the fact that for acpi_walk_namespace() "post-order" means
"after all children have been visited" rather than "on the way back",
so for device objects without children and for namespace walks of
depth 1, as in the acpi_get_child() case, the "post-order" callbacks
ordering is actually the same as the ordering of "pre-order" ones.
Since that commit changed the namespace walk in acpi_get_child() to
terminate after finding the first matching object instead of going
through all of them and returning the last one, it effectively
changed the result returned by that function in some rare cases and
that led to problems (the switch from a "pre-order" to a "post-order"
callback was supposed to prevent that from happening, but it was
ineffective).

As it turns out, the systems where the change made by commit
33f767d actually matters are those where there are multiple ACPI
device objects representing the same PCIe port (which effectively
is a bridge).  Moreover, only one of them, and the one we are
expected to use, has child device objects in the ACPI namespace,
so the regression can be addressed as described above.

References: https://bugzilla.kernel.org/show_bug.cgi?id=60561
Reported-by: Peter Wu <lekensteyn@gmail.com>
Tested-by: Vladimir Lalov <mail@vlalov.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: 3.9+ <stable@vger.kernel.org> # 3.9+
2013-08-07 22:55:00 +02:00
Linus Torvalds
b7bc9e7d80 Merge tag 'trace-fixes-3.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
 "Oleg Nesterov has been working hard in closing all the holes that can
  lead to race conditions between deleting an event and accessing an
  event debugfs file.  This included a fix to the debugfs system (acked
  by Greg Kroah-Hartman).  We think that all the holes have been patched
  and hopefully we don't find more.  I haven't marked all of them for
  stable because I need to examine them more to figure out how far back
  some of the changes need to go.

  Along the way, some other fixes have been made.  Alexander Z Lam fixed
  some logic where the wrong buffer was being modifed.

  Andrew Vagin found a possible corruption for machines that actually
  allocate cpumask, as a reference to one was being zeroed out by
  mistake.

  Dhaval Giani found a bad prototype when tracing is not configured.

  And I not only had some changes to help Oleg, but also finally fixed a
  long standing bug that Dave Jones and others have been hitting, where
  a module unload and reload can cause the function tracing accounting
  to get screwed up"

* tag 'trace-fixes-3.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix reset of time stamps during trace_clock changes
  tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct buffer
  tracing: Fix trace_dump_stack() proto when CONFIG_TRACING is not set
  tracing: Fix fields of struct trace_iterator that are zeroed by mistake
  tracing/uprobes: Fail to unregister if probe event files are in use
  tracing/kprobes: Fail to unregister if probe event files are in use
  tracing: Add comment to describe special break case in probe_remove_event_call()
  tracing: trace_remove_event_call() should fail if call/file is in use
  debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)
  ftrace: Check module functions being traced on reload
  ftrace: Consolidate some duplicate code for updating ftrace ops
  tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private
  tracing: Introduce remove_event_file_dir()
  tracing: Change f_start() to take event_mutex and verify i_private != NULL
  tracing: Change event_filter_read/write to verify i_private != NULL
  tracing: Change event_enable/disable_read() to verify i_private != NULL
  tracing: Turn event/id->i_private into call->event.type
2013-08-07 13:01:30 -07:00
Mateusz Krawczuk
49ccc142f9 regmap: Add missing header for !CONFIG_REGMAP stubs
regmap.h requires linux/err.h if CONFIG_REGMAP is not defined. Without it I get
error.
CC      drivers/media/platform/exynos4-is/fimc-reg.o
In file included from drivers/media/platform/exynos4-is/fimc-reg.c:14:0:
include/linux/regmap.h: In function ‘regmap_write’:
include/linux/regmap.h:525:10: error: ‘EINVAL’ undeclared (first use in this function)
include/linux/regmap.h:525:10: note: each undeclared identifier is reported only once for each function it appears in

Signed-off-by: Mateusz Krawczuk <m.krawczuk@partner.samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Cc: stable@kernel.org
2013-08-06 19:49:46 +01:00
Rafael J. Wysocki
007ccfcf89 ACPI: Drop physical_node_id_bitmap from struct acpi_device
The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused dependent phyiscal node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigend int.  As a result, it fixes a
regression introduced by commit e2ff394 (ACPI / memhotplug: Bind
removable memory blocks to ACPI device nodes) that caused
acpi_memory_enable_device() to fail when the number of 128 MB blocks
within one removable memory module was greater than 32.

Reported-and-tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
2013-08-06 14:32:54 +02:00
Greg Kroah-Hartman
cefe8a32f2 Merge tag 'iio-fixes-for-3.11b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:

Second round of IIO fixes for the 3.11 cycle.

1) Fix a long term race in the IIO trigger handling.
   This only effects cases where a single trigger is in use
   by multiple devices.
2) ti_am335x fix an issue with incorrect data due to reading before
   the sequencer is finished.
2013-08-05 14:04:24 +08:00
Linus Torvalds
72a67a94bc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Don't ignore user initiated wireless regulatory settings on cards
    with custom regulatory domains, from Arik Nemtsov.

 2) Fix length check of bluetooth information responses, from Jaganath
    Kanakkassery.

 3) Fix misuse of PTR_ERR in btusb, from Adam Lee.

 4) Handle rfkill properly while iwlwifi devices are offline, from
    Emmanuel Grumbach.

 5) Fix r815x devices DMA'ing to stack buffers, from Hayes Wang.

 6) Kernel info leak in ATM packet scheduler, from Dan Carpenter.

 7) 8139cp doesn't check for DMA mapping errors, from Neil Horman.

 8) Fix bridge multicast code to not snoop when no querier exists,
    otherwise mutlicast traffic is lost.  From Linus Lüssing.

 9) Avoid soft lockups in fib6_run_gc(), from Michal Kubecek.

10) Fix races in automatic address asignment on ipv6, which can result
    in incorrect lifetime assignments.  From Jiri Benc.

11) Cure build bustage when CONFIG_NET_LL_RX_POLL is not set and rename
    it CONFIG_NET_RX_BUSY_POLL to eliminate the last reference to the
    original naming of this feature.  From Cong Wang.

12) Fix crash in TIPC when server socket creation fails, from Ying Xue.

13) macvlan_changelink() silently succeeds when it shouldn't, from
    Michael S Tsirkin.

14) HTB packet scheduler can crash due to sign extension, fix from
    Stephen Hemminger.

15) With the cable unplugged, r8169 prints out a message every 10
    seconds, make it netif_dbg() instead of netif_warn().  From Peter
    Wu.

16) Fix memory leak in rtm_to_ifaddr(), from Daniel Borkmann.

17) sis900 gets spurious TX queue timeouts due to mismanagement of link
    carrier state, from Denis Kirjanov.

18) Validate somaxconn sysctl to make sure it fits inside of a u16.
    From Roman Gushchin.

19) Fix MAC address filtering on qlcnic, from Shahed Shaikh.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits)
  qlcnic: Fix for flash update failure on 83xx adapter
  qlcnic: Fix link speed and duplex display for 83xx adapter
  qlcnic: Fix link speed display for 82xx adapter
  qlcnic: Fix external loopback test.
  qlcnic: Removed adapter series name from warning messages.
  qlcnic: Free up memory in error path.
  qlcnic: Fix ingress MAC learning
  qlcnic: Fix MAC address filter issue on 82xx adapter
  net: ethernet: davinci_emac: drop IRQF_DISABLED
  netlabel: use domain based selectors when address based selectors are not available
  net: check net.core.somaxconn sysctl values
  sis900: Fix the tx queue timeout issue
  net: rtm_to_ifaddr: free ifa if ifa_cacheinfo processing fails
  r8169: remove "PHY reset until link up" log spam
  net: ethernet: cpsw: drop IRQF_DISABLED
  htb: fix sign extension bug
  macvlan: handle set_promiscuity failures
  macvlan: better mode validation
  tipc: fix oops when creating server socket fails
  net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL
  ...
2013-08-03 15:00:23 -07:00
Dhaval Giani
e67bc51e57 tracing: Fix trace_dump_stack() proto when CONFIG_TRACING is not set
When CONFIG_TRACING is not enabled, the stub prototype for trace_dump_stack()
is incorrect. It has (void) when it should be (int).

Link: http://lkml.kernel.org/r/CAPhKKr_H=ukFnBL4WgDOVT5ay2xeF-Ho+CA0DWZX0E2JW-=vSQ@mail.gmail.com

Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-02 22:38:10 -04:00
Andrew Vagin
ed5467da0e tracing: Fix fields of struct trace_iterator that are zeroed by mistake
tracing_read_pipe zeros all fields bellow "seq". The declaration contains
a comment about that, but it doesn't help.

The first field is "snapshot", it's true when current open file is
snapshot. Looks obvious, that it should not be zeroed.

The second field is "started". It was converted from cpumask_t to
cpumask_var_t (v2.6.28-4983-g4462344), in other words it was
converted from cpumask to pointer on cpumask.

Currently the reference on "started" memory is lost after the first read
from tracing_read_pipe and a proper object will never be freed.

The "started" is never dereferenced for trace_pipe, because trace_pipe
can't have the TRACE_FILE_ANNOTATE options.

Link: http://lkml.kernel.org/r/1375463803-3085183-1-git-send-email-avagin@openvz.org

Cc: stable@vger.kernel.org # 2.6.30
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-08-02 22:28:41 -04:00
Linus Torvalds
abe0308070 Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma fixes from Roland Dreier:
 - Fixes for the newly merged mlx5 hardware driver
 - Stack info leak fixes from Dan Carpenter
 - Fixes for pkey table handling with SR-IOV
 - A few other small things

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IPoIB: Fix pkey change flow for virtualization environments
  IPoIB: Make sure child devices use valid/proper pkeys
  IB/core: Create QP1 using the pkey index which contains the default pkey
  mlx5_core: Variable may be used uninitialized
  mlx5_core: Implement new initialization sequence
  mlx5_core: Fix use after free in mlx5_cmd_comp_handler()
  IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()
  IB/mlx5: Fix error return code in init_one()
  IB/mlx4: Use default pkey when creating tunnel QPs
  RDMA/cma: Only call cma_save_ib_info() for CM REQs
  RDMA/cma: Fix accessing invalid private data for UD
  RDMA/cma: Fix gcc warning
  Revert "RDMA/nes: Fix compilation error when nes_debug is enabled"
  IB/qib: Add err_decode() call for ring dump
  RDMA/cxgb3: Fix stack info leak in iwch_create_cq()
  RDMA/nes: Fix info leaks in nes_create_qp() and nes_create_cq()
  RDMA/ocrdma: Fix several stack info leaks
  RDMA/cxgb4: Fix stack info leak in c4iw_create_qp()
  RDMA/ocrdma: Remove unused include
2013-08-02 14:58:30 -07:00
Linus Torvalds
1fe0135b9e Merge tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:

 - Revert two cpuidle commits added during the 3.8 development cycle
   that turn out to have introduced a significant performance regression
   as requested by Jeremy Eder.

 - The recent patches that made the freezer less heavy-weight introduced
   a regression causing user-space-driven hibernation using the ioctl()
   interface to block indefinitely when the hibernate process executes
   try_to_freeze().  Fix from Colin Cross addresses this by adding a
   process flag to mark the hibernate/suspend process to inform the
   freezer that that process should be ignored.

 - One of the recent cpufreq reverts uncovered a problem in the core
   causing the cpufreq driver module refcount to become negative after a
   system suspend-resume cycle.  Fix from Rafael J Wysocki.

 - The evaluation of the ACPI battery _BIX method has never worked
   correctly, because the commit that added support for it forgot to
   take the "Revision" field in the return package into account.  As a
   result, the reading of battery info doesn't work at all on some
   systems, which is addressed by a fix from Lan Tianyu.

* tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes
  ACPI / battery: Fix parsing _BIX return value
  cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
  Revert "cpuidle: Quickly notice prediction failure for repeat mode"
  Revert "cpuidle: Quickly notice prediction failure in general case"
2013-08-02 12:21:32 -07:00
Cong Wang
e0d1095ae3 net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL
Eliezer renames several *ll_poll to *busy_poll, but forgets
CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too.

Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 15:11:17 -07:00
Cong Wang
dfcefb0be1 net: fix a compile error when CONFIG_NET_LL_RX_POLL is not set
When CONFIG_NET_LL_RX_POLL is not set, I got:

net/socket.c: In function ‘sock_poll’:
net/socket.c:1165:4: error: implicit declaration of function ‘sk_busy_loop’ [-Werror=implicit-function-declaration]

Fix this by adding a nop when !CONFIG_NET_LL_RX_POLL.

Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 15:10:58 -07:00
Michal Kubeček
2ac3ac8f86 ipv6: prevent fib6_run_gc() contention
On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.

This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.

Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01 14:16:20 -07:00
John W. Linville
22e02a0272 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-08-01 14:30:59 -04:00
Linus Torvalds
64ccccf852 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Radeon, nouveau, exynos, intel, mgag200..

  Not all strictly regressions but there was probably only one patch I'd
  have really left out and it didn't seem worth respinning exynos to
  avoid it, the line change count is quite low.

   radeon: regressions + more dynamic powermanagement fixes, since DPM
     is a new feature, and off by default I'd prefer to keep merging
     fixes since it has a large userbase already and I'd like to keep
     them on mainline

   nouveau: is mostly regression fixes

   i915: is a regression fix since Daniel is on holidays I've merged it.

   mgag200: I've picked a bunch of targetted fixes from a big bunch of
     distro patches,

   exynos: build fixes mostly, one regression fix

  I expect things will slow right down now, I may send on the intel
  early quirk from Jesse separatly, since I think the x86 maintainers
  acked it"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits)
  drm/i915: fix missed hunk after GT access breakage
  drm/radeon/dpm: re-enable cac control on SI
  drm/radeon/dpm: fix calculations in si_calculate_leakage_for_v_and_t_formula
  drm: fix 64 bit drm fixed point helpers
  drm/radeon/atom: initialize more atom interpretor elements to 0
  drm/nouveau: fix semaphore dmabuf obj
  drm/nouveau/vm: make vm refcount into a kref
  drm/nv31/mpeg: don't recognize nv3x cards as having nv44 graph class
  drm/nv40/mpeg: write magic value to channel object to make it work
  drm/nouveau: fix size check for cards without vm
  drm/nv50-/disp: remove dcb_outp_match call, and related variables
  drm/nva3-/disp: fix hda eld writing, needs to be padded
  drm/nv31/mpeg: fix mpeg engine initialization
  drm/nv50/mc: include vp in the fb error reporting mask
  drm/nouveau: fix null pointer dereference in poll_changed
  drm/nv50/gpio: post-nv92 cards have 32 interrupt lines
  drm/nvc0/fb: take lock in nvc0_ram_put()
  drm/nouveau/core: xtensa firmware size needs to be 0x40000 no matter what
  drm/mgag200: Fix LUT programming for 16bpp
  drm/mgag200: Fix framebuffer pitch calculation
  ...
2013-07-31 17:55:12 -07:00
Linus Torvalds
19788a9008 Merge branch 'akpm' (patches from Andrew Morton)
Merge more patches from Andrew Morton:
 "A bunch of fixes.

  Plus Joe's printk move and rework.  It's not a -rc3 thing but now
  would be a nice time to offload it, while things are quiet.  I've been
  sitting on it all for a couple of weeks, no issues"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  vmpressure: make sure there are no events queued after memcg is offlined
  vmpressure: do not check for pending work to prevent from new work
  vmpressure: change vmpressure::sr_lock to spinlock
  printk: rename struct log to struct printk_log
  printk: use pointer for console_cmdline indexing
  printk: move braille console support into separate braille.[ch] files
  printk: add console_cmdline.h
  printk: move to separate directory for easier modification
  drivers/rtc/rtc-twl.c: fix: rtcX/wakealarm attribute isn't created
  mm: zbud: fix condition check on allocation size
  thp, mm: avoid PageUnevictable on active/inactive lru lists
  mm/swap.c: clear PageActive before adding pages onto unevictable list
  arch/x86/platform/ce4100/ce4100.c: include reboot.h
  mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG
  rapidio: fix use after free in rio_unregister_scan()
  .gitignore: ignore *.lz4 files
  MAINTAINERS: dynamic debug: Jason's not there...
  dmi_scan: add comments on dmi_present() and the loop in dmi_scan_machine()
  ocfs2/refcounttree: add the missing NULL check of the return value of find_or_create_page()
  mm: mempolicy: fix mbind_range() && vma_adjust() interaction
2013-07-31 17:52:04 -07:00
Joe Perches
d9d10a3096 ndisc: Add missing inline to ndisc_addr_option_pad
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31 15:18:17 -07:00
Michal Hocko
33cb876e94 vmpressure: make sure there are no events queued after memcg is offlined
vmpressure is called synchronously from reclaim where the target_memcg
is guaranteed to be alive but the eventfd is signaled from the work
queue context.  This means that memcg (along with vmpressure structure
which is embedded into it) might go away while the work item is pending
which would result in use-after-release bug.

We have two possible ways how to fix this.  Either vmpressure pins memcg
before it schedules vmpr->work and unpin it in vmpressure_work_fn or
explicitely flush the work item from the css_offline context (as
suggested by Tejun).

This patch implements the later one and it introduces vmpressure_cleanup
which flushes the vmpressure work queue item item.  It hooks into
mem_cgroup_css_offline after the memcg itself is cleaned up.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Li Zefan <lizefan@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31 14:41:04 -07:00
Michal Hocko
22f2020f84 vmpressure: change vmpressure::sr_lock to spinlock
There is nothing that can sleep inside critical sections protected by
this lock and those sections are really small so there doesn't make much
sense to use mutex for them.  Change the log to a spinlock

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Li Zefan <lizefan@huawei.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31 14:41:03 -07:00
Eli Cohen
cd23b14b65 mlx5_core: Implement new initialization sequence
Introduce enbale_hca and disable_hca commands to signify when the
driver starts or ceases to operate on the device.

In addition the driver will use boot and init pages count; boot pages
is required to allow firmware to complete boot commands and the other
to complete init hca.  Command interface revision is bumped to 4 to
enforce using supported firmware.

This patch breaks compatibility with old versions of firmware (< 4);
however, the first GA firmware we will publish will support version 4
so this should not be a problem.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:12:24 -07:00
Linus Torvalds
06693f305e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix association failures not triggering a connect-failure event in
    cfg80211, from Johannes Berg.

 2) Eliminate a potential NULL deref with older iptables tools when
    configuring xt_socket rules, from Eric Dumazet.

 3) Missing RTNL locking in wireless regulatory code, from Johannes
    Berg.

 4) Fix OOPS caused by firmware loading races in ath9k_htc, from Alexey
    Khoroshilov.

 5) Fix usb URB leak in usb_8dev CAN driver, also from Alexey
    Khoroshilov.

 6) VXLAN namespace teardown fails to unregister devices, from Stephen
    Hemminger.

 7) Fix multicast settings getting dropped by firmware in qlcnic driver,
    from Sucheta Chakraborty.

 8) Add sysctl range enforcement for tcp_syn_retries, from Michal Tesar.

 9) Fix a nasty bug in bridging where an active timer would get
    reinitialized with a setup_timer() call.  From Eric Dumazet.

10) Fix use after free in new mlx5 driver, from Dan Carpenter.

11) Fix freed pointer reference in ipv6 multicast routing on namespace
    cleanup, from Hannes Frederic Sowa.

12) Some usbnet drivers report TSO and SG in their feature set, but the
    usbnet layer doesn't really support them.  From Eric Dumazet.

13) Fix crash on EEH errors in tg3 driver, from Gavin Shan.

14) Drop cb_lock when requesting modules in genetlink, from Stanislaw
    Gruszka.

15) Kernel stack leaks in cbq scheduler and af_key pfkey messages, from
    Dan Carpenter.

16) FEC driver erroneously signals NETDEV_TX_BUSY on transmit leading to
    endless loops, from Uwe Kleine-König.

17) Fix hangs from loading mvneta driver, from Arnaud Patard.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
  mlx5: fix error return code in mlx5_alloc_uuars()
  mvneta: Try to fix mvneta when compiled as module
  mvneta: Fix hang when loading the mvneta driver
  atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
  genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE
  af_key: more info leaks in pfkey messages
  net/fec: Don't let ndo_start_xmit return NETDEV_TX_BUSY without link
  net_sched: Fix stack info leak in cbq_dump_wrr().
  igb: fix vlan filtering in promisc mode when not in VT mode
  ixgbe: Fix Tx Hang issue with lldpad on 82598EB
  genetlink: release cb_lock before requesting additional module
  net: fec: workaround stop tx during errata ERR006358
  qlcnic: Fix diagnostic interrupt test for 83xx adapters.
  qlcnic: Fix setting Guest VLAN
  qlcnic: Fix operation type and command type.
  qlcnic: Fix initialization of work function.
  Revert "atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring"
  atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
  net/tg3: Fix warning from pci_disable_device()
  net/tg3: Fix kernel crash
  ...
2013-07-31 12:56:18 -07:00
Oleg Nesterov
2816c551c7 tracing: trace_remove_event_call() should fail if call/file is in use
Change trace_remove_event_call(call) to return the error if this
call is active. This is what the callers assume but can't verify
outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c
need the additional changes, unregister_trace_probe() should abort
if trace_remove_event_call() fails.

The caller is going to free this call/file so we must ensure that
nobody can use them after trace_remove_event_call() succeeds.
debugfs should be fine after the previous changes and event_remove()
does TRACE_REG_UNREGISTER, but still there are 2 reasons why we need
the additional checks:

- There could be a perf_event(s) attached to this tp_event, so the
  patch checks ->perf_refcount.

- TRACE_REG_UNREGISTER can be suppressed by FTRACE_EVENT_FL_SOFT_MODE,
  so we simply check FTRACE_EVENT_FL_ENABLED protected by event_mutex.

Link: http://lkml.kernel.org/r/20130729175033.GB26284@redhat.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-07-31 13:12:48 -04:00
Samuel Ortiz
9ea7187c53 NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD
Loading a firmware into a target is typically called firmware
download, not firmware upload. So we rename the netlink API to
NFC_CMD_FW_DOWNLOAD in order to avoid any terminology confusion from
userspace.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-07-31 01:19:43 +02:00
Dave Airlie
f34f516a8d Merge branch 'drm-fixes-3.11' of git://people.freedesktop.org/~agd5f/linux
Alex writes:
- more fixes for SI dpm
- fix DP on some rv6xx boards

* 'drm-fixes-3.11' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon/dpm: re-enable cac control on SI
  drm/radeon/dpm: fix calculations in si_calculate_leakage_for_v_and_t_formula
  drm: fix 64 bit drm fixed point helpers
  drm/radeon/atom: initialize more atom interpretor elements to 0
2013-07-31 08:46:21 +10:00
Alex Deucher
a838834b2f drm: fix 64 bit drm fixed point helpers
Sign bit wasn't handled properly and a small typo.

Thanks to Christian for helping me sort this out.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-30 17:24:13 -04:00
Colin Cross
2b44c4db2e freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes
Calling freeze_processes sets a global flag that will cause any
process that calls try_to_freeze to enter the refrigerator.  It
skips sending a signal to the current task, but if the current
task ever hits try_to_freeze, all threads will be frozen and the
system will deadlock.

Set a new flag, PF_SUSPEND_TASK, on the task that calls
freeze_processes.  The flag notifies the freezer that the thread
is involved in suspend and should not be frozen.  Also add a
WARN_ON in thaw_processes if the caller does not have the
PF_SUSPEND_TASK flag set to catch if a different task calls
thaw_processes than the one that called freeze_processes, leaving
a task with PF_SUSPEND_TASK permanently set on it.

Threads that spawn off a task with PF_SUSPEND_TASK set (which
swsusp does) will also have PF_SUSPEND_TASK set, preventing them
from freezing while they are helping with suspend, but they need
to be dead by the time suspend is triggered, otherwise they may
run when userspace is expected to be frozen.  Add a WARN_ON in
thaw_processes if more than one thread has the PF_SUSPEND_TASK
flag set.

Reported-and-tested-by: Michael Leun <lkml20130126@newton.leun.net>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-30 14:05:06 +02:00
Linus Torvalds
36f571e9ed Merge tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire regression fix from Stefan Richter:
 "This fixes corrupted video capture, seen with IIDC/DCAM video and
  certain buffer settings.  (Regression since v3.4 inclusive.)"

* tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: fix libdc1394/FlyCap2 iso event regression
2013-07-29 17:08:22 -07:00
Rafael J. Wysocki
148519120c Revert "cpuidle: Quickly notice prediction failure for repeat mode"
Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
repeat mode), because it has been identified as the source of a
significant performance regression in v3.8 and later as explained by
Jeremy Eder:

  We believe we've identified a particular commit to the cpuidle code
  that seems to be impacting performance of variety of workloads.
  The simplest way to reproduce is using netperf TCP_RR test, so
  we're using that, on a pair of Sandy Bridge based servers.  We also
  have data from a large database setup where performance is also
  measurably/positively impacted, though that test data isn't easily
  share-able.

  Included below are test results from 3 test kernels:

  kernel       reverts
  -----------------------------------------------------------
  1) vanilla   upstream (no reverts)

  2) perfteam2 reverts e11538d1f0

  3) test      reverts 69a37beabf
                       e11538d1f0

  In summary, netperf TCP_RR numbers improve by approximately 4%
  after reverting 69a37beabf.  When
  69a37beabf is included, C0 residency
  never seems to get above 40%.  Taking that patch out gets C0 near
  100% quite often, and performance increases.

  The below data are histograms representing the %c0 residency @
  1-second sample rates (using turbostat), while under netperf test.

  - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
  - The last pair, which reverts 69a37beabf,
    shows %c0 in the 80,90,100% bins.

  Below each kernel name are netperf TCP_RR trans/s numbers for the
  particular kernel that can be disclosed publicly, comparing the 3
  test kernels.  We ran a 4th test with the vanilla kernel where
  we've also set /dev/cpu_dma_latency=0 to show overall impact
  boosting single-threaded TCP_RR performance over 11% above
  baseline.

  3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
  TCP_RR trans/s 54323.78

  -----------------------------------------------------------
  3.10-rc2 vanilla RX (no reverts)
  TCP_RR trans/s 48192.47

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]:
  ***********************************************************
     40.0000 -    50.0000 [     1]: *
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [    49]:
  *************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 perfteam2 RX (reverts commit
  e11538d1f0)
  TCP_RR trans/s 49698.69

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]:
  ***********************************************************
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [     2]: **
     40.0000 -    50.0000 [    58]:
  **********************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 test RX (reverts 69a37beabf
  and e11538d1f0)
  TCP_RR trans/s 47766.95

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    27]: ***************************
     40.0000 -    50.0000 [     2]: **
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     2]: **
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [    28]: ****************************

  Sender:
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     1]: *
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     3]: ***
     80.0000 -    90.0000 [     7]: *******
     90.0000 -   100.0000 [    38]: **************************************

  These results demonstrate gaining back the tendency of the CPU to
  stay in more responsive, performant C-states (and thus yield
  measurably better performance), by reverting commit
  69a37beabf.

Requested-by: Jeremy Eder <jeder@redhat.com>
Tested-by: Len Brown <len.brown@intel.com>
Cc: 3.8+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-29 13:32:29 +02:00