The reset and poll functionality in OPAL firmware supports both PHBs
and PCI slots, each identified by an ID. This patch adds support for
PCI slot IDs by:
* Renaming the arguments of opal_pci_reset() and opal_pci_poll()
accordingly.
* Renaming pnv_eeh_phb_poll() to pnv_eeh_poll() and adjusting its
argument name.
* Adding a macro that produces a PCI slot ID.
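As a rough illustration of how a slot ID can share an ID space with PHB IDs, the sketch below packs a PHB ID and a bus/device/function number into one 64-bit value. The bit layout and the helper id_is_pci_slot() are assumptions made for this example, not taken from the patch text.

```c
#include <stdint.h>

/*
 * Assumed layout, for illustration only: the top bit marks the ID as a
 * PCI slot, and the bus/device/function number sits above the PHB ID.
 */
#define PCI_SLOT_ID_PREFIX	(1ULL << 63)
#define PCI_SLOT_ID(phb_id, bdfn) \
	(PCI_SLOT_ID_PREFIX | ((uint64_t)(bdfn) << 16) | (uint64_t)(phb_id))

/* Hypothetical helper: a plain PHB ID never has the prefix bit set. */
static inline int id_is_pci_slot(uint64_t id)
{
	return (id & PCI_SLOT_ID_PREFIX) != 0;
}
```

With a scheme like this, a single reset or poll entry point can tell from the ID alone whether it was handed a PHB or a slot.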
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, the OPAL msglog/console buffer is exposed as a sysfs file, with
the sysfs read handler responsible for retrieving the log from the OPAL
buffer. We'd like to be able to use it in xmon as well.
Refactor the OPAL msglog code to create a new function, opal_msglog_copy(),
that copies to an arbitrary buffer. Separate the initialisation code into
generic memcons init and sysfs file creation.
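A minimal userspace sketch of copying from a circular log into an arbitrary buffer is shown here. The struct ring_log layout and the name ring_log_copy() are hypothetical; the real memcons descriptor is laid out differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a firmware log ring buffer. */
struct ring_log {
	const char *data;	/* start of the circular buffer */
	size_t size;		/* total capacity in bytes */
	size_t head;		/* next write position */
	int wrapped;		/* non-zero once the writer has wrapped */
};

/*
 * Copy up to 'count' bytes of log text, starting at logical offset 'pos'
 * (0 = oldest byte), into 'to'. Returns the number of bytes copied.
 */
static size_t ring_log_copy(const struct ring_log *log, char *to,
			    size_t pos, size_t count)
{
	size_t avail = log->wrapped ? log->size : log->head;
	size_t start = log->wrapped ? log->head : 0;
	size_t copied = 0;

	if (pos >= avail)
		return 0;
	if (count > avail - pos)
		count = avail - pos;

	while (copied < count) {
		size_t off = (start + pos + copied) % log->size;
		size_t chunk = log->size - off;	/* bytes until physical end */

		if (chunk > count - copied)
			chunk = count - copied;
		memcpy(to + copied, log->data + off, chunk);
		copied += chunk;
	}
	return copied;
}
```

Because the copy routine takes a plain destination pointer, the same function can serve both a sysfs read handler and a debugger.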
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no
parameters and returned nothing. The call has since been updated to accept
the terminal number to flush and to return different values depending on
the state of the output buffer.
The prototype has been updated and its usage in the OPAL kmsg dumper has
been modified to support its new behaviour as an incremental flush.
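An incremental flush can be sketched as a loop that keeps calling the flush routine while it reports that output remains. Everything below is a mock: mock_console_flush(), the SK_* codes and their numeric values are assumptions for the example, not the firmware interface.

```c
#include <stdint.h>

/* Return codes used by this sketch; the numeric values are assumptions. */
#define SK_SUCCESS	0
#define SK_BUSY		-2
#define SK_PARTIAL	-3

/* Hypothetical stand-in for the firmware console state. */
struct mock_console {
	unsigned int pending;	/* bytes still queued */
	unsigned int per_call;	/* bytes drained by one flush call */
};

/* Drain part of the queue and report whether more output remains. */
static int64_t mock_console_flush(struct mock_console *con)
{
	if (con->pending > con->per_call) {
		con->pending -= con->per_call;
		return SK_PARTIAL;
	}
	con->pending = 0;
	return SK_SUCCESS;
}

/* Flush until empty or until max_calls is reached; returns calls made. */
static unsigned int flush_until_empty(struct mock_console *con,
				      unsigned int max_calls)
{
	unsigned int calls = 0;
	int64_t rc;

	do {
		rc = mock_console_flush(con);
		calls++;
	} while ((rc == SK_PARTIAL || rc == SK_BUSY) && calls < max_calls);

	return calls;
}
```

The cap on the number of calls is a design choice for the sketch: at panic time it is safer to give up than to spin forever on a stuck console.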
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On BMC machines, console output is controlled by the OPAL firmware and is
only flushed when its pollers are called. When the kernel is in a panic
state, it no longer calls these pollers and thus console output does not
completely flush, causing some output from the panic to be lost.
Output is only actually lost when the kernel is configured to not power off
or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
flushes the console buffer as part of its power down routines. Before this
patch, however, only partial output would be printed during the timeout wait.
This patch adds a new kmsg_dumper which gets called at panic time to ensure
panic output is not lost. It accomplishes this by calling OPAL_CONSOLE_FLUSH
in the OPAL API, and if that is not available, the pollers are called enough
times to (hopefully) completely flush the buffer.
The flushing mechanism will only affect output printed at and before the
kmsg_dump call in kernel/panic.c:panic(). As such, the "end Kernel panic"
message may still be truncated as follows:
>Call Trace:
>[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
>[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
>[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
>[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
>[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
>[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
>[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
>---[ end Kernel panic - not
This functionality is implemented as a kmsg_dumper as it seems to be the
most sensible way to introduce platform-specific functionality to the
panic function.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch registers the following two new OPAL interface calls
for the platform LED subsystem. With the help of these new OPAL calls,
the kernel will be able to get or set the state of various individual
LEDs on the system at any given location code which is passed through
the LED specific device tree nodes.
(1) OPAL_LEDS_GET_INDICATOR opal_leds_get_ind
(2) OPAL_LEDS_SET_INDICATOR opal_leds_set_ind
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Tested-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On non-recoverable MCE errors in kernel space, the Linux kernel panics
and the system reboots. On BMC-based systems opal-prd runs as a daemon
in the host. Hence, a kernel crash may prevent opal-prd from detecting and
analyzing the MCE error. This may leave us in a situation where the faulty
memory never gets de-configured and Linux keeps hitting the same MCE error
again and again. If this happens at an early stage of kernel initialization,
Linux will keep crashing and rebooting in a loop.
This patch fixes this issue by invoking new opal_cec_reboot2() call with
reboot type OPAL_REBOOT_PLATFORM_ERROR to inform BMC/OCC about this
error, so that BMC can collect relevant data for error analysis and
decide what component to de-configure before rebooting.
This patch is dependent on OPAL patchset posted on skiboot mailing list
at https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html that
introduces opal_cec_reboot2() opal call.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch adds support for OPAL EPOW (Environmental and Power Warnings)
and DPO (Delayed Power Off) events for the PowerNV platform. These events
are generated on FSP (Flexible Service Processor) based systems. EPOW
events are generated due to various critical system conditions that
require system shutdown. A few examples of these conditions are high
ambient temperature or system running on UPS power with low UPS battery.
A DPO event is generated in response to an admin-initiated system shutdown
request. Upon receipt of EPOW and DPO events the host kernel invokes
orderly_poweroff() to perform a graceful system shutdown.
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Acked-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This change adds a char device to access the "PRD" (processor runtime
diagnostics) channel to OPAL firmware.
Includes contributions from Vaidyanathan Srinivasan, Neelesh Gupta &
Vishal Kulkarni.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Whenever an interrupt is received for OPAL, the Linux kernel gets a
bitfield indicating certain events that have occurred and need handling
by the various device drivers. Currently this is handled using a
notifier interface where we call every device driver that has
registered to receive opal events.
This approach has several drawbacks. For example each driver has to do
its own checking to see if the event is relevant as well as event
masking. There is also no easy method of recording the number of times
we receive particular events.
This patch solves these issues by exposing opal events via the
standard interrupt APIs by adding a new interrupt chip and
domain. Drivers can then register for the appropriate events using
standard kernel calls such as irq_of_parse_and_map().
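The demultiplexing idea can be sketched in plain C: each bit of the event word is treated as its own interrupt source, masking is done centrally, and a per-event counter falls out for free. This is a userspace model under assumed names (struct event_demux, demux_events()), not the irqchip code itself.

```c
#include <stdint.h>

#define NR_EVENTS 64

struct event_demux {
	uint64_t enabled;		/* events some driver has requested */
	unsigned long count[NR_EVENTS];	/* per-event delivery counters */
};

/* Deliver every enabled event set in 'events'; returns how many fired. */
static unsigned int demux_events(struct event_demux *d, uint64_t events)
{
	unsigned int delivered = 0;
	uint64_t todo = events & d->enabled;	/* central masking */

	for (unsigned int bit = 0; bit < NR_EVENTS; bit++) {
		if (todo & (1ULL << bit)) {
			d->count[bit]++;
			delivered++;
		}
	}
	return delivered;
}
```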
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Most of the OPAL subsystems are always compiled in for PowerNV and
many of them need to be initialised before or after other OPAL
subsystems. Rather than trying to control this ordering through
machine initcalls it is clearer and easier to control initialisation
order with explicit calls in opal_init.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Cc: Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fastsleep is one of the idle states that the cpuidle subsystem currently
uses on power8 machines. In this state the L2 cache is brought down to a
threshold voltage. Therefore when the core is in fastsleep, the
communication between L2 and L3 needs to be fenced. But there is a bug
in the current power8 chips surrounding this fencing.
OPAL provides a workaround which precludes the possibility of hitting
this bug. But running with this workaround applied causes checkstop
if any correctable error in L2 cache directory is detected. Hence OPAL
also provides a way to undo the workaround.
In the existing implementation, the workaround is applied by the last thread
of the core entering fastsleep and undone by the first thread waking up.
But this has a performance cost. These OPAL calls account for roughly
4000 cycles every time the core has to enter or wake up from fastsleep.
This patch introduces a sysfs attribute (fastsleep_workaround_applyonce)
to choose the behavior of this workaround.
By default, fastsleep_workaround_applyonce = 0. In this case, the workaround
is applied/undone every time the core enters/exits fastsleep.
When fastsleep_workaround_applyonce = 1, the workaround is
applied once on all the cores and never undone. This can be triggered by
echo 1 > /sys/devices/system/cpu/fastsleep_workaround_applyonce
For simplicity this attribute can be modified only once: once
fastsleep_workaround_applyonce is changed to 1, it cannot be reverted
to the default state.
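The one-way semantics of the attribute can be modelled with a small store handler. applyonce_store() is a hypothetical name and the body only tracks the flag; in the kernel the 0 -> 1 transition would also apply the workaround on every core.

```c
#include <errno.h>

/* 0 = apply/undo on every fastsleep entry/exit, 1 = applied once for good. */
static int applyonce;

/* Hypothetical store handler: accepts only the single 0 -> 1 transition. */
static int applyonce_store(int val)
{
	if (val != 1)
		return -EINVAL;	/* reverting to 0 is not allowed */
	if (applyonce == 1)
		return 0;	/* already applied, nothing to do */

	/* The real code would apply the workaround on all cores here. */
	applyonce = 1;
	return 0;
}
```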
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This change adds the OPAL interface definitions to allow Linux to read,
write and erase from system flash devices. We register platform devices
for the flash devices exported by firmware.
We clash with the existing opal_flash_init function, which is really for
the FSP flash update functionality, so we rename that initcall to
opal_flash_update_init().
A future change will add an mtd driver that uses this interface.
Changes from Joel Stanley and Jeremy Kerr.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
OPAL has its own list of return codes. The patch provides a translation
of such codes in errnos for the opal_sensor_read call, and possibly
others if needed.
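A translation of this kind is typically a switch statement. The sketch below uses SK_-prefixed names and numeric values that are assumptions for illustration; the exact set of codes handled by the patch may differ.

```c
#include <errno.h>

/* OPAL-style return codes; the values here are assumed for the sketch. */
enum {
	SK_OPAL_SUCCESS		 =   0,
	SK_OPAL_PARAMETER	 =  -1,
	SK_OPAL_HARDWARE	 =  -6,
	SK_OPAL_UNSUPPORTED	 =  -7,
	SK_OPAL_PERMISSION	 =  -8,
	SK_OPAL_NO_MEM		 =  -9,
	SK_OPAL_INTERNAL_ERROR	 = -11,
	SK_OPAL_BUSY_EVENT	 = -12,
	SK_OPAL_ASYNC_COMPLETION = -15,
};

/* Map a firmware return code to a negative errno value. */
static int sk_opal_error_code(int rc)
{
	switch (rc) {
	case SK_OPAL_SUCCESS:		return 0;
	case SK_OPAL_PARAMETER:		return -EINVAL;
	case SK_OPAL_ASYNC_COMPLETION:	return -EINPROGRESS;
	case SK_OPAL_BUSY_EVENT:	return -EBUSY;
	case SK_OPAL_NO_MEM:		return -ENOMEM;
	case SK_OPAL_PERMISSION:	return -EPERM;
	case SK_OPAL_UNSUPPORTED:
	case SK_OPAL_HARDWARE:
	case SK_OPAL_INTERNAL_ERROR:
	default:
		return -EIO;	/* anything unknown is reported as an I/O error */
	}
}
```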
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Provide an unregister interface for the opal message notifiers,
for use when they are no longer needed, such as during driver unload/remove.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This commit gets opal-api.h to mostly match the version in Skiboot as of
commit ea7d806ab0ba.
The exceptions are things which are not (currently) used in Linux.
Most of this is just whitespace and a few things moving around. I think
the diff is readable.
Also OpalMessageType became opal_msg_type, requiring a change in the
Linux code.
Finally Skiboot and Linux disagree on CAPI vs CXL, because CAPI means
something else in Linux. To handle that we just point the Linux wrapper,
which is named "cxl" to the OPAL token OPAL_PCI_SET_PHB_CAPI_MODE.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
We'd like to get to the stage where the OPAL API is defined in a header
that is identical between Linux and Skiboot.
As step one, split the bits that actually define the API into
opal-api.h. The Linux specific parts stay in opal.h.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Register a notifier for an OPAL message indicating that the machine
should prepare itself for a graceful power off.
OPAL will tell us if the power off is a reboot or shutdown, but for now
we perform the same orderly_poweroff action.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Turning snoops on is the last step in CAPP recovery. Sapphire is expected to
have reinitialized the PHB and done the previous recovery steps.
Add a mode argument to the opal call to do this. The driver can turn snoops
off, although it does not currently do so.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Winkle is a deep idle state supported in power8 chips. A core enters
winkle when all the threads of the core enter winkle. In this state
power supply to the entire chiplet i.e core, private L2 and private L3
is turned off. As a result it gives higher powersavings compared to
sleep.
But entering winkle results in a total hypervisor state loss. Hence the
hypervisor context has to be preserved before entering winkle and
restored upon wake up.
Power-on Reset Engine (PORE) is a dedicated engine which is responsible
for powering on the chiplet during wake up. It can be programmed to
restore the register contents of a few specific registers. This patch
uses PORE to restore register state wherever possible and uses stack to
save and restore rest of the necessary registers.
The hypervisor state to restore falls into three categories:
per-core state, per-subcore state and per-thread state. To manage this,
extend the infrastructure introduced for sleep. Mainly we add a paca
variable subcore_sibling_mask. Using this and the core_idle_state we can
distinguish the first thread in a core and in a subcore.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Deep idle states like sleep and winkle are per core idle states. A core
enters these states only when all the threads enter either the
particular idle state or a deeper one. There are tasks like fastsleep
hardware bug workaround and hypervisor core state save which have to be
done only by the last thread of the core entering deep idle state and
similarly tasks like timebase resync, hypervisor core register restore
that have to be done only by the first thread waking up from these
states.
The current idle state management does not have a way to distinguish the
first/last thread of the core waking/entering idle states. Tasks like
timebase resync are done for all the threads. This is not only
suboptimal, but can cause functionality issues when subcores and kvm are
involved.
This patch adds the necessary infrastructure to track idle states of
threads in a per-core structure. It uses this info to perform tasks like
fastsleep workaround and timebase resync only once per core.
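The first/last thread detection can be sketched with a per-core bitmask. This model is single-threaded and uses hypothetical names; the real implementation has to update the shared word atomically, which the sketch deliberately leaves out.

```c
#include <stdint.h>

/* One bit per thread; a set bit means the thread is running (not idle). */
struct core_idle_state {
	uint32_t running_mask;
};

/* Returns 1 if the caller is the last thread of the core to go idle. */
static int thread_enter_idle(struct core_idle_state *c, unsigned int thread)
{
	c->running_mask &= ~(1u << thread);
	return c->running_mask == 0;
}

/* Returns 1 if the caller is the first thread of the core to wake up. */
static int thread_exit_idle(struct core_idle_state *c, unsigned int thread)
{
	int first = (c->running_mask == 0);

	c->running_mask |= 1u << thread;
	return first;
}
```

Only the thread that gets 1 back from thread_enter_idle() would apply per-core work such as the fastsleep workaround, and only the thread that gets 1 back from thread_exit_idle() would undo it or resync the timebase.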
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Originally-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The secondary threads should enter deep idle states so as to gain maximum
powersavings when the entire core is offline. To do so the offline path
must be made aware of the available deepest idle state. Hence probe the
device tree for the possible idle states in powernv core code and
expose the deepest idle state through flags.
Since the device tree is probed by the cpuidle driver as well, move
the parameters required to discover the idle states into an appropriate
common place to both the driver and the powernv core code.
Another point is that the fastsleep idle state may require workarounds in
the kernel to function properly. These workarounds are introduced in the
subsequent patches. However, neither the cpuidle driver nor the hotplug
path need be bothered about them.
They will be taken care of by the core powernv code.
Originally-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: linux-pm@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch exposes the available i2c busses on the PowerNV platform
to the kernel and implements the bus driver to support i2c and
smbus commands.
The driver uses the platform device infrastructure to probe the busses
on the platform and registers them with the i2c driver framework.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de> (I2C part, excluding the bindings)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cleanup OpalMCE_* definitions/declarations and other related code which
is not used anymore.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch implements the OPAL rtc driver that binds with the rtc
driver subsystem. The driver uses the platform device infrastructure
to probe the rtc device and register it to rtc class framework. The
'wakeup' is supported depending upon the property 'has-tpo' present
in the OF node. It provides a way to load the generic rtc driver
in the absence of an OPAL driver.
The patch also moves the existing OPAL rtc get/set time interfaces to the
new driver and exposes the necessary OPAL calls using EXPORT_SYMBOL_GPL.
Test results:
-------------
Host:
[root@tul169p1 ~]# ls -l /sys/class/rtc/
total 0
lrwxrwxrwx 1 root root 0 Oct 14 03:07 rtc0 -> ../../devices/opal-rtc/rtc/rtc0
[root@tul169p1 ~]# cat /sys/devices/opal-rtc/rtc/rtc0/time
08:10:07
[root@tul169p1 ~]# echo `date '+%s' -d '+ 2 minutes'` > /sys/class/rtc/rtc0/wakealarm
[root@tul169p1 ~]# cat /sys/class/rtc/rtc0/wakealarm
1413274345
[root@tul169p1 ~]#
FSP:
$ smgr mfgState
standby
$ rtim timeofday
System time is valid: 2014/10/14 08:12:04.225115
$ smgr mfgState
ipling
$
CC: devicetree@vger.kernel.org
CC: tglx@linutronix.de
CC: rtc-linux@googlegroups.com
CC: a.zummo@towertech.it
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Recent OPAL firmware adds a couple of functions to send and receive IPMI
messages:
https://github.com/open-power/skiboot/commit/b2a374da
This change updates the token list and wrappers to suit, and adds the
platform devices for any IPMI interfaces.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds the OPAL call to change a PHB into cxl mode.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The names of PCI reset scopes aren't synchronized with firmware.
This patch fixes that.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently there is no way to generically check if an OPAL call exists or not
from the host kernel.
This adds an OPAL call opal_check_token() which tells you if the given token is
present in OPAL or not.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The PowerNV platform is capable of capturing host memory regions when the
system crashes (because of host/firmware). We have a new OPAL API to
register/unregister memory regions to be captured when the system crashes.
This patch adds support for the new API. Also, during boot we register
the kernel log buffer, and we unregister it before doing kexec.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we hit the HMI in Linux, invoke opal call to handle/recover from HMI
errors in real mode and then in virtual mode during check_irq_replay()
invoke opal_poll_events()/opal_do_notifier() to retrieve HMI event from
OPAL and act accordingly.
Now that we are ready to handle HMI interrupt directly in linux, remove
the HMI interrupt registration with firmware.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Handle Hypervisor Maintenance Interrupt (HMI) in Linux. This patch implements
basic infrastructure to handle HMI in Linux host. The design is to invoke
opal handle hmi in real mode for recovery and set irq_pending when we hit HMI.
During check_irq_replay pull opal hmi event and print hmi info on console.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch synchronizes header file with firmware to have new OPAL
API opal_pci_eeh_freeze_set(), which is used to freeze the specified
PE in order to support "compound" PE.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch enables the M64 aperture for PHB3.
We already had platform hook (ppc_md.pcibios_window_alignment) to affect
the PCI resource assignment done in PCI core so that each PE's M32 resource
was built on basis of M32 segment size. Similarly, we're using that for
M64 assignment on basis of M64 segment size.
* We're using the last M64 BAR to cover the M64 aperture, and it's shared by all
256 PEs.
* We don't support P7IOC yet. However, some function callbacks are added
to (struct pnv_phb) so that we can reuse them on P7IOC in future.
* PE, corresponding to PCI bus with large M64 BAR device attached, might
span multiple M64 segments. We introduce "compound" PE to cover the case.
The compound PE is a list of PEs and the master PE is used as before.
The slave PEs are just for MMIO isolation.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a follow-up to commit ddf0322a ("powerpc/powernv: Fix endianness
problems in EEH"). The patch helps to get non-endian-dependent
diag-data.
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In commit 27f4488872 "Add OPAL takeover from PowerVM" we added support
for "takeover" on OPAL v1 machines.
This was a mode of operation where we would boot under pHyp, and query
for the presence of OPAL. If detected we would then do a special
sequence to take over the machine, and the kernel would end up running
in hypervisor mode.
OPAL v1 was never a supported product, and was never shipped outside
IBM. As far as we know no one is still using it.
Newer versions of OPAL do not use the takeover mechanism. Although the
query for OPAL should be harmless on machines with newer OPAL, we have
seen a machine where it causes a crash in Open Firmware.
The code in early_init_devtree() to copy boot_command_line into cmd_line
was added in commit 817c21ad9a "Get kernel command line accross OPAL
takeover", and AFAIK is only used by takeover, so should also be
removed.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
EEH information fetched from OPAL needs fixing before it is used in an LE
environment. To have the fields covered by sparse's endian check, declare
them as __beXX and access them through accessors.
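The accessor pattern can be illustrated outside the kernel with a byte-wise reader that gives the same result on little- and big-endian hosts. The type sk_be64 and the struct sk_mem_error are hypothetical stand-ins for firmware-provided big-endian fields.

```c
#include <stdint.h>

/* Raw firmware field: eight bytes stored most-significant byte first. */
typedef struct { uint8_t b[8]; } sk_be64;

/* Convert a big-endian field to a host integer, independent of host order. */
static uint64_t sk_be64_to_cpu(sk_be64 v)
{
	uint64_t out = 0;

	for (int i = 0; i < 8; i++)
		out = (out << 8) | v.b[i];
	return out;
}

/* Hypothetical firmware-provided record with big-endian fields. */
struct sk_mem_error {
	sk_be64 phys_addr;
	sk_be64 length;
};

/* Every use of a field goes through the accessor, never a raw load. */
static uint64_t sk_mem_error_end(const struct sk_mem_error *e)
{
	return sk_be64_to_cpu(e->phys_addr) + sk_be64_to_cpu(e->length);
}
```

Wrapping the raw bytes in a distinct type plays the role that the __beXX annotations play for sparse: the compiler rejects accidental arithmetic on an unconverted field.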
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
struct OpalMemoryErrorData is passed to us from firmware, so we
have to byteswap it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When running as a powernv "host" system on P8, we need to switch
the endianness of interrupt handlers. This does it via the appropriate
call to the OPAL firmware which may result in just switching HID0:HILE
but depending on the processor version might need to do a few more
things. This call must be done early before any other processor has
been brought out of firmware.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Firmware update on PowerNV platform takes several minutes. During
this time one CPU is stuck in FW and the kernel complains about "soft
lockups".
This patch returns all secondary CPUs to firmware before starting
firmware update process.
[ Reworked a bit and cleaned up -- BenH ]
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have two copies of code that creates an OPAL sg list. Consolidate
these into a common set of helpers and fix the endian issues.
The flash interface embedded a version number in the num_entries
field, whereas the dump interface did not. Since versioning
wasn't added to the flash interface and it is impossible to add
this in a backwards compatible way, just remove it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix little endian issues with the OPAL error log code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We had some duplication of the internal OPAL functions.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Using size_t in our APIs is asking for trouble, especially
when some OPAL calls use size_t pointers.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
next-20140324 currently fails compiling celleb_defconfig with:
arch/powerpc/include/asm/opal.h:894:42: error: 'struct notifier_block' declared inside parameter list [-Werror]
arch/powerpc/include/asm/opal.h:894:42: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
arch/powerpc/include/asm/opal.h:896:14: error: 'struct notifier_block' declared inside parameter list [-Werror]
This is due to a missing include which is added here.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This call will not be understood by OPAL, and will cause it to add an error
to its log. Among other things, this is useful for testing the
behaviour of the log as it fills up.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OPAL provides an in-memory circular buffer containing a message log
populated with various runtime messages produced by the firmware.
Provide a sysfs interface /sys/firmware/opal/msglog for userspace to
view the messages.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
One OPAL call and one device tree property needed byte swapping.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OPAL defines opal_msg as a big endian struct so we have to
byte swap it on little endian builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
opal_notifier_register() is missing a pending "unregister" variant
and should be exposed to modules.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull powerpc non-virtualized cpuidle from Ben Herrenschmidt:
"This is the branch I mentioned in my other pull request which contains
our improved cpuidle support for the "powernv" platform
(non-virtualized).
It adds support for the "fast sleep" feature of the processor which
provides higher power savings than our usual "nap" mode but at the
cost of losing the timers while asleep, and thus exploits the new
timer broadcast framework to work around that limitation.
It's based on a tip timer tree that you seem to have already merged"
* 'powernv-cpuidle' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
cpuidle/powernv: Parse device tree to setup idle states
cpuidle/powernv: Add "Fast-Sleep" CPU idle state
powerpc/powernv: Add OPAL call to resync timebase on wakeup
powerpc/powernv: Add context management for Fast Sleep
powerpc: Split timer_interrupt() into timer handling and interrupt handling routines
powerpc: Implement tick broadcast IPI as a fixed IPI message
powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
This patch enables fetching of various platform sensor data through
OPAL and expects a sensor handle from the driver to pass to OPAL.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch enables reading and updating of system parameters through
OPAL call.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for notifying clients of the completion of
their requests. Clients request a token before making an OPAL call
and then wait for the response.
This patch uses messaging infrastructure to pull the data to linux
by registering itself for the message type OPAL_MSG_ASYNC_COMP.
Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This enables support for userspace to fetch and initiate FSP and
Platform dumps from the service processor (via firmware) through sysfs.
Based on original patch from Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Flow:
- We register for OPAL notification events.
- OPAL sends new dump available notification.
- We make information on dump available via sysfs
- Userspace requests dump contents
- We retrieve the dump via OPAL interface
- User copies the dump data
- userspace sends ack for dump
- We send ACK to OPAL.
sysfs files:
- We add the /sys/firmware/opal/dump directory
- echoing 1 (well, anything, but in future we may support
different dump types) to /sys/firmware/opal/dump/initiate_dump
will initiate a dump.
- Each dump that we've been notified of gets a directory
in /sys/firmware/opal/dump/ with a name of the dump type and ID (in hex,
as this is what's used elsewhere to identify the dump).
- Each dump has files: id, type, dump and acknowledge
dump is binary and is the dump itself.
echoing 'ack' to acknowledge (currently any string will do) will
acknowledge the dump and it will soon after disappear from sysfs.
OPAL APIs:
- opal_dump_init()
- opal_dump_info()
- opal_dump_read()
- opal_dump_ack()
- opal_dump_resend_notification()
Currently we are only ever notified for one dump at a time (until
the user explicitly acks the current dump, then we get a notification
of the next dump), but this kernel code should "just work" when OPAL
starts notifying us of all the dumps present.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Based on a patch by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
This patch adds support to read error logs from OPAL and export
them to userspace through a sysfs interface.
We export each log entry as a directory in /sys/firmware/opal/elog/
Currently, OPAL will buffer up to 128 error log records. We don't
need to have any knowledge of this limit on the Linux side as that
is actually largely transparent to us.
Each error log entry has the following files: id, type, acknowledge, raw.
Currently we just export the raw binary error log in the 'raw' attribute.
In a future patch, we may parse more of the error log to make it a bit
easier for userspace (e.g. to be able to display a brief summary in
petitboot without having to have a full parser).
If we have >128 logs from OPAL, we'll only be notified of 128 until
userspace starts acknowledging them. This limitation may be lifted in
the future and, with this patch, that should "just work" from the Linux side.
A userspace daemon should:
- wait for error log entries using normal mechanisms (we announce creation)
- read error log entry
- save error log entry safely to disk
- acknowledge the error log entry
- rinse, repeat.
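The save-then-acknowledge ordering in the steps above can be sketched in userspace C as follows. This is only an illustration: the function name save_and_ack() is hypothetical, the string written to 'acknowledge' is an assumption (this text only states what the dump interface accepts), and a real daemon would also fsync() the saved file before acknowledging.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the 'raw' attribute of one error log entry to out_path and only
 * then write to its 'acknowledge' attribute. Returns 0 on success and -1
 * on failure; on failure nothing has been acknowledged.
 * Hypothetical helper, not part of the patch.
 */
int save_and_ack(const char *entry_dir, const char *out_path)
{
    char path[512];
    char buf[4096];
    size_t n;
    FILE *in, *out, *ack;
    int ok = 1;

    assert(entry_dir != NULL && out_path != NULL);

    snprintf(path, sizeof(path), "%s/raw", entry_dir);
    in = fopen(path, "rb");
    if (!in)
        return -1;
    out = fopen(out_path, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    /* Copy the raw binary log unchanged. */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    if (!ok)
        return -1;          /* never acknowledge an unsaved log */

    /* Assumption: any short string written here acknowledges the entry. */
    snprintf(path, sizeof(path), "%s/acknowledge", entry_dir);
    ack = fopen(path, "w");
    if (!ack)
        return -1;
    ok = fputs("ack", ack) >= 0;
    if (fclose(ack) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A daemon would call this once per entry directory it finds under /sys/firmware/opal/elog/, choosing its own destination path for each saved log.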
On the Linux side, we read the error log when we're notified of it. This
possibly isn't ideal as it would be better to only read them on-demand.
However, this doesn't really work with current OPAL interface, so we
read the error log immediately when notified at the moment.
I've tested this pretty extensively and am rather confident that the
Linux side of things works rather well. There is currently an issue with
the service processor side of things for >128 error logs though.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Detect and recover from a machine check taken inside OPAL on special
SCOM load instructions. On specific SCOM reads via MMIO we may get a machine
check exception with SRR0 pointing inside OPAL. To recover from the MC
in this scenario, get a recovery instruction address and return to it from
the MC.
OPAL will export the machine check recoverable ranges through
device tree node mcheck-recoverable-ranges under ibm,opal:
# hexdump /proc/device-tree/ibm,opal/mcheck-recoverable-ranges
0000000 0000 0000 3000 2804 0000 000c 0000 0000
0000010 3000 2814 0000 0000 3000 27f0 0000 000c
0000020 0000 0000 3000 2814 xxxx xxxx xxxx xxxx
0000030 llll llll yyyy yyyy yyyy yyyy
...
...
#
where:
xxxx xxxx xxxx xxxx = Starting instruction address
llll llll = Length of the address range.
yyyy yyyy yyyy yyyy = recovery address
Each recoverable address range entry is (start address, len,
recovery address), 2 cells each for start and recovery address, 1 cell for
len, totalling 5 cells per entry. During kernel boot time, build up the
recovery table with the list of recovery ranges from device-tree node which
will be used during machine check exception to recover from MMIO SCOM UE.
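As a rough illustration of the 5-cell layout, the property could be decoded and searched as in the userspace sketch below. The names (struct mc_range, parse_ranges(), find_recovery()) are hypothetical, and treating the range as [start, start + len) with an exclusive end is an assumption of this sketch rather than something the text specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CELLS_PER_ENTRY 5   /* 2 (start) + 1 (len) + 2 (recovery) */
#define CELL_BYTES      4

struct mc_range {
    uint64_t start;     /* first instruction address in the range */
    uint64_t end;       /* start + len, treated as exclusive here */
    uint64_t recovery;  /* address to return to from the machine check */
};

/* Read one big-endian 32-bit device-tree cell. */
static uint32_t read_cell(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read two consecutive cells as one big-endian 64-bit value. */
static uint64_t read_cells64(const uint8_t *p)
{
    return ((uint64_t)read_cell(p) << 32) | read_cell(p + CELL_BYTES);
}

/* Decode up to max entries from the raw property; returns the count. */
size_t parse_ranges(const uint8_t *prop, size_t len,
                    struct mc_range *out, size_t max)
{
    const size_t entry_bytes = CELLS_PER_ENTRY * CELL_BYTES;
    size_t n = len / entry_bytes;   /* a trailing partial entry is ignored */
    size_t i;

    assert(prop != NULL || len == 0);
    if (n > max)
        n = max;
    for (i = 0; i < n; i++) {
        const uint8_t *p = prop + i * entry_bytes;

        out[i].start = read_cells64(p);
        out[i].end = out[i].start + read_cell(p + 8);
        out[i].recovery = read_cells64(p + 12);
    }
    return n;
}

/* Return the recovery address covering addr, or 0 if no range matches. */
uint64_t find_recovery(const struct mc_range *tbl, size_t n, uint64_t addr)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (addr >= tbl[i].start && addr < tbl[i].end)
            return tbl[i].recovery;
    return 0;
}
```

Fed the first 40 bytes of the hexdump above, parse_ranges() yields two entries, starting at 0x30002804 and 0x300027f0, each 0xc bytes long and both recovering to 0x30002814.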
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
During "Fast-sleep" and deeper power saving states, the decrementer and
timebase could be stopped, leaving them out of sync with the rest
of the cores in the system.
Add a firmware call to request that the platform resync the timebase
using low level platform methods.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The OPAL firmware functions opal_xscom_read and opal_xscom_write
take a 64-bit argument for the XSCOM (PCB) address in order to
support the indirect mode on P8.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.13]
It's possible that OPAL may be writing to host memory during
kexec (as in the dump retrieval scenario). In this situation we might
end up corrupting host memory.
This patch makes an OPAL sync call to make sure OPAL stops
writing to host memory before kexec'ing.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch implements the EEH operation backend restore_config()
for the PowerNV platform. It relies on the OPAL API opal_pci_reinit(),
with which we reinitialize the error reporting properly after a PE or
PHB reset.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge a pile of fixes that went into the "merge" branch (3.13-rc's) such
as Anton's Little Endian fixes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We are passing pointers to the firmware for reads, so we need to properly
convert the result, as OPAL is always BE.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
opal_xscom_read uses a pointer to return the data so we need
to byteswap it on LE builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Get the memory errors reported by OPAL and plumb them into the memory poison
infrastructure. This patch uses the new messaging channel infrastructure to
pull the FSP memory errors to Linux.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move SG list and entry structure to header file so that
it can be used in other places as well.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OPAL now has a new messaging infrastructure to push messages to
Linux in a generic format for different types of messages using only one
event bit. The format of the OPAL message is as below:
struct opal_msg {
uint32_t msg_type;
uint32_t reserved;
uint64_t params[8];
};
This patch allows clients to subscribe for notification of a specific
message type. It is up to the subscriber that showed interest in a
message type to decipher the messages of that type.
The interface to subscribe for notification is:
int opal_message_notifier_register(enum OpalMessageType msg_type,
struct notifier_block *nb)
The notifier will fetch the OPAL message when available and notify the
subscriber with the message type and the OPAL message. It is the subscriber's
responsibility to copy the message data before returning from the notifier
callback.
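The subscribe-and-copy pattern can be modelled in userspace roughly as follows. This is a sketch only: it uses plain callback arrays rather than the kernel's notifier_block chains, and msg_notifier_register(), msg_dispatch(), MSG_TYPE_MAX and MAX_SUBS are hypothetical names and limits chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct opal_msg {           /* layout as given in the text */
    uint32_t msg_type;
    uint32_t reserved;
    uint64_t params[8];
};

#define MSG_TYPE_MAX 4      /* hypothetical number of message types */
#define MAX_SUBS     4      /* hypothetical per-type subscriber limit */

typedef void (*msg_handler)(uint32_t msg_type,
                            const struct opal_msg *msg, void *ctx);

struct subscriber {
    msg_handler fn;
    void *ctx;
};

static struct subscriber subs[MSG_TYPE_MAX][MAX_SUBS];
static size_t nsubs[MSG_TYPE_MAX];

/* Subscribe fn to one message type. Returns 0 on success, -1 otherwise. */
int msg_notifier_register(uint32_t msg_type, msg_handler fn, void *ctx)
{
    if (msg_type >= MSG_TYPE_MAX || fn == NULL ||
        nsubs[msg_type] == MAX_SUBS)
        return -1;
    subs[msg_type][nsubs[msg_type]].fn = fn;
    subs[msg_type][nsubs[msg_type]].ctx = ctx;
    nsubs[msg_type]++;
    return 0;
}

/* Deliver msg to the subscribers of its type; returns how many were called. */
size_t msg_dispatch(const struct opal_msg *msg)
{
    size_t i, n;

    assert(msg != NULL);
    if (msg->msg_type >= MSG_TYPE_MAX)
        return 0;
    n = nsubs[msg->msg_type];
    for (i = 0; i < n; i++)
        subs[msg->msg_type][i].fn(msg->msg_type, msg,
                                  subs[msg->msg_type][i].ctx);
    return n;
}

/*
 * Example subscriber: copies the message into its own storage, because
 * the dispatcher's buffer may be reused once the callback returns.
 */
void copy_handler(uint32_t msg_type, const struct opal_msg *msg, void *ctx)
{
    (void)msg_type;
    *(struct opal_msg *)ctx = *msg;
}
```

Only subscribers registered for the message's own type are called, which is what lets a single event bit serve many kinds of message.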
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Code update interface for the powernv platform. This provides a
sysfs interface to pass a new image and to validate, update and
commit images.
This patch includes:
- The following OPAL APIs for code update:
  - opal_validate_flash()
  - opal_manage_flash()
  - opal_update_flash()
- The following sysfs files under /sys/firmware/opal:
  - image : Interface to pass new FW image
  - validate_flash : Validate candidate image
  - manage_flash : Commit/Reject operations
  - update_flash : Flash new candidate image
Updating Image:
"update_flash" is the interface used to request flashing of new FW.
It just passes the image SG list to FW. The actual flashing is done
at system reboot time.
Note:
- SG entry format:
I have kept the version number to keep this list similar to what
PAPR defines.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Create the /sys/firmware/opal directory. We will use this
interface to fetch OPAL error logs, perform firmware updates, etc.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch adds function ioda_eeh_phb3_phb_diag() to dump PHB3
PHB diag-data. That's called while detecting informative errors
or frozen PE on the specific PHB.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pnv_pci_setup_bml_iommu was missing a byteswap of a device
tree property.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Sparse caught an issue where opal_set_rtc_time was incorrectly
byteswapping. Also fix a number of sparse warnings.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With OPAL v3 we can return secondary CPUs to firmware on kexec. This
allows firmware to do various cleanups making things generally more
reliable, and will enable the "new" kernel to call OPAL to perform
some reconfiguration tasks early on that can only be done while
all the CPUs are in firmware.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This uses the hooks provided by CONFIG_PPC_INDIRECT_PIO to
implement a set of hooks for IO port access that use the LPC
bus via OPAL calls for the first 64K of IO space.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch implements a notifier to receive a notification on OPAL
event mask changes. The notifier is only called as a result of an OPAL
interrupt, which will happen upon reception of FSP messages or PCI errors.
Any event mask change detected as a result of opal_poll_events() will not
result in a notifier call.
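The difference between the two paths can be modelled in userspace as below. All names here are hypothetical, and two details are assumptions of this sketch rather than statements about the patch: the changed bits are computed as the XOR of the previous and current masks, and the notifier is skipped when nothing changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*event_notifier)(uint64_t events, uint64_t changed, void *ctx);

static event_notifier notifier;
static void *notifier_ctx;
static uint64_t last_notified_mask;

/* Register the single notifier used by this model. */
void event_notifier_register(event_notifier fn, void *ctx)
{
    assert(fn != NULL);
    notifier = fn;
    notifier_ctx = ctx;
}

/* Interrupt path: work out which bits changed and tell the notifier. */
void interrupt_path(uint64_t events)
{
    uint64_t changed = last_notified_mask ^ events;

    last_notified_mask = events;
    if (changed != 0 && notifier != NULL)
        notifier(events, changed, notifier_ctx);
}

/* Polling path: the caller sees the events, but no notifier is called. */
uint64_t poll_path(uint64_t events)
{
    return events;
}

/* Example notifier that records what it was told. */
struct seen {
    unsigned int calls;
    uint64_t events;
    uint64_t changed;
};

void record_change(uint64_t events, uint64_t changed, void *ctx)
{
    struct seen *s = ctx;

    s->calls++;
    s->events = events;
    s->changed = changed;
}
```

A subscriber that depends on being told about changes therefore cannot rely on polling alone in this model.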
[benh: changelog]
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch synchronizes OPAL APIs between kernel and firmware. Also,
we start to replace opal_pci_get_phb_diag_data() with the similar
opal_pci_get_phb_diag_data2(); the former OPAL API will return
OPAL_UNSUPPORTED from now on.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We add a machine_shutdown hook that frees the OPAL interrupts
(so they get masked at the source and don't fire while kexec'ing)
and which triggers an IODA reset on all the PCIe host bridges,
which will have the effect of blocking all DMAs and subsequent
PCI interrupts.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The EOI handler of MSI/MSI-X interrupts for P8 (PHB3) needs additional
steps to handle the P/Q bits in the IVE before EOIing the corresponding
interrupt. The patch changes the EOI handler to cover that. We have
an individual IRQ chip in each PHB instance. During the MSI IRQ setup
time, the IRQ chip is copied over from the original one for that IRQ,
and the EOI handler is patched with the one that will handle the P/Q
bits (as Ben suggested).
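The copy-and-patch approach can be sketched with a simplified userspace model. struct irq_chip_model is a cut-down, hypothetical stand-in for the kernel's struct irq_chip, and the counters merely mark where the real handlers would touch hardware or call OPAL.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down, hypothetical stand-in for the kernel's struct irq_chip. */
struct irq_chip_model {
    const char *name;
    void (*irq_mask)(unsigned int irq);
    void (*irq_eoi)(unsigned int irq);
};

static unsigned int mask_calls, base_eoi_calls, pq_clear_calls;

void base_mask(unsigned int irq) { (void)irq; mask_calls++; }
void base_eoi(unsigned int irq)  { (void)irq; base_eoi_calls++; }

/* Patched EOI: handle the P/Q bits first, then do the original EOI. */
void pq_eoi(unsigned int irq)
{
    pq_clear_calls++;       /* placeholder for the P/Q bit handling */
    base_eoi(irq);
}

struct phb_model {
    int chip_ready;
    struct irq_chip_model chip;     /* this PHB's private copy */
};

/*
 * On first MSI setup, copy the original chip into the PHB and replace
 * only the EOI handler; every other callback is inherited unchanged.
 */
struct irq_chip_model *phb_msi_chip(struct phb_model *phb,
                                    const struct irq_chip_model *orig)
{
    assert(phb != NULL && orig != NULL);
    if (!phb->chip_ready) {
        phb->chip = *orig;
        phb->chip.irq_eoi = pq_eoi;
        phb->chip_ready = 1;
    }
    return &phb->chip;
}
```

Because the copy is per PHB, the original chip keeps its plain EOI handler and interrupts that do not need the P/Q handling are unaffected.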
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OPAL can handle various interrupts for us, such as Machine Checks (it
performs all sorts of recovery tasks and passes control back to us with
information about the error), Hardware Management Interrupts and Softpatch
interrupts.
This wires up the mechanisms and prints out the specific information returned
by HAL when a machine check occurs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Implement OPAL RTC and NVRAM support and wire all that up to
the powernv platform.
We use RTAS for RTC as a fallback if available. Using RTAS for nvram
is not supported yet, pending some rework/cleanup and generalization
of the pSeries & CHRP code. We also use RTAS fallbacks for power off
and reboot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds a udbg and an hvc console backend for supporting a console
using the OPAL console interfaces.
On OPAL v1 we have hvc0 mapped to whatever console the system was
configured for (network or hvsi serial port) via the service
processor.
On OPAL v2 we have hvcN mapped to the Nth console provided by OPAL
which generally corresponds to:
hvc0 : network console (raw protocol)
hvc1 : serial port S1 (hvsi)
hvc2 : serial port S2 (hvsi)
Note: At this point, early debug console only works with OPAL v1
and shouldn't be enabled in a normal kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add definition of OPAL interfaces along with the wrappers to call
into OPAL runtime and the early device-tree parsing hook to locate
the OPAL runtime firmware.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On machines supporting the OPAL firmware version 1, the system
is initially booted under pHyp. We then use a special hypercall
to verify if OPAL is available and if it is, we then trigger
a "takeover" which disables pHyp and loads the OPAL runtime
firmware, giving control to the kernel in hypervisor mode.
This patch adds the necessary code to detect that the OPAL takeover
capability is present when running under PowerVM (aka pHyp) and
perform said takeover to get hypervisor control of the processor.
To perform the takeover, we must first use RTAS (within the Open
Firmware runtime environment) to start all processors & threads,
in order to give control to OPAL on all of them. We then call
the takeover hypercall on everybody. OPAL will re-enter the kernel
main entry point, passing it a flat device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>