Commit Graph

6 Commits

Author SHA1 Message Date
Paul Mackerras f18fc729cd Merge ../linux-2.6 2006-05-05 15:45:48 +10:00
Linas Vepstas 054d8ff377 [PATCH] powerpc/pseries: avoid crash in PCI code if mem system not up
The powerpc code is currently performing PCI setup before memory
initialization.  PCI setup touches PCI config space registers.  If the PCI
card is bad, this will evoke an error, which currently can't be handled,
as the PCI error recovery code expects kmalloc() to be functional.  This
patch will cause the system to punt instead of crashing with

cpu 0x0: Vector: 300 (Data Access) at [c0000000004434d0]
    pc: c0000000000c06b4: .kmem_cache_alloc+0x8c/0xf4
    lr: c00000000004ad6c: .eeh_send_failure_event+0x48/0xfc

This patch will also print the name of the offending PCI device.
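
The guard described above can be sketched in user space roughly as follows.
This is only an illustration, not the code from the patch: the flag
mem_init_done, the struct failure_event, and the function
send_failure_event() are hypothetical stand-ins, and malloc() takes the
place of kmalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag: set to true once the allocator is usable. */
bool mem_init_done = false;

/* Hypothetical event record; stands in for whatever recovery allocates. */
struct failure_event {
    const char *dev_name;
};

/*
 * Report a failure on dev_name.  Returns a heap-allocated event on
 * success, or NULL if the report had to be dropped.
 */
struct failure_event *send_failure_event(const char *dev_name)
{
    struct failure_event *ev;

    assert(dev_name != NULL);

    if (!mem_init_done) {
        /* Too early to allocate: name the device and punt. */
        fprintf(stderr, "early failure on device %s not handled\n", dev_name);
        return NULL;
    }

    ev = malloc(sizeof(*ev));
    if (ev == NULL)
        return NULL;
    ev->dev_name = dev_name;
    return ev;
}
```

The point of the check is ordering: the allocator is never touched until
the flag says it is safe, so an early error becomes a logged message
rather than a fault inside the allocator.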

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-05-03 23:06:40 +10:00
Linas Vepstas ac325acd50 [PATCH] powerpc/pseries: clear PCI failure counter if no new failures
The current PCI error recovery system keeps track of the number of PCI card
resets, and refuses to bring a card back up if this number is too large.
The goal of doing this was to avoid an infinite loop of resets if a card is
obviously dead.  However, if the failures are rare, but the machine has a
high uptime, this mechanism might still be triggered; this is too harsh.

This patch avoids this problem by decrementing the fail count after an
hour.  Thus, as long as a PCI card BSODs fewer than 6 times an hour, it
will continue to be reset indefinitely.  If its failure rate is greater
than that, it will be taken off-line permanently.
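
A simplified user-space model of this decay policy might look like the
sketch below.  It is not the code from the patch: struct dev_state,
decay_freeze_count() and record_freeze() are hypothetical names, and the
limit of 5 is an assumption read off the "6 times an hour" wording above.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define MAX_ALLOWED_FREEZES 5      /* assumed: the 6th failure in an hour is fatal */
#define DECAY_INTERVAL_SECS 3600   /* forgive one failure per hour */

/* Hypothetical per-device bookkeeping. */
struct dev_state {
    int freeze_count;   /* failures not yet forgiven */
    time_t last_decay;  /* reference time for the next decrement */
};

/* Forgive one recorded failure for every full hour that has elapsed. */
void decay_freeze_count(struct dev_state *d, time_t now)
{
    assert(d != NULL);
    while (d->freeze_count > 0 && now - d->last_decay >= DECAY_INTERVAL_SECS) {
        d->freeze_count--;
        d->last_decay += DECAY_INTERVAL_SECS;
    }
}

/*
 * Record a new failure at time now.  Returns true if the device should
 * be reset, false if it should be taken off-line permanently.
 */
bool record_freeze(struct dev_state *d, time_t now)
{
    decay_freeze_count(d, now);
    if (d->freeze_count == 0)
        d->last_decay = now;    /* start a fresh decay window */
    d->freeze_count++;
    return d->freeze_count <= MAX_ALLOWED_FREEZES;
}
```

In this model a card that fails once every two hours is reset forever,
because each failure is forgiven before the next one arrives, while a
burst of six failures within the same hour exceeds the limit.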

This patch is larger than it might otherwise be because it changes
indentation by removing a pointless while-loop.  The while loop is not
needed, as the handler is invoked once for each event (by schedule_work());
the loop is leftover cruft from an earlier implementation.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-22 18:46:13 +10:00
Linas Vepstas 8c33fd11e3 [PATCH] powerpc/pseries: mutex lock to serialize EEH event processing
This forces the processing of EEH PCI events to be serialized,
using a very simple mutex lock. This serialization is required to
avoid races involving additional PCI device failures that may occur
during the recovery phase of a previous failure.
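
The serialization can be illustrated with a user-space sketch that uses a
pthread mutex in place of the kernel lock.  The names event_mutex,
handle_event() and the counters are hypothetical and exist only to make
the invariant visible; this is not the patch itself.

```c
#include <assert.h>
#include <pthread.h>

/* One lock shared by every event handler (hypothetical name). */
static pthread_mutex_t event_mutex = PTHREAD_MUTEX_INITIALIZER;

int events_in_recovery;   /* handlers currently inside the critical section */
int events_handled;       /* total events processed so far */

/*
 * Process one event.  The whole recovery sequence runs under the
 * mutex, so a second failure that arrives mid-recovery waits here
 * until the first one has finished.
 */
void handle_event(int dev_id)
{
    (void)dev_id;  /* a real handler would look up and reset this device */

    pthread_mutex_lock(&event_mutex);

    events_in_recovery++;
    assert(events_in_recovery == 1);  /* invariant the lock provides */

    /* ... device recovery would happen here ... */

    events_handled++;
    events_in_recovery--;

    pthread_mutex_unlock(&event_mutex);
}
```

A single coarse lock is the simplest choice here: events are expected to
be rare, so there is little to gain from finer-grained locking.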

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-04-01 22:35:01 +11:00
Linas Vepstas 77bd741561 [PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines
Various PCI bus errors can be signaled by newer PCI controllers.  The
core error recovery routines are architecture dependent.  This patch adds
a recovery infrastructure for the PPC64 pSeries systems.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)
2006-01-10 15:28:32 +11:00
Linas Vepstas 172ca92618 [PATCH] ppc64: PCI error event dispatcher
12-eeh-event-dispatcher.patch

ppc64: EEH Recovery dispatcher thread

This patch adds a mechanism to create recovery threads when an
EEH event is received.  Since an EEH freeze state may be detected
within an interrupt context, we need to get out of the interrupt
context before starting recovery. This dispatcher does this in
two steps: first, it uses a workqueue to get out, and then
launches a kernel thread, so that the recovery routine can
sleep for extended periods without upsetting keventd.

A kernel thread is created with each EEH event, rather than
having one long-running daemon started at boot time.  This is
because it is anticipated that EEH events will be very rare
(very very rare, ideally) and so it's pointless to clutter the
process tables with a daemon that will almost never run.
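
The hand-off described above can be modelled in user space as a
record-now, recover-later queue.  The sketch below is only an analogy:
queue_event() plays the role of the interrupt-context side,
drain_events() the role of the workqueue callback, and the
start_recovery callback stands in for launching a per-event kernel
thread.  All of these names, and the fixed queue size, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8   /* arbitrary capacity for this sketch */

struct event {
    int dev_id;
};

static struct event pending[MAX_PENDING];
static size_t n_pending;

int recoveries;       /* how many recoveries were started */
int last_recovered;   /* device id of the most recent one */

/* "Interrupt" side: only record the event; never block or sleep. */
int queue_event(int dev_id)
{
    if (n_pending == MAX_PENDING)
        return -1;                  /* queue full, event dropped */
    pending[n_pending++].dev_id = dev_id;
    return 0;
}

/* Example callback: stands in for starting one recovery thread. */
void note_recovery(int dev_id)
{
    recoveries++;
    last_recovered = dev_id;
}

/*
 * Worker side: runs later, outside "interrupt" context, and starts
 * one recovery per queued event.  Returns the number started.
 */
size_t drain_events(void (*start_recovery)(int dev_id))
{
    size_t started = 0;

    assert(start_recovery != NULL);
    for (size_t i = 0; i < n_pending; i++) {
        start_recovery(pending[i].dev_id);
        started++;
    }
    n_pending = 0;
    return started;
}
```

Nothing is recovered at the moment an event is queued; the work only
begins once the worker side runs, which is what lets the recovery
routine sleep.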

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2005-11-10 11:38:05 +11:00