Bug 6344

Summary: ESP DMA and sbus (mainly in sparc32 but in sparc64 too)
Product: Platform Specific/Hardware Reporter: JKB (mt1)
Component: SPARC32 Assignee: Martin Habets (errandir_news)
Status: CLOSED CODE_FIX    
Severity: blocking CC: bunk
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.16.1 Subsystem:
Regression: --- Bisected commit-id:

Description JKB 2006-04-07 02:19:32 UTC
Most recent kernel where this bug did not occur: 2.4.32.
Distribution: debian.
Hardware Environment: SparcSTATION 20 (sun4m), 448 MBytes of RAM, esp (FAS100A),
Mbus SuperSPARC SM71.
Software Environment: very basic kernel with built-in raid1, UP.
Problem Description:

When traffic on the SCSI bus is high, esp hangs and freezes the system with:

Apr  6 20:36:01 lebegue kernel: esp0: DMA error a440030e
Apr  6 20:36:01 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:01 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:01 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:01 lebegue kernel: esp0: Warning, live target 3 not responding to selection.
Apr  6 20:36:01 lebegue kernel: esp0: Warning, live target 1 not responding to selection.
Apr  6 20:36:02 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:02 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:02 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:02 lebegue kernel: esp0: Warning, live target 3 not responding to selection.
Apr  6 20:36:02 lebegue kernel: esp0: Warning, live target 1 not responding to selection.
Apr  6 20:36:03 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:03 lebegue kernel: esp0: Warning, live target 3 not responding to selection.
Apr  6 20:36:03 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:03 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:13 lebegue kernel: sd 0:0:3:0: scsi: Device offlined - not ready after error recovery
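
To make it easier to compare occurrences of this failure, the messages above can be summarized with a short script. This is only a sketch: the regular expressions are written against the message wording shown in this report, the sample text is a subset of the log above with wrapped lines rejoined, and the function name summarize is hypothetical.

```python
import re
from collections import Counter

# Sample lines copied from the log in this report (wrapped lines rejoined).
LOG = """\
Apr  6 20:36:01 lebegue kernel: esp0: DMA error a440030e
Apr  6 20:36:01 lebegue kernel: esp0: Resetting scsi bus
Apr  6 20:36:01 lebegue kernel: esp0: SCSI bus reset interrupt
Apr  6 20:36:01 lebegue kernel: esp0: Warning, live target 3 not responding to selection.
Apr  6 20:36:13 lebegue kernel: sd 0:0:3:0: scsi: Device offlined - not ready after error recovery
"""

# Patterns are assumptions based only on the wording seen in this report.
DMA_ERROR = re.compile(r"(esp\d+): DMA error ([0-9a-fA-F]{8})")
BUS_RESET = re.compile(r"(esp\d+): Resetting scsi bus")
DEAD_TARGET = re.compile(r"(esp\d+): Warning, live target (\d+) not responding")


def summarize(text):
    """Return (DMA error values, bus reset count, Counter of unresponsive targets)."""
    codes = [int(m.group(2), 16) for m in DMA_ERROR.finditer(text)]
    resets = len(BUS_RESET.findall(text))
    targets = Counter(int(m.group(2)) for m in DEAD_TARGET.finditer(text))
    return codes, resets, targets


codes, resets, targets = summarize(LOG)
print([hex(c) for c in codes], resets, dict(targets))
```

The same function could be fed the contents of a saved kernel log to see whether the DMA error value changes between hangs.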

Steps to reproduce:
1/ use a SparcSTATION (sun4c, sun4m, maybe sun4d, but I don't have any sun4d
machine) or an UltraSPARC with Sbus and an ESP FAS100A (for example a U1; the
U1E works fine with the HME ESP adapter);
2/ connect one or two disks to the internal SCSI bus;
3/ stress the SCSI bus (for example with raid1);
4/ wait for the crash: several hours with a SuperSPARC-II on an SS20, a few
minutes with a HyperSPARC/200 ;-)
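
Step 3 does not say how the bus was loaded beyond "raid1". As one possible way to generate sustained disk traffic, here is a minimal sketch that repeatedly writes a file, forces it to disk and reads it back. It is not the load used in this report; the function name stress_once and the default sizes are hypothetical, and the read-back may be served from the page cache, so it is mainly the synced writes that reach the disks.

```python
import hashlib
import os
import tempfile


def stress_once(directory, size_bytes=64 * 1024 * 1024, chunk=1024 * 1024):
    """Write random data to a temp file, fsync it, read it back, compare digests."""
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                f.write(block)
                written.update(block)
                remaining -= len(block)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the device
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                read_back.update(block)
        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)
```

Calling stress_once in a loop on a directory that lives on the affected array (the path is up to the tester) keeps the bus busy until the hang appears or a digest mismatch is returned.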

Observations: this bug occurs more frequently with a HyperSPARC CPU than with a
SuperSPARC. I think the bug comes from the DMA/VDMA/Sbus support, but I can't
find any mistake in the sources. I have seen this bug on several stations (SS5
with MicroSPARC-II, SS20 with SuperSPARC and HyperSPARC UP, U1 with UltraSPARC-I).

Regards,

JKB
Comment 1 JKB 2006-04-09 07:14:34 UTC
I have tested several different configurations, and this bug seems to come from
highmem support in the sparc32 tree. But I don't know why my U1 shows the same
problem.

JKB
Comment 2 Jurij Smakov 2006-04-10 21:15:31 UTC
I can reproduce this bug on SparcStation 20, with Debian's kernel 2.6.16-5 from
unstable (essentially 2.6.16.2). I'm getting a slightly different error code:

 esp0: DMA error a440030f
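
For comparison with the value in the original description (a440030e), the two values can be diffed bit by bit. This is a small sketch that assumes both are the 32-bit word printed after "DMA error"; the meaning of the individual bits is not stated anywhere in this report, so none is claimed here. The helper name set_bits is hypothetical.

```python
a = 0xa440030e  # value from the original description
b = 0xa440030f  # value from this comment


def set_bits(value):
    """Return the positions (0 = least significant) of the bits set in a 32-bit value."""
    return [i for i in range(32) if (value >> i) & 1]


diff = a ^ b
print(hex(diff), set_bits(diff))  # 0x1 [0]: only the lowest bit differs
```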
Comment 3 Martin Habets 2006-06-23 03:34:38 UTC
Bob Breuer submitted a patch for this, which has been pushed upstream
by David Miller.

http://marc.theaimsgroup.com/?l=linux-sparc&m=115077649707675&w=2