Bug 5589

Summary: libata - Oops when sata drive under load
Product: IO/Storage
Reporter: Larkin Lowrey (llowrey)
Component: Serial ATA
Assignee: Jeff Garzik (jgarzik)
Status: REJECTED INSUFFICIENT_DATA
Severity: normal
CC: akpm, bunk, diegocg
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.14
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Kernel debug output

Description Larkin Lowrey 2005-11-10 17:14:00 UTC
Distribution: Fedora Core 4
Hardware Environment: Dual AthlonMP 2000+, 760MPX, 2GB RAM, Sil3114
Problem Description: Kernel Oops when sata drive is under load

Steps to reproduce: dd if=/dev/sda of=/dev/null bs=1M

I can run the above dd command on hda and hdc all day long, but sda oopses
every time. The problem started when I added the Sil3114 and tried to create a
RAID5 array: the system would oops while initializing the array. I eventually
found that the SATA drive oopses with the above dd read alone. The drive is a
WD2000JD-00H. I have also tried a PATA drive hooked up via an Sil3611 bridge
and ran into the same problem.

Most Oops messages left on the console show "BUG: spinlock lockup on CPU#0" ...

Nov  9 17:11:28 mcp kernel: Unable to handle kernel paging request at virtual
address 041242c7
Nov  9 17:11:28 mcp kernel:  printing eip:
Nov  9 17:11:28 mcp kernel: c014f2fa
Nov  9 17:11:28 mcp kernel: *pde = 00000000
Nov  9 17:11:28 mcp kernel: Oops: 0002 [#1]
Nov  9 17:11:28 mcp kernel: SMP
Nov  9 17:11:28 mcp kernel: Modules linked in: ipv6 parport_pc lp parport
autofs4 w83627hf hwmon_vid eeprom i2c_isa i2c_matroxfb i2c_algo_bit
matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450
g450_pll matroxfb_misc rfcomm l2cap bluetooth sunrpc dm_mod video button battery
ac ohci_hcd i2c_amd756 i2c_core shpchp snd_intel8x0 snd_ac97_codec snd_ac97_bus
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc e1000 floppy sd_mod
Nov  9 17:11:28 mcp kernel: CPU:    0
Nov  9 17:11:28 mcp kernel: EIP:    0060:[<c014f2fa>]    Not tainted VLI
Nov  9 17:11:28 mcp kernel: EFLAGS: 00010083   (2.6.14)
Nov  9 17:11:28 mcp kernel: EIP is at cache_alloc_refill+0xaa/0x280
Nov  9 17:11:28 mcp kernel: eax: 61c584c6   ebx: 0000003c   ecx: f616c000   edx:
041242c3
Nov  9 17:11:28 mcp kernel: esi: f7ffe520   edi: 00000000   ebp: f7ff1a00   esp:
f5e3de8c
Nov  9 17:11:28 mcp kernel: ds: 007b   es: 007b   ss: 0068
Nov  9 17:11:28 mcp kernel: Process automount (pid: 2100, threadinfo=f5e3d000
task=c2266030)
Nov  9 17:11:28 mcp kernel: Stack: 000000d0 f7ff0dc0 f7ffe548 f7ef8000 f7ffe520
80009000 8000a000 00000202
Nov  9 17:11:28 mcp kernel:        000000d0 f7ff0dc0 c22e9570 c014f6fa f7df2680
00000023 f7cef38c c011fa78
Nov  9 17:11:28 mcp kernel:        f7cef34c f5e3d000 f7df26b0 f7818960 f78189ac
f7cef398 f7cef3ac f7cef3a4
Nov  9 17:11:28 mcp kernel: Call Trace:
Nov  9 17:11:28 mcp kernel:  [<c014f6fa>] kmem_cache_alloc+0x6a/0x70
Nov  9 17:11:28 mcp kernel:  [<c011fa78>] copy_mm+0x1e8/0x3d0
Nov  9 17:11:28 mcp kernel:  [<c0120719>] copy_process+0x569/0xec0
Nov  9 17:11:28 mcp kernel:  [<c012116e>] do_fork+0x6e/0x206
Nov  9 17:11:28 mcp kernel:  [<c010753f>] do_syscall_trace+0x20f/0x225
Nov  9 17:11:28 mcp kernel:  [<c0101a92>] sys_clone+0x32/0x40
Nov  9 17:11:28 mcp kernel:  [<c0102f55>] syscall_call+0x7/0xb
Nov  9 17:11:28 mcp kernel: Code: 85 d2 0f 85 59 01 00 00 85 db 7e 4f 8b 74 24
10 8b 0e 39 f1 0f 84 25 01 00 00 8b 54 24 04 8b 41 10 39 42 38 77 6f 8b 11 8b 41
04 <89> 42 04 89 10 83 79 14 ff c7 01 00 01 10 00 c7 41 04 00 02 20

Comment 1 Andrew Morton 2005-11-11 03:15:45 UTC
That's a funny-looking backtrace - it has nothing to do with the IO system.

Does the oops trace always look like that?  If not, please send more
instances.
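
One way to compare several oops instances is to pull the "EIP is at" line out
of each one and count how often each faulting location appears. The following
is a minimal sketch only: the function name summarize_oops_eips is hypothetical,
and the log path in the usage note is an assumption about where syslog writes
kernel messages on the affected machine.

```shell
# Summarize the faulting locations of all oopses recorded in a log file.
# summarize_oops_eips is a hypothetical helper name, not an existing tool.
summarize_oops_eips() {
    log_file="$1"
    # Keep only the symbol+offset that follows "EIP is at " on each oops,
    # then count identical locations.
    sed -n 's/.*EIP is at \(.*\)$/\1/p' "$log_file" | sort | uniq -c
}
```

It could be invoked as `summarize_oops_eips /var/log/messages` (path assumed).
If every oops reports the same location, the trace is consistent; a spread of
unrelated locations would point more toward memory corruption.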

Please enable CONFIG_DEBUG_KERNEL, CONFIG_DEBUG_SLAB and
CONFIG_DEBUG_PAGEALLOC before running more tests, thanks.
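
Before rebuilding, it may help to confirm that the three options named above
are actually set in the kernel configuration. This is a sketch under
assumptions: the function name check_debug_opts is hypothetical, and the
default path .config assumes the check is run from the top of the kernel
source tree being built.

```shell
# Print each requested debug option that is not set to "y" in a kernel config.
# check_debug_opts is a hypothetical helper; ".config" is an assumed default.
check_debug_opts() {
    config_file="${1:-.config}"
    for opt in CONFIG_DEBUG_KERNEL CONFIG_DEBUG_SLAB CONFIG_DEBUG_PAGEALLOC; do
        # An enabled boolean option appears as a line of the form OPTION=y.
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            echo "missing: $opt"
        fi
    done
}
```

Running `check_debug_opts` with no argument checks ./.config; no output means
all three options are enabled.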

Comment 2 Larkin Lowrey 2005-11-11 10:39:23 UTC
Created attachment 6544 [details]
Kernel debug output

Comment 3 Diego Calleja 2006-07-30 11:27:59 UTC
Does it still reproduce in recent kernels? (libata is a moving target.)

If so, can you try 2.6.18-rc3? There is better error handling there, and it may
output more useful info.

Comment 4 Adrian Bunk 2006-10-05 14:40:24 UTC
Please reopen this bug if it's still present in kernel 2.6.18.