Bug 8650
| Summary: | disk errors due to SATA timeout | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Vasily Averin (vasily.averin) |
| Component: | Serial ATA | Assignee: | Tejun Heo (htejun) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | normal | CC: | akpm, jack, miguelmarte, protasnb, tglx |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.22-rc4 | Subsystem: | |
| Regression: | --- | Bisected commit-id: | |
Attachments:
- dmesg output
- /proc/interrupts output
- lspci -vvvxxx output
- dont-enable-pci-irq-for-legacy-host.patch
- 2.6.22-rc4+tejun.patch: serial console logs
- /proc/interrupts output
- dmesg
- interrupts
- output of lspci
- 2.6.22-rc7: serial console log with sysrq
- 2.6.22-rc7: sata timeouts issue again
- Debugging patch for JBD lockup
- 2.6.22: JBD lockup, serial console logs
- serial console log (2.6.23-rc7 + tglx patch + "acpi=noirq")
Description
Vasily Averin
2007-06-19 04:08:50 UTC
Created attachment 11793 [details]
dmesg output
Created attachment 11794 [details]
/proc/interrupts output
Created attachment 11795 [details]
lspci -vvvxxx output
Are you able to identify an earlier kernel version which did not have this problem?

Created attachment 11808 [details]
dont-enable-pci-irq-for-legacy-host.patch

Can you please try the attached patch and report the result?

(In reply to comment #4)
> Are you able to identify an earlier kernel version which did not have this
> problem?

I've reproduced this issue on RHEL5 and RHEL5-based kernels. Jan Evert van Grootheest has reported a very similar issue on the 2.6.18-4-xen-amd64 kernel:
http://www.mail-archive.com/linux-ide@vger.kernel.org/msg06011.html
However, I can say nothing about older kernels.

(In reply to comment #5)
> Created an attachment (id=11808) [details]
> dont-enable-pci-irq-for-legacy-host.patch
>
> Can you please try the attached patch and report the result?

Thank you, Tejun. However, I would note that I have no simple reproducer for this issue, and my experiments may take a lot of time.

Okay, setting status to NEEDINFO. Please report back when you can confirm the result either way. Thanks.

(In reply to comment #0)
> Reproduced on 2.6.22-rc4 kernel, in this case EH works well and disk become
> alive

By the way, I was wrong: disk access returns IO errors. :(

Yeah, the disk was gone in the log. EH gave up after several tries.

Created attachment 11838 [details]
2.6.22-rc4+tejun.patch: serial console logs
Tejun,
I've reproduced this issue again on the patched kernel (i.e. 2.6.22-rc4 + your patch). Serial console logs are attached.

Yesterday I started the following tests:
ltp
volano
jbb2000
unixbench
aim7
smtp
I would note that during the aim7 test I got several "hda: lost interrupt" messages, but the node still worked well.

Today I restarted the tests -- and this led to a SATA timeout.
Created attachment 11839 [details]
/proc/interrupts output
Sorry, I cannot get lspci output -- this binary is not in the disk cache. :(

Created attachment 11868 [details]
dmesg
Same problem on sata_via. It didn't happen until I added a 500 GB drive to my system, which made me reorder most of my hard drives. Originally I didn't have this problem, and I didn't move the drive on ata4, which is the one with the problem. I had 4 drives: 3 WD on the sata_promise and 1 Seagate 320 GB 7200.10 on port 1 of the sata_via. I removed the IDE drive on the Promise controller and moved one of the 250 GB WD drives from the Promise controller to the second port of the sata_via. I added the 500 GB WD to the Promise. I never moved the Seagate, which is the one with the problem. I'm attaching the lspci and interrupts output.
Created attachment 11869 [details]
interrupts
Here are the interrupts.
Created attachment 11870 [details]
output of lspci
Here is the lspci output. If you want the boot log, just ask for it.
Vasily, please try to use pata_via instead of via82cxxx.

Miguel, please post a full dmesg including boot messages. Also, please try pata_via with the posted patch applied on top of 2.6.22-rc7. Also, you can use LABELs to mount filesystems to avoid chaos when drives get moved around. If you use a LABEL-aware initrd, you don't even have to change the root device (grub might have some problem, though).

Good news: it seems that the problem is caused by interference and not by the driver. I first tried rc7, but that didn't solve anything, nor did the patch. I thought that if this were a huge problem, more people would have noticed. I removed all the cables and reorganized them, because I noticed that the power lines were too close to the data lines. After I did that, the problem hasn't appeared again. I have been running 3 days straight without any errors. Vasily should recheck his cables, or possibly replace them if they're bad. Thanks.

Actually, it seems more than half of these kinds of reports come from hardware issues. Power-related problems are the most common; cabling is pretty common too. Also, some controllers and drives have problems talking to each other at 3 Gbps. If you have *any* electrical issues, the SATA link will be the first one to suffer, because it runs really fast (which also implies a narrower voltage swing) and over pretty long distances.

(In reply to comment #18)
> Vasily should recheck his cables, or possibly replace them if they're bad.

We have replaced the cables already; actually, it was our first action. Unfortunately it did not help us.

I didn't replace the cable either, just moved them away from the power cables which were crossing them. Look for something that would interfere with the connection. And as Tejun said, the SATA links are sensitive to power fluctuations, so a bad PSU can cause this too.

(In reply to comment #21)
> I didn't replace the cable either, just moved them away from the power
> cables which were crossing them. Look for something that would interfere
> with the connection. And as Tejun said, the SATA links are sensitive to
> power fluctuations, so a bad PSU can cause this too.

OK, I understand you now; thank you for the explanation. I personally will try to do this on my test node, but it is not an option in the general case, especially for people who have servers in a datacenter. :(

Tejun, is there perhaps some way to limit the speed of the SATA link?

I've switched to pata_via instead of via82cxxx.

Good news:
- The SATA disconnects have gone away; the SATA controller does not report any problems now.

Bad news:
- I see timeouts on the DVD drive attached to the PATA controller (please see the console logs in my next attachment).
- The node works well for some time, but then IO stalls and the node hangs.

I've reproduced it:
- on the 2.6.22-rc4 kernel + Tejun's patch (changes: ide was disabled and pata_via enabled). I have serial console logs and can attach them if required.
- on the original 2.6.22-rc7 kernel.

Created attachment 11972 [details]
2.6.22-rc7: serial console log with sysrq
As far as I understand, the node hang is not related to libata; IMHO it is an ext3/jbd-related issue. I believe it makes sense to file a new bug for it.

Created attachment 11973 [details]
2.6.22-rc7: sata timeouts issue again
I've reproduced the SATA timeout issue again on a plain 2.6.22-rc7 kernel (with pata_via). I've created a crash dump and hope it can help us.
Thanks for testing.

1. In the lockup in comment 24, everything seems to be stuck in ext3. It doesn't seem likely that SCSI and libata are to blame. I need some help decoding what's going on inside ext3/jbd, though. cc'ing Jan Kara for help.

2. The PATA and SATA controllers share a PCI IRQ line. The PATA controller also seems to be hardwired to 14, or 14/15, depending on the controller mode. The drivers/ide drivers use IRQ auto-detection and detect 14 for the PATA part, while libata honors pdev->irq and uses 20. The end result is the same, though: one of the two hosts loses the ability to assert its IRQ and everything falls down. It's definitely related to IRQ routing, and it's really peculiar. Well, I wouldn't expect anything less from the vendor. :-)

Does "acpi=noirq" make any difference?

It seems kjournald is stuck inside journal_commit_transaction(), and everybody else waits for a transaction to commit or waits in journal_force_transaction() (which is suspicious). It could be that we missed some IO completion and are waiting for it indefinitely, but it could also be something different...

Questions to Vasily: Is the number of threads normal, or do you spawn new threads when current threads become unavailable? Also, could you please run gdb on the vmlinux of the kernel which produced the messages in comment 24 and send me the output of "disass journal_commit_transaction"? (I need to find out which part of the function corresponds to the offset 0x181.) Thanks.

Hmm, looking more into the traces, I'd guess that commit is waiting for all updates of the running transaction to be finished. But there are 7 processes (2322, 2345, 2362, 2379, 2385, 2410, 2437) which are waiting in journal_stop (called from journal_force_commit). This is actually strange, as it seems these processes are never woken up from schedule_timeout_uninterruptible(1) -- they are in the same state in backtraces several minutes apart...

(In reply to comment #28)
> Questions to Vasily: Is the number of threads normal, or do you spawn new
> threads when current threads become unavailable? Also, could you please run
> gdb on the vmlinux of the kernel which produced the messages in comment 24
> and send me the output of "disass journal_commit_transaction"? (I need to
> find out which part of the function corresponds to the offset 0x181.)
> Thanks.

Jan, I'm not sure that I understand your first question correctly. I just executed the AIM7 test on the node and waited until it hung. Then I tried to execute something in an old ssh session (without success) and then pressed the SysRq keys.

journal_commit_transaction+0x181 points to the next instruction after "call schedule":

    0000000000002296 <journal_commit_transaction>:
        2296: 41 57                    push   %r15
        2298: 41 56                    push   %r14
        229a: 49 89 fe                 mov    %rdi,%r14
        229d: 41 55                    push   %r13
    ...
        23ee: f3 a5                    rep movsl %ds:(%rsi),%es:(%rdi)
        23f0: 48 89 de                 mov    %rbx,%rsi
        23f3: 48 89 ef                 mov    %rbp,%rdi
        23f6: e8 00 00 00 00           callq  23fb <journal_commit_transaction+0x165>
                        23f7: R_X86_64_PC32  prepare_to_wait+0xfffffffffffffffc
        23fb: 41 83 7d 74 00           cmpl   $0x0,0x74(%r13)
        2400: 74 27                    je     2429 <journal_commit_transaction+0x193>
        2402: 41 c7 45 70 01 00 00 00  movl   $0x1,0x70(%r13)
        240a: 41 c7 46 24 01 00 00 00  movl   $0x1,0x24(%r14)
        2412: e8 00 00 00 00           callq  2417 <journal_commit_transaction+0x181>
                        2413: R_X86_64_PC32  schedule+0xfffffffffffffffc
        2417: 49 8d 7e 24              lea    0x24(%r14),%rdi    <<<<<<<< HERE
        241b: e8 00 00 00 00           callq  2420 <journal_commit_transaction+0x18a>
                        241c: R_X86_64_PC32  _spin_lock+0xfffffffffffffffc
        2420: 49 8d 7d 70              lea    0x70(%r13),%rdi
        2424: e8 00 00 00 00           callq  2429 <journal_commit_transaction+0x193>
                        2425: R_X86_64_PC32  _spin_lock+0xfffffffffffffffc
        2429: 48 89 de                 mov    %rbx,%rsi
        242c: 48 89 ef                 mov    %rbp,%rdi
        242f: e8 00 00 00 00           callq  2434 <journal_commit_transaction+0x19e>
                        2430: R_X86_64_PC32  finish_wait+0xfffffffffffffffc
        2434: e9 5a ff ff ff           jmpq   2393 <journal_commit_transaction+0xfd>
        2439: 41 c7 45 70 01 00 00 00  movl   $0x1,0x70(%r13)

(In reply to comment #29)
> Hmm, looking more into the traces, I'd guess that commit is waiting for all
> updates of the running transaction to be finished. But there are 7 processes
> (2322, 2345, 2362, 2379, 2385, 2410, 2437) which are waiting in journal_stop
> (called from journal_force_commit). This is actually strange, as it seems
> these processes are never woken up from schedule_timeout_uninterruptible(1)
> -- they are in the same state in backtraces several minutes apart...

What do you think about process 2411? It has an unusual calltrace:

    ext3_file_write
    generic_file_buffered_write
    ext3_prepare_write (ret = -ENOSPACE, retry)
    ext3_should_retry_alloc
    journal_force_commit_nested
    log_wait_commit
    wait_event(journal->j_wait_done_commit, !tid_gt(tid, journal->j_commit_sequence));

I would note that after the node rebooted, I found that the root partition was full.

Thanks for that disassembly. It confirms that we are waiting on t_updates (i.e. waiting until all processes writing to the current transaction finish, which never happens due to some processes waiting in schedule_timeout_uninterruptible(1)). I've also noticed process 2411, but that seems to be harmless: it just correctly waits for a transaction to be committed without holding any handle. The 7 processes above are the only ones which seem to hold a handle. Could you try reproducing the problem with the attached patch? That would confirm my suspicion that there's some strange bug in the kernel timers...

Created attachment 12053 [details]
Debugging patch for JBD lockup
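For anyone decoding the disassembly above: the offsets resolve to the standard wait loop near the start of journal_commit_transaction(), where commit waits for t_updates -- the count of in-flight handles against the transaction -- to drop to zero. Below is a simplified sketch of that loop, paraphrased from 2.6.22-era fs/jbd/commit.c (a sketch, not a verbatim copy; the field offsets 0x70/0x74 on %r13 line up with t_handle_lock/t_updates, and 0x24 on %r14 with j_state_lock).

```c
/* Paraphrased from 2.6.22-era fs/jbd/commit.c -- a sketch, not verbatim.
 * The two "movl $0x1,..." stores in the disassembly are the inlined
 * spin_unlock()s of t_handle_lock and j_state_lock, and offset 0x181 is
 * the return address immediately after the schedule() call below. */
spin_lock(&commit_transaction->t_handle_lock);
while (commit_transaction->t_updates) {
	DEFINE_WAIT(wait);

	prepare_to_wait(&journal->j_wait_updates, &wait,
			TASK_UNINTERRUPTIBLE);
	if (commit_transaction->t_updates) {
		spin_unlock(&commit_transaction->t_handle_lock);
		spin_unlock(&journal->j_state_lock);
		schedule();	/* sleep until a handle holder wakes us */
		spin_lock(&journal->j_state_lock);
		spin_lock(&commit_transaction->t_handle_lock);
	}
	finish_wait(&journal->j_wait_updates, &wait);
}
spin_unlock(&commit_transaction->t_handle_lock);
```

If the handle holders themselves are stuck in schedule_timeout_uninterruptible(1), t_updates never reaches zero and kjournald sleeps here forever, which matches the backtraces.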
BTW, could somebody please comment on the endless resets/timeouts on the PATA-attached DVD-ROM drive? IMHO it is abnormal.

    Jul 16 17:45:13 ts28 sr 2:0:0:0: ioctl_internal_command return code = 8000002
       : Sense Key : Aborted Command [current] [descriptor]
       : Add. Sense: No additional sense information
    Jul 16 17:45:45 ts28 ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
    ata3.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
             res 40/00:03:00:00:20/00:00:00:00:00/a0 Emask 0x4 (timeout)
    ata3: soft resetting port
    ata3.00: configured for PIO0
    ata3: EH complete
    Jul 16 17:46:16 ts28 ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
    ata3.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
             res 40/00:03:00:00:20/00:00:00:00:00/a0 Emask 0x4 (timeout)
    ata3: soft resetting port
    ata3.00: configured for PIO0

I've reproduced the ext3/jbd lockup again, now on the 2.6.22 kernel. As before, the "multitask" processes are waiting in schedule_timeout_uninterruptible().

I've prepared a crash dump and analyzed it using the crash utility. The "ps -l" command (-l displays the tasks' last_run timestamp values) shows that none of the "multitask" processes have been scheduled for at least several hours, since the last EH message:

    Jul 12 09:00:16 ts28 Linux version 2.6.22 (vvs@vvs.work.ve)  <<< node boot
    Jul 17 09:01:03 ts28 ata3: EH complete                       <<< last EH message
    Jul 17 12:49:38 ts28 SysRq : HELP : loglevel0-8 reBoot       <<< SysRq keypresses
    Jul 17 13:28:01 ts28 SysRq : Trigger a crashdump             <<< crashdump creation

    crash> ps -l
    [447400288986915] PID: 1391  TASK: ffff810037db9100 CPU: 1 COMMAND: "klogd"    <<< crashdump timestamp
    [447400288237424] PID: 0     TASK: ffffffff80543a80 CPU: 0 COMMAND: "swapper"
    [447400288237424] PID: 2680  TASK: ffff8100df7cf040 CPU: 0 COMMAND: "hald-addon-keyb"
    [447400288232031] PID: 10    TASK: ffff8100df7c3890 CPU: 1 COMMAND: "events/1"
    [447400285705077] PID: 0     TASK: ffff810037d76850 CPU: 1 COMMAND: "swapper"
    [445298275932286] PID: 1388  TASK: ffff81008a305040 CPU: 1 COMMAND: "syslogd"  <<< SysRq keys timestamp
    [445020120551131] PID: 82    TASK: ffff8100df7c8040 CPU: 0 COMMAND: "kblockd/0"
    [445020104277966] PID: 2735  TASK: ffff8100df719080 CPU: 1 COMMAND: "mingetty" <<< last EH message timestamp
    [431415855283835] PID: 13426 TASK: ffff8100649970c0 CPU: 0 COMMAND: "multitask"
    [431415855268119] PID: 13449 TASK: ffff810014639790 CPU: 0 COMMAND: "multitask"
    [431415855258044] PID: 13466 TASK: ffff81007dfb4100 CPU: 1 COMMAND: "multitask"
    [431415855250139] PID: 13411 TASK: ffff810034397100 CPU: 0 COMMAND: "multitask"
    [431415855232188] PID: 13422 TASK: ffff810090fac040 CPU: 0 COMMAND: "multitask"
    [431415855226922] PID: 13459 TASK: ffff810014b7a750 CPU: 1 COMMAND: "multitask"
    [431415855213345] PID: 13434 TASK: ffff8100dce76040 CPU: 0 COMMAND: "multitask"
    [431415855200411] PID: 13444 TASK: ffff8100d004f140 CPU: 1 COMMAND: "multitask"

Created attachment 12062 [details]
2.6.22: JBD lockup, serial console logs
I have the crash dump and am able to check any other ideas.
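To make the suspected timer dependency concrete: a task sleeping in schedule_timeout_uninterruptible(1) is woken only by a kernel timer, and kernel timers expire only when timer interrupts advance jiffies. A condensed sketch of the 2.6.22-era kernel/timer.c code path (simplified; error handling and the MAX_SCHEDULE_TIMEOUT case elided):

```c
#include <linux/timer.h>
#include <linux/sched.h>

/* Condensed from 2.6.22-era kernel/timer.c (simplified).  If irq0
 * stalls, jiffies stops advancing, process_timeout() never runs, and
 * the task sleeps in schedule() indefinitely -- consistent with the
 * "multitask" processes above sitting unscheduled for hours. */
static void process_timeout(unsigned long __data)
{
	wake_up_process((struct task_struct *)__data);
}

signed long __sched schedule_timeout(signed long timeout)
{
	struct timer_list timer;
	unsigned long expire = timeout + jiffies;

	setup_timer(&timer, process_timeout, (unsigned long)current);
	__mod_timer(&timer, expire);	/* arm the wakeup at "expire" jiffies */
	schedule();			/* sleep until the timer fires */
	del_singleshot_timer_sync(&timer);

	timeout = expire - jiffies;
	return timeout < 0 ? 0 : timeout;
}

signed long __sched schedule_timeout_uninterruptible(signed long timeout)
{
	__set_current_state(TASK_UNINTERRUPTIBLE);
	return schedule_timeout(timeout);
}
```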
(In reply to comment #33)
> BTW, could somebody please comment on the endless resets/timeouts on the
> PATA-attached DVD-ROM drive? IMHO it is abnormal.

Have you tried "acpi=noirq" yet?

(In reply to comment #37)
> Have you tried "acpi=noirq" yet?

Yes, last week I tested my kernels with "acpi=noirq" and the SATA timeout issue still has not reproduced. I've only gotten JBD lockups.

(In reply to comment #29)
> Hmm, looking more into the traces, I'd guess that commit is waiting for all
> updates of the running transaction to be finished. But there are 7 processes
> (2322, 2345, 2362, 2379, 2385, 2410, 2437) which are waiting in journal_stop
> (called from journal_force_commit). This is actually strange, as it seems
> these processes are never woken up from schedule_timeout_uninterruptible(1)
> -- they are in the same state in backtraces several minutes apart...

It seems I've found confirmation that it was a timer-related issue:

    Jul  7 09:51:28 ts28 SysRq : Show Pending Timers
    Jul  7 09:51:28 ts28 now at 122082292361305 nsecs
    Jul  7 09:54:49 ts28 SysRq : Show Pending Timers
    Jul  7 09:54:49 ts28 now at 122081350371493 nsecs

The timer on the node does not work; moreover, the second timestamp is older than the first. BTW, when I reproduced this issue, the acpi_pm clocksource was in use.

Vasily, do you still have the problem? If yes, I could find somebody more knowledgeable about timers to have a look...

I've disabled the irqbalance service on the node, and after this change the node works without any trouble. I believe it is not a fix but only a workaround.

Hmm, I've just seen a thread on LKML about problems with a dying network card caused by some IRQ problems. That might be related. The thread is: "Re: 2.6.20->2.6.21 - networking dies after random time". Could you maybe try the patch from http://lkml.org/lkml/2007/8/6/11? Or it's probably going to be fixed in 2.6.23...

Any news here? How about the latest 2.6.23-rc?

I've reproduced the issue on stock 2.6.23-rc7: timer interrupts (i.e. irq0) were stalled, but all other interrupts worked well. Then I applied the patch from http://lkml.org/lkml/2007/8/6/11 -- and it looked like it helped; I could not reproduce the issue for several days.

The patch does not help; the test node hangs again. :(

Hmm, OK. I'll try to bring this to the attention of someone who understands timers and hardware better. :) Thomas, do you have an idea what could cause these hangs or how to debug them? It seems timers suddenly stop working.

Following Thomas Gleixner's recommendation, I've applied the patch that prevents IRQ balancing for irq0: http://lkml.org/lkml/2007/10/9/25

While testing I got the following results:
1) The irq0 issue has gone away; the timer interrupt works well now.
2) Unfortunately, the SATA timeout issue has returned; I've reproduced it both with and without the "acpi=noirq" option on the 2.6.23-rc7 kernel.

Created attachment 13410 [details]
serial console log (2.6.23-rc7 + tglx patch + "acpi=noirq")
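The referenced patch is not reproduced in this thread, but conceptually, preventing IRQ balancing for irq0 on a 2.6.23-era kernel amounts to flagging the timer irqaction with IRQF_NOBALANCING (added in 2.6.21), which makes the genirq core refuse affinity changes for that interrupt -- so neither the in-kernel balancer nor a /proc/irq/0/smp_affinity write from the irqbalance daemon can migrate it. The following is a conceptual sketch under those assumptions, not tglx's actual diff, and the wrapper function name is hypothetical:

```c
#include <linux/interrupt.h>
#include <linux/irq.h>

/* Conceptual sketch only -- see the lkml link for the actual patch.
 * IRQF_NOBALANCING excludes the interrupt from IRQ balancing, so the
 * timer interrupt can no longer be moved between CPUs. */
static struct irqaction irq0 = {
	.handler = timer_interrupt,	/* the arch timer handler */
	.flags   = IRQF_DISABLED | IRQF_NOBALANCING,
	.mask    = CPU_MASK_NONE,
	.name    = "timer",
};

static void __init setup_pit_timer_irq(void)	/* hypothetical wrapper */
{
	setup_irq(0, &irq0);	/* register the PIT handler on irq0 */
}
```

With balancing disabled, affinity writes for irq0 are rejected by the kernel, which would explain why this helps even with the irqbalance service left running.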
Vasily, can you please try 2.6.24-rc2?

I've tested the 2.6.24-rc2 kernel with both ide and sata enabled, without "acpi=noirq", and with the irqbalance service enabled. After 10 days of testing, the node still works without any trouble. I do not understand how, but it looks like the issue is resolved now.

One strange thing is that /proc/interrupts now shows very few timer interrupts:

               CPU0        CPU1
      0:        288           1   IO-APIC-edge      timer
      1:         20           9   IO-APIC-edge      i8042
      3:          1           1   IO-APIC-edge
      4:        136         636   IO-APIC-edge
      6:          4           1   IO-APIC-edge      floppy
      7:          0           0   IO-APIC-edge      parport0
      8:          0           1   IO-APIC-edge      rtc
      9:          0           1   IO-APIC-fasteoi   acpi
     12:        126           9   IO-APIC-edge      i8042
     14:    6211791     1275547   IO-APIC-edge      ide0
     16:     384157   270595968   IO-APIC-fasteoi   sata_via
     17:          0           0   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, ehci_hcd:usb4
     18:    9663580           1   IO-APIC-fasteoi   eth0
    NMI:          0           0   Non-maskable interrupts
    LOC:  833318712   833181920   Local timer interrupts
    RES:   47631156    35226096   Rescheduling interrupts
    CAL:        391         799   function call interrupts
    TLB:    1982793     1829733   TLB shootdowns
    TRM:          0           0   Thermal event interrupts
    SPU:          0           0   Spurious interrupts
    ERR:          0
    MIS:          0