Bug 8889
| Summary: | Raid Level 1 causes "soft resetting port" on ata devices | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Bjoern Olausson (lkmlist) |
| Component: | MD | Assignee: | Alan (alan) |
| Status: | VERIFIED CODE_FIX | | |
| Severity: | normal | CC: | akpm, alan, greg, htejun, jgarzik |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.22.2 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | Clock HPT374 with 50 MHz DPLL (for real :-) | | |
Description
Bjoern Olausson
2007-08-15 05:26:43 UTC
This looks much more like a sata-shat-itself bug than an MD bug. Just to confirm: did you really mean that 2.6.22.1 is OK, but 2.6.22.2 failed?

I do!
2.6.22.1 <--- WORKS
2.6.22.2 <--- DOES NOT WORK
regards
Bjoern

It would be very useful to know which changeset of 2.6.22.* broke it - probably one touching the pata driver?

Anything you want me to do? All I could do is check the changelog, but I guess you did that already ;-)
Thanks and regards
Bjoern

Can you use git to do a 'git bisect' to see which exact patch broke your machine? It shouldn't take that long to do, as you have a simple way to test the result :)

Okay, please give me some advice on how to handle this "git bisect". What should I "bisect": compiled sources, or /usr/src/linux-2.6.22.1 and /usr/src/linux-2.6.22.2?
Thanks for your advice
regards
Bjoern

Okay, maybe this is related: I can't burn DVDs on my desktop using 2.6.22.1/2. After some MB or GB, or during finalisation (it varies a lot), I get this error:

Aug 16 01:27:03 freax ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 16 01:27:03 freax ata5.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0xad data 4 in
Aug 16 01:27:03 freax res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Aug 16 01:27:04 freax ata5: soft resetting port
Aug 16 01:27:05 freax ata5.00: configured for UDMA/66
Aug 16 01:27:05 freax ata5: EH complete
Aug 16 01:27:05 freax ata5.00: 16 bytes trailing data

Here is some info about the desktop:

00:00.0 Host bridge: Intel Corporation 975X Express Memory Controller Hub (rev c0)
00:01.0 PCI bridge: Intel Corporation 975X Express PCI Express Root Port (rev c0)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 Multimedia audio controller: Creative Labs Unknown device 0005
01:01.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
01:01.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
01:02.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
04:00.0 VGA compatible controller: nVidia Corporation G70 [GeForce 7600 GT] (rev a1)

The DVD-RW drive is a Plextor 760A PATA.
regards
Bjoern

Bjoern, is this bug also present in 2.6.23-rc3? I guess so.

I would guess... yes, but I'll give it a shot today.
greetings
Bjoern

Okay, now this is getting more and more weird... I switched back to 2.6.22 on my desktop and still I can only burn CDs but no DVDs...

Aug 16 17:17:47 freax ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 16 17:17:47 freax ata5.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x2a data 32768 out
Aug 16 17:17:47 freax res 40/00:03:00:00:20/00:00:00:00:00/b0 Emask 0x4 (timeout)
Aug 16 17:17:52 freax ata5: port is slow to respond, please be patient (Status 0xd8)
Aug 16 17:17:57 freax ata5: device not ready (errno=-16), forcing hardreset
Aug 16 17:17:57 freax ata5: soft resetting port
Aug 16 17:17:57 freax ata5.00: configured for UDMA/66
Aug 16 17:17:57 freax ata5.01: configured for UDMA/33
Aug 16 17:17:57 freax ata5: EH complete
Aug 16 17:18:05 freax ata5.00: 16 bytes trailing data

I tried switching to 2.6.23_rc3, but some stuff will not compile against 2.6.23_rc3, so I'll have to wait until that works before I can test 2.6.23.
Hopefully I am not mixing things up here, but I guess the problem on the desktop and the one on the server could be the same (both use libata exclusively).
regards
Bjoern

Created attachment 12428 [details]
Clock HPT374 with 50 MHz DPLL (for real :-)
Try this patch please?
It's absolutely necessary for HPT374 to work: the chip can't tolerate 66 MHz DPLL clock that the driver is setting it to -- it might have been fixed by 2.6.22 if the fix was complete...
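The question asked earlier in this thread, what exactly to "bisect", was never answered. `git bisect` works on a git checkout of the kernel source, not on compiled output or on two unpacked tarballs. Assuming a clone of the stable tree that carries the tags `v2.6.22.1` and `v2.6.22.2` (an assumption, not something stated in this report), the session would start with `git bisect start`, `git bisect bad v2.6.22.2`, `git bisect good v2.6.22.1`; after each build and test boot the result is reported with `git bisect good` or `git bisect bad` until git names the first bad commit. The search itself is a binary search over the ordered patches, which the following minimal Python sketch illustrates. The patch names and the `build_and_test` function are hypothetical placeholders.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the culprit is at mid or earlier
        else:
            lo = mid + 1    # the culprit is after mid
    return commits[lo]


# Hypothetical history between two releases: placeholder names only,
# not real changesets from the stable tree.
history = [f"patch-{n:02d}" for n in range(1, 21)]
culprit = "patch-13"  # pretend this patch introduced the timeouts
tested = []


def build_and_test(commit: str) -> bool:
    """Stand-in for 'build, boot, run the I/O test'; True means it failed."""
    tested.append(commit)
    return history.index(commit) >= history.index(culprit)


found = first_bad(history, build_and_test)
print(found, "found after", len(tested), "test boots")
```

With 20 candidate patches the sketch needs only a handful of test boots, which is why a bisect between two adjacent stable releases is usually quick when the failure is easy to reproduce.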
The HPT controller works after applying these two patches to vanilla 2.6.22.3:
1) Diff for PLL tuning (http://bugzilla.kernel.org/attachment.cgi?id=12104)
2) Clock HPT374 with 50 MHz DPLL (http://bugzilla.kernel.org/attachment.cgi?id=12428)
I moved 188 files (~3 GB) to the mentioned RAID device without problems.
Maybe I should open another bug for the "ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen" on the Intel ICH7 (parallel ATA port) when writing data to DVD (CD works). (The drive works; I tried it with Windows and could burn a DVD there without problems.)
Thanks for the fix
regards
Bjoern

NO, NOT fixed... After doing some more heavy IO to another RAID 1 (with two disks) attached to the same controller I got the following:

ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata3.00: cmd 35/00:00:97:51:7b/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out
res 40/00:00:00:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
ata3: soft resetting port
Find mode for 12 reports C829C62
Find mode for 12 reports C829C62
Find mode for DMA 69 reports 1CAE9C62
Find mode for DMA 69 reports 1CAE9C62
ata3.00: configured for UDMA/100
ata3.01: configured for UDMA/100
ata3: EH complete
sd 2:0:0:0: [sda] 398297088 512-byte hardware sectors (203928 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:1:0: [sdb] 398297088 512-byte hardware sectors (203928 MB)
sd 2:0:1:0: [sdb] Write Protect is off
sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:0:0: [sda] 398297088 512-byte hardware sectors (203928 MB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 2:0:1:0: [sdb] 398297088 512-byte hardware sectors (203928 MB)
sd 2:0:1:0: [sdb] Write Protect is off
sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

The problem got better but is not solved... Or could the same error message occur if there are some bad blocks/clusters on the drive?
regards
Bjoern

A bad block should cause a precise error report from the drive rather than a timeout. It might time out if the drive is struggling badly, but I would expect to see a report of an actual media error back from the drive. You can also use the smart tools to check the last failed commands as the drive sees them.

I detected the above only because a KDE dialogue told me, while I was modifying several files, that it could not write to a file. I told KDE to skip the file and the process continued modifying my files.
Before I applied the "Clock HPT374 with 50 MHz DPLL (for real :-)" patch, the entire process hung and had to be restarted, then after a short time the same happened again.
After patching, the process stopped at a count of ~1000 files, but after the "EH complete" I could just continue modifying another 9000 files without problems and without any "exception Emask".
Since that error I could not reproduce it anymore. Maybe after a reboot...
Checking SMART did not show anything useful:

smartctl -a /dev/sda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: ATA Maxtor 6Y200P0 Version: YAR4
Serial number: Y60QJ***
Device type: disk
Local Time is: Tue Aug 21 20:11:45 2007 CEST
Device does not support SMART
Error Counter logging not supported
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging

20:11:45 [~] smartctl -a /dev/sdb
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: ATA Maxtor 6Y200P0 Version: YAR4
Serial number: Y60QJ***
Device type: disk
Local Time is: Tue Aug 21 20:11:54 2007 CEST
Device does not support SMART
Error Counter logging not supported
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging

Regards
Bjoern

The system has been up for several days now and I got no more errors. Hopefully they will not reoccur.
Will the "50 MHz DPLL" patch be committed to the mainline kernel?
regards and thanks
Bjoern

(In reply to comment #16)
> Will the "50 MHz DPLL" patch be committed to the mainline kernel?

Already there. :-)

Okay, the bug is still present! On heavy IO (I have put a VirtualBox image on the RAID connected to the HPT controller) I now get the following several times. The drives recover fine, but I guess they should never fail ;-)
Let me know if you need more info.
Aug 30 15:49:07 enterprise ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 30 15:49:07 enterprise ata3.01: cmd ca/00:e0:4f:34:03/00:00:00:00:00/f0 tag 0 cdb 0x0 data 114688 out
Aug 30 15:49:07 enterprise res 40/00:00:00:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
Aug 30 15:49:07 enterprise ata3: soft resetting port
Aug 30 15:49:07 enterprise Find mode for 12 reports C829C62
Aug 30 15:49:07 enterprise Find mode for 12 reports C829C62
Aug 30 15:49:07 enterprise Find mode for DMA 69 reports 1CAE9C62
Aug 30 15:49:07 enterprise Find mode for DMA 69 reports 1CAE9C62
Aug 30 15:49:07 enterprise ata3.00: configured for UDMA/100
Aug 30 15:49:07 enterprise ata3.01: configured for UDMA/100
Aug 30 15:49:07 enterprise ata3: EH complete
Aug 30 15:49:07 enterprise sd 2:0:0:0: [sda] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 15:49:07 enterprise sd 2:0:0:0: [sda] Write Protect is off
Aug 30 15:49:07 enterprise sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 30 15:49:07 enterprise sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 15:49:07 enterprise sd 2:0:1:0: [sdb] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 15:49:07 enterprise sd 2:0:1:0: [sdb] Write Protect is off
Aug 30 15:49:07 enterprise sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Aug 30 15:49:07 enterprise sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 15:49:07 enterprise sd 2:0:0:0: [sda] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 15:49:07 enterprise sd 2:0:0:0: [sda] Write Protect is off
Aug 30 15:49:07 enterprise sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 30 15:49:07 enterprise sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 15:49:07 enterprise sd 2:0:1:0: [sdb] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 15:49:07 enterprise sd 2:0:1:0: [sdb] Write Protect is off
Aug 30 15:49:07 enterprise sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Aug 30 15:49:07 enterprise sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[...]
Aug 30 15:56:17 enterprise ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 30 15:56:17 enterprise ata3.01: cmd ca/00:68:8f:95:00/00:00:00:00:00/f0 tag 0 cdb 0x0 data 53248 out
Aug 30 15:56:17 enterprise res 40/00:00:00:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
Aug 30 15:56:17 enterprise ata3: soft resetting port
Aug 30 15:56:17 enterprise Find mode for 12 reports C829C62
Aug 30 15:56:17 enterprise Find mode for 12 reports C829C62
Aug 30 15:56:17 enterprise Find mode for DMA 69 reports 1CAE9C62
Aug 30 15:56:17 enterprise Find mode for DMA 69 reports 1CAE9C62
Aug 30 15:56:17 enterprise ata3.00: configured for UDMA/100
Aug 30 15:56:17 enterprise ata3.01: configured for UDMA/100
Aug 30 15:56:17 enterprise ata3: EH complete
Aug 30 15:56:17 enterprise sd 2:0:0:0: [sda] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 15:56:17 enterprise sd 2:0:0:0: [sda] Write Protect is off
Aug 30 15:56:17 enterprise sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 30 15:56:17 enterprise sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 15:56:17 enterprise sd 2:0:1:0: [sdb] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 15:56:17 enterprise sd 2:0:1:0: [sdb] Write Protect is off
Aug 30 15:56:17 enterprise sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Aug 30 15:56:17 enterprise sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 15:56:17 enterprise sd 2:0:0:0: [sda] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 15:56:17 enterprise sd 2:0:0:0: [sda] Write Protect is off
Aug 30 15:56:17 enterprise sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 30 15:56:17 enterprise sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 15:56:17 enterprise sd 2:0:1:0: [sdb] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 15:56:17 enterprise sd 2:0:1:0: [sdb] Write Protect is off
Aug 30 15:56:17 enterprise sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Aug 30 15:56:17 enterprise sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[...]
[...]
Aug 30 18:41:31 enterprise ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 30 18:41:31 enterprise ata3.01: cmd 35/00:00:6f:be:6b/00:02:17:00:00/f0 tag 0 cdb 0x0 data 262144 out
Aug 30 18:41:31 enterprise res 40/00:00:00:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
Aug 30 18:41:31 enterprise ata3: soft resetting port
Aug 30 18:41:31 enterprise Find mode for 12 reports C829C62
Aug 30 18:41:31 enterprise Find mode for 12 reports C829C62
Aug 30 18:41:31 enterprise Find mode for DMA 69 reports 1CAE9C62
Aug 30 18:41:31 enterprise Find mode for DMA 69 reports 1CAE9C62
Aug 30 18:41:31 enterprise ata3.00: configured for UDMA/100
Aug 30 18:41:31 enterprise ata3.01: configured for UDMA/100
Aug 30 18:41:31 enterprise ata3: EH complete
Aug 30 18:41:31 enterprise sd 2:0:0:0: [sda] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 18:41:31 enterprise sd 2:0:0:0: [sda] Write Protect is off
Aug 30 18:41:31 enterprise sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 30 18:41:31 enterprise sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 18:41:31 enterprise sd 2:0:1:0: [sdb] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 18:41:31 enterprise sd 2:0:1:0: [sdb] Write Protect is off
Aug 30 18:41:31 enterprise sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Aug 30 18:41:31 enterprise sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 18:41:31 enterprise sd 2:0:0:0: [sda] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 18:41:31 enterprise sd 2:0:0:0: [sda] Write Protect is off
Aug 30 18:41:31 enterprise sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 30 18:41:31 enterprise sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 18:41:31 enterprise sd 2:0:1:0: [sdb] 398297088 512-byte hardware sectors (203928 MB)
Aug 30 18:41:31 enterprise sd 2:0:1:0: [sdb] Write Protect is off
Aug 30 18:41:31 enterprise sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Aug 30 18:41:31 enterprise sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Here's a full bootlog: http://paste.olausson.de/88c1613c72.html
regards
Bjoern

Okay, this problem now also occurs on other devices (I just didn't see it). Now I also noticed it on my Raid5 array. Any progress, any ideas?
It mainly occurs when moving around a large number of small files (in my case a lot of photos, all around 2-5 MB).
regards
Bjoern

(In reply to comment #20)
> Okay, this problem now also occurs on other devices (I just didn't see it.

I wash my hands then... :-)

(In reply to comment #21)
> I wash my hands then... :-)

I guess you are referring to your hard work on fixing this bug, so your hands are getting sweaty from the uncountable keystrokes... hrrhrr, am I right... ;-)
Here you can see which files (pictures) are causing the trouble:
https://gallery.boonline.dyndns.org
(No parental control required, can be viewed from age 1 to 99+ [maybe glasses required])
I recommend the following URL:
https://gallery.boonline.dyndns.org/v/Snapshots/Sport/Tanzen/Club/Weihnachtsball/2006_12_02/oulu/?g2_page=2
(some of the best dancers from Oulu (Finland)). They won the team competition!
I had to move the pictures around, recreate the gallery, and change headers and EXIF dates, so I touched a large number of files on the devices in a very short time.
Have fun.
regards
Bjoern

Okay, since Sep 20 23:01:27 I have had no more "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen".
Currently I am switching from 2.6.23-rc8 to 2.6.23. Hopefully there are no more "exception Emask" errors and we can close this bug... but just give me some days/weeks to test it ;-)

No more "exception Emask" since Sep 20 23:01:27, running on 2.6.23.
I'll mark this one as solved. Thanks for the help
regards
Bjoern
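The `cmd .../.../.../..` lines quoted throughout this report are libata taskfile dumps, and decoding them shows which command timed out. The following Python sketch is one way to read them. It assumes the field order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, which is my reading of the libata error-handler print format rather than something stated in this thread. The opcode tables cover only the values that appear in the logs above, and the function name `decode_cmd` is made up for illustration.

```python
import re

# Only the opcodes that occur in this report; names as in the ATA and
# MMC command sets.
ATA_OPCODES = {0x35: "WRITE DMA EXT", 0xCA: "WRITE DMA", 0xA0: "PACKET"}
CDB_OPCODES = {0x2A: "WRITE(10)", 0xAD: "READ DVD STRUCTURE"}

CMD_RE = re.compile(
    r"cmd (?P<tf>[0-9a-f]{2}/(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/"
    r"(?:[0-9a-f]{2}:){4}[0-9a-f]{2}/[0-9a-f]{2})"
    r" tag \d+ cdb 0x(?P<cdb>[0-9a-f]+) data (?P<nbytes>\d+) (?P<dir>in|out)"
)


def decode_cmd(line):
    """Decode one libata 'cmd' log line; return None if it does not match."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    cmd_hex, low, high, _device = m["tf"].split("/")
    command = int(cmd_hex, 16)
    low_regs = [int(x, 16) for x in low.split(":")]    # feature, nsect, lbal, lbam, lbah
    high_regs = [int(x, 16) for x in high.split(":")]  # the hob_* registers
    nsect, hob_nsect = low_regs[1], high_regs[1]

    sectors = None
    if command == 0xCA:      # 28-bit command: 8-bit count, 0 means 256
        sectors = nsect or 256
    elif command == 0x35:    # 48-bit command: 16-bit count, 0 means 65536
        sectors = ((hob_nsect << 8) | nsect) or 65536

    cdb = int(m["cdb"], 16)
    return {
        "command": ATA_OPCODES.get(command, hex(command)),
        "cdb": CDB_OPCODES.get(cdb) if command == 0xA0 else None,
        "sectors": sectors,
        "bytes": int(m["nbytes"]),
        "direction": m["dir"],
    }


write_dma = decode_cmd(
    "ata3.01: cmd ca/00:e0:4f:34:03/00:00:00:00:00/f0 tag 0 cdb 0x0 data 114688 out"
)
write_ext = decode_cmd(
    "ata3.00: cmd 35/00:00:97:51:7b/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out"
)
packet = decode_cmd(
    "ata5.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0xad data 4 in"
)
for entry in (write_dma, write_ext, packet):
    print(entry)
```

For the two disk commands, the decoded sector count multiplied by 512 equals the byte count that libata prints on the same line, which is a useful sanity check on the assumed field order. Under that reading, the failures on the HPT374 ports are timed-out DMA writes, while the ICH7 failures are timed-out ATAPI packet commands to the DVD writer.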