Bug 11148 - AHCI driver works incorrectly, HDD hangs under heavy load.
Summary: AHCI driver works incorrectly, HDD hangs under heavy load.
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 normal
Assignee: Tejun Heo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-22 15:07 UTC by Yuriy Dmitriev
Modified: 2010-01-19 20:01 UTC

See Also:
Kernel Version: 2.6.26
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
it is working config for 2.6.25.11 (33.72 KB, application/octet-stream)
2008-07-22 15:13 UTC, Yuriy Dmitriev
it is working config 2.6.26 (33.40 KB, application/octet-stream)
2008-07-22 15:18 UTC, Yuriy Dmitriev
ahci-reset-debug.patch (2.45 KB, patch)
2008-10-13 01:13 UTC, Tejun Heo

Description Yuriy Dmitriev 2008-07-22 15:07:52 UTC
Latest working kernel version: 2.6.25.11
Earliest failing kernel version: 2.6. (forgot) 15 or later. 
Distribution: Gentoo, vanilla kernel.
Hardware Environment: MB ASUS p955x, CPU 920D, 2 GB RAM
 lspci 
00:00.0 Host bridge: Intel Corporation 82955X Memory Controller Hub (rev 81)
00:01.0 PCI bridge: Intel Corporation 82955X PCI Express Root Port (rev 81)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) SATA AHCI Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:05.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)
03:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600 GT] (rev a2)


Software Environment:
Minimal console setup with no services running; everything is hand-built and minimal.



Problem Description:

I am writing this from 2.6.25.11; I cannot do so from 2.6.26.


The messages below are from /var/log/messages. They appear after the disk has been under heavy load for approximately 20-30 minutes, for example while compiling gcc:

 $ gcc -v
Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.2/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --enable-multilib --enable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,treelang --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
Thread model: posix
gcc version 4.1.2 (Gentoo 4.1.2 p1.1)

or while running three commands like this simultaneously: $ rsync -r /usr/portage /tmp/{1,2,3,4}

or simply during # emerge gcc

I see these messages:

Jul 22 22:56:23 triod ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jul 22 22:56:23 triod ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jul 22 22:56:23 triod res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 22 22:56:23 triod ata1.00: status: { DRDY }
Jul 22 22:56:24 triod ata1: soft resetting link
Jul 22 22:56:24 triod ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 22 22:56:24 triod ata1.00: configured for UDMA/133
Jul 22 22:56:24 triod ata1: EH complete
Jul 22 22:56:24 triod sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Jul 22 22:56:24 triod sd 0:0:0:0: [sda] Write Protect is off
Jul 22 22:56:24 triod sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jul 22 22:56:24 triod sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

and the computer is still WORKING.
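
For readers unfamiliar with the "SStatus 123 SControl 300" part of the link-up message: the SStatus value is a hexadecimal register with three 4-bit fields (DET, SPD, IPM) defined by the SATA specification. The following is a minimal Python sketch of that decoding; the function name decode_sstatus and the dictionary layout are illustrative assumptions, not part of any kernel tool.

```python
# Field meanings follow the SATA SStatus register layout:
# bits 3:0 = DET, bits 7:4 = SPD, bits 11:8 = IPM.
DET = {0: "no device, no phy communication",
       1: "device detected, phy communication not established",
       3: "device detected, phy communication established",
       4: "phy offline"}
SPD = {0: "no negotiated speed", 1: "Gen1 (1.5 Gbps)", 2: "Gen2 (3.0 Gbps)"}
IPM = {0: "no device", 1: "active", 2: "partial", 6: "slumber"}

def decode_sstatus(value):
    """Split an SStatus value (as an int) into its DET/SPD/IPM meanings."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "DET": DET.get(det, "reserved (%d)" % det),
        "SPD": SPD.get(spd, "reserved (%d)" % spd),
        "IPM": IPM.get(ipm, "reserved (%d)" % ipm),
    }

# The log prints the register in hex without a prefix: "SStatus 123".
print(decode_sstatus(int("123", 16)))
```

For the value in the log this gives a detected device with an established link, Gen2 speed, and an active interface, which agrees with the "SATA link up 3.0 Gbps" text on the same line.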

BUT on 2.6.26 the computer HANGS BEFORE the disk is reinitialized.
I see only something like:
Jul 22 22:56:23 triod ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jul 22 22:56:23 triod ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jul 22 22:56:23 triod res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 22 22:56:23 triod ata1.00: status: { DRDY }
Jul 22 22:56:24 triod ata1: soft resetting link

and the disk becomes inaccessible. The kernel itself does not hang, but only software that is already in memory keeps working.
I tried replacing the HDD and experimenting with the kernel config, with no result.
The filesystem is XFS.
After a reset and xfs_check, the kernel works for 20-30 minutes and then HANGS on the first heavy HDD load, such as copying a large number of files.


Steps to reproduce:
Install Gentoo and run # emerge gcc,
or generate some other I/O load.
The HDD is a Samsung 80 GB and is 100% working. I tried a different drive, with the same result.
The only thing that helps is downgrading from 2.6.26 to 2.6.25.11.

If you need any additional information, questions are welcome.
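
As a rough illustration of "some I/O load", the sketch below writes many small files and calls fsync on each one. This is an assumption about a workload that exercises write-cache flushes, not the reporter's actual workload (emerge or rsync), and it is not confirmed to reproduce the hang. The function name write_load and its parameters are hypothetical.

```python
import os
import tempfile

def write_load(root, files=100, size=4096):
    """Write `files` files of `size` bytes under `root`, fsync-ing each.

    Returns the total number of bytes written.
    """
    payload = os.urandom(size)
    written = 0
    for i in range(files):
        path = os.path.join(root, "f%05d" % i)
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()             # push Python's buffer to the OS
            os.fsync(fh.fileno())  # ask the OS to push the data to the device
        written += size
    return written

# Small demonstration run in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    print(write_load(tmp, files=10))
```

To approximate the reported conditions, the directory would have to be on the affected disk and the file count raised so that the run lasts for tens of minutes.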
Comment 1 Yuriy Dmitriev 2008-07-22 15:13:20 UTC
Created attachment 16943 [details]
it is working config for 2.6.25.11

it is working config for 2.6.25.11
Comment 2 Yuriy Dmitriev 2008-07-22 15:18:04 UTC
Created attachment 16944 [details]
it is working config 2.6.26

With this config the disk hangs under heavy load.

The config is similar to the one for 2.6.25.11; the differences are not related to the I/O system.
Comment 3 Tejun Heo 2008-07-31 19:32:37 UTC
Ah... strange.  Your drive is timing out FLUSH.  Can you hook up a serial console or netconsole and capture the full log from boot to hang?  Also, please post the result of "lspci -nn" and "smartctl -a /dev/sda".
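
The FLUSH diagnosis can be read from the "cmd" line in the log: the first byte after "cmd" is the ATA command opcode, and e7 is FLUSH CACHE. Below is a minimal Python sketch of that lookup, assuming the libata log layout shown in the description. The function name decode_cmd and the small opcode table are illustrative and cover only a few commands.

```python
import re

# A small subset of ATA command opcodes.
ATA_OPCODES = {
    0xC4: "READ MULTIPLE",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}

# Matches e.g. "ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0"
CMD_RE = re.compile(r"(ata\d+\.\d+): cmd ([0-9a-f]{2})/\S+ tag (\d+)")

def decode_cmd(line):
    """Return (port, opcode name, tag) for a libata 'cmd' line, else None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group(2), 16)
    name = ATA_OPCODES.get(opcode, "unknown (0x%02x)" % opcode)
    return m.group(1), name, int(m.group(3))

line = ("Jul 22 22:56:23 triod ata1.00: "
        "cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0")
print(decode_cmd(line))
```

The same table also covers the c4 (READ MULTIPLE) entries that appear in the SMART error log.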
Comment 4 Yuriy Dmitriev 2008-08-01 14:27:11 UTC
# lspci -vv
00:00.0 Host bridge: Intel Corporation 82955X Memory Controller Hub (rev 81)
        Subsystem: ASUSTeK Computer Inc. Device 8178
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
        Latency: 0
        Capabilities: [e0] Vendor Specific Information <?>

00:01.0 PCI bridge: Intel Corporation 82955X PCI Express Root Port (rev 81) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 16 bytes
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: daf00000-dfffffff
        Prefetchable memory behind bridge: 00000000e0000000-00000000efffffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [88] Subsystem: Intel Corporation Device 0000
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
                Address: fee0300c  Data: 4151
        Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s, Latency L0 <256ns, L1 <4us
                        ClockPM- Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
                        Slot #  0, PowerLimit 75.000000; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        Kernel driver in use: pcieport-driver

00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
        Subsystem: ASUSTeK Computer Inc. Device 817f
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 16 bytes
        Interrupt: pin A routed to IRQ 19
        Region 0: Memory at dadf8000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [60] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v1) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed unknown, Width x0, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        Kernel driver in use: HDA Intel
        Kernel modules: snd-hda-intel

00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 16 bytes
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v1) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- RBE- FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #1, Speed 2.5GT/s, Width x2, ASPM L0s L1, Latency L0 <256ns, L1 <4us
                        ClockPM- Suprise- LLActRep+ BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surpise+
                        Slot #  1, PowerLimit 10.000000; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                        Changed: MRL- PresDet- LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
                Address: fee0300c  Data: 4159
        Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Device 8179
        Capabilities: [a0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: pcieport-driver

00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI])
        Subsystem: ASUSTeK Computer Inc. Device 8179
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 20
        Region 4: I/O ports at 9000 [size=32]
        Kernel driver in use: uhci_hcd
        Kernel modules: uhci-hcd

00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI])
        Subsystem: ASUSTeK Computer Inc. Device 8179
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 17
        Region 4: I/O ports at 9400 [size=32]
        Kernel driver in use: uhci_hcd
        Kernel modules: uhci-hcd

00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI])
        Subsystem: ASUSTeK Computer Inc. Device 8179
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin C routed to IRQ 18
        Region 4: I/O ports at 9800 [size=32]
        Kernel driver in use: uhci_hcd
        Kernel modules: uhci-hcd

00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01) (prog-if 00 [UHCI])
        Subsystem: ASUSTeK Computer Inc. Device 8179
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin D routed to IRQ 19
        Region 4: I/O ports at a000 [size=32]
        Kernel driver in use: uhci_hcd
        Kernel modules: uhci-hcd

00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI])
        Subsystem: ASUSTeK Computer Inc. Device 8179
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 20
        Region 0: Memory at dadff800 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Kernel driver in use: ehci_hcd
        Kernel modules: ehci-hcd

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: dae00000-daefffff
        Prefetchable memory behind bridge: 0000000088000000-00000000880fffff
        Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] Subsystem: ASUSTeK Computer Inc. Device 8179

00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
        Subsystem: ASUSTeK Computer Inc. Device 8179
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Capabilities: [e0] Vendor Specific Information <?>
        Kernel modules: intel-rng

00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
        Subsystem: ASUSTeK Computer Inc. Device 8179
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
        Latency: 0
        Interrupt: pin A routed to IRQ 0
        Region 0: I/O ports at 01f0 [size=8]
        Region 1: I/O ports at 03f4 [size=1]
        Region 2: I/O ports at 0170 [size=8]
        Region 3: I/O ports at 0374 [size=1]
        Region 4: I/O ports at ffa0 [size=16]

00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) SATA AHCI Controller (rev 01) (prog-if 01 [AHCI 1.0])
        Subsystem: ASUSTeK Computer Inc. Device 2606
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 317
        Region 0: I/O ports at b800 [size=8]
        Region 1: I/O ports at b400 [size=4]
        Region 2: I/O ports at b000 [size=8]
        Region 3: I/O ports at a800 [size=4]
        Region 4: I/O ports at a400 [size=16]
        Region 5: Memory at dadffc00 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable+
                Address: fee0300c  Data: 4169
        Capabilities: [70] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
        Subsystem: ASUSTeK Computer Inc. Device 8179
        Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin B routed to IRQ 0
        Region 4: I/O ports at 0400 [size=32]

01:05.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)
        Subsystem: ASUSTeK Computer Inc. Marvell 88E8001 Gigabit Ethernet Controller (Asus)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64 (5750ns min, 7750ns max), Cache Line Size: 16 bytes
        Interrupt: pin A routed to IRQ 21
        Region 0: Memory at daefc000 (32-bit, non-prefetchable) [size=16K]
        Region 1: I/O ports at c800 [size=256]
        Expansion ROM at 88000000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data <?>
        Kernel driver in use: skge
        Kernel modules: skge

03:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600 GT] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device 81cf
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at dc000000 (32-bit, non-prefetchable) [size=64M]
        Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at db000000 (64-bit, non-prefetchable) [size=16M]
        [virtual] Expansion ROM at dafe0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [78] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <4us
                        ClockPM- Suprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Kernel driver in use: nvidia
        Kernel modules: nvidia

triod # 

==========================================================================



# smartctl -a /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint P80 SD series
Device Model:     SAMSUNG HD080HJ
Serial Number:    S08EJ10Y948868
Firmware Version: WT100-33
User Capacity:    80 026 361 856 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Sat Aug  2 00:24:27 2008 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                 (1872) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  31) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   100   100   025    Pre-fail  Always       -       4352
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2283
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       10
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       13371
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   253   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1540
190 Airflow_Temperature_Cel 0x0022   112   055   000    Old_age   Always       -       42
194 Temperature_Celsius     0x0022   112   055   000    Old_age   Always       -       42
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       212110
196 Reallocated_Event_Count 0x0032   099   099   000    Old_age   Always       -       10
197 Current_Pending_Sector  0x0012   253   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   100   000    Old_age   Always       -       0
202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 4229 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4229 occurred at disk power-on lifetime: 12895 hours (537 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 51 08 7f 61 a8 e4  Error:  at LBA = 0x04a8617f = 78143871

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 03 08 7f 61 a8 e4 00      04:24:47.313  READ MULTIPLE
  c4 03 08 77 61 a8 e4 00      04:24:47.313  READ MULTIPLE
  c4 03 08 6f 61 a8 e4 00      04:24:47.313  READ MULTIPLE
  c4 03 08 67 61 a8 e4 00      04:24:47.250  READ MULTIPLE
  c4 03 08 5f 61 a8 e4 00      04:24:47.250  READ MULTIPLE

Error 4228 occurred at disk power-on lifetime: 12895 hours (537 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 51 08 f7 5c a8 e4  Error:  at LBA = 0x04a85cf7 = 78142711

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 03 08 f7 5c a8 e4 00      04:24:36.500  READ MULTIPLE
  c4 03 08 ef 5c a8 e4 00      04:24:36.500  READ MULTIPLE
  c4 03 08 e7 5c a8 e4 00      04:24:36.500  READ MULTIPLE
  c4 03 08 df 5c a8 e4 00      04:24:36.500  READ MULTIPLE
  c4 03 08 d7 5c a8 e4 00      04:24:36.500  READ MULTIPLE

Error 4227 occurred at disk power-on lifetime: 12895 hours (537 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 51 08 f7 5b a8 e4  Error:  at LBA = 0x04a85bf7 = 78142455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 d8 08 f7 5b a8 e4 00      04:24:35.813  READ MULTIPLE
  c4 d8 08 ef 5b a8 e4 00      04:24:35.813  READ MULTIPLE
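
A note on reading the error entries above: the "Error: at LBA" value is the 28-bit LBA assembled from the SN, CL, CH registers and the low four bits of DH. The following minimal Python sketch of that arithmetic uses the register rows printed by smartctl; the function name lba28 is illustrative.

```python
def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from ATA task-file register values."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Register row from error 4229: ER ST SC SN CL CH DH
regs = "00 51 08 7f 61 a8 e4"
er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in regs.split())

lba = lba28(sn, cl, ch, dh)
print("0x%08x = %d" % (lba, lba))  # 0x04a8617f = 78143871
```

The three logged errors decode to LBAs 78143871, 78142711 and 78142455.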
triod triod # smartctl -a /dev/sda
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint P80 SD series
Device Model:     SAMSUNG HD080HJ
Serial Number:    S08EJ10Y948868
Firmware Version: WT100-33
User Capacity:    80 026 361 856 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Sat Aug  2 00:24:45 2008 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                 (1872) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  31) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   100   100   025    Pre-fail  Always       -       4352
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2283
  5 Reallocated_Sector_Ct   0x0033   099   099   010    Pre-fail  Always       -       10
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       13371
 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   253   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1540
190 Airflow_Temperature_Cel 0x0022   112   055   000    Old_age   Always       -       42
194 Temperature_Celsius     0x0022   112   055   000    Old_age   Always       -       42
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       212110
196 Reallocated_Event_Count 0x0032   099   099   000    Old_age   Always       -       10
197 Current_Pending_Sector  0x0012   253   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   100   000    Old_age   Always       -       0
202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 4229 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4229 occurred at disk power-on lifetime: 12895 hours (537 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 51 08 7f 61 a8 e4  Error:  at LBA = 0x04a8617f = 78143871

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 03 08 7f 61 a8 e4 00      04:24:47.313  READ MULTIPLE
  c4 03 08 77 61 a8 e4 00      04:24:47.313  READ MULTIPLE
  c4 03 08 6f 61 a8 e4 00      04:24:47.313  READ MULTIPLE
  c4 03 08 67 61 a8 e4 00      04:24:47.250  READ MULTIPLE
  c4 03 08 5f 61 a8 e4 00      04:24:47.250  READ MULTIPLE

Error 4228 occurred at disk power-on lifetime: 12895 hours (537 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 51 08 f7 5c a8 e4  Error:  at LBA = 0x04a85cf7 = 78142711

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 03 08 f7 5c a8 e4 00      04:24:36.500  READ MULTIPLE
  c4 03 08 ef 5c a8 e4 00      04:24:36.500  READ MULTIPLE
  c4 03 08 e7 5c a8 e4 00      04:24:36.500  READ MULTIPLE
  c4 03 08 df 5c a8 e4 00      04:24:36.500  READ MULTIPLE
  c4 03 08 d7 5c a8 e4 00      04:24:36.500  READ MULTIPLE

Error 4227 occurred at disk power-on lifetime: 12895 hours (537 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 51 08 f7 5b a8 e4  Error:  at LBA = 0x04a85bf7 = 78142455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 d8 08 f7 5b a8 e4 00      04:24:35.813  READ MULTIPLE
  c4 d8 08 ef 5b a8 e4 00      04:24:35.813  READ MULTIPLE
  c4 d8 08 e7 5b a8 e4 00      04:24:35.813  READ MULTIPLE
  c4 d8 08 df 5b a8 e4 00      04:24:35.750  READ MULTIPLE
  c4 d8 08 d7 5b a8 e4 00      04:24:35.750  READ MULTIPLE

Error 4226 occurred at disk power-on lifetime: 12895 hours (537 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 08 9f 59 a8 e4  Error: ICRC, ABRT 8 sectors at LBA = 0x04a8599f = 78141855

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 d8 08 9f 59 a8 e4 00      04:24:34.938  READ DMA
  c8 d8 08 97 59 a8 e4 00      04:24:34.938  READ DMA
  c8 d8 08 8f 59 a8 e4 00      04:24:34.938  READ DMA
  c8 d8 08 87 59 a8 e4 00      04:24:34.938  READ DMA
  c8 d8 08 7f 59 a8 e4 00      04:24:34.938  READ DMA

Error 4225 occurred at disk power-on lifetime: 12895 hours (537 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 08 27 59 a8 e4  Error: ICRC, ABRT 8 sectors at LBA = 0x04a85927 = 78141735

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 d8 08 27 59 a8 e4 00      04:24:34.438  READ DMA
  c8 d8 08 1f 59 a8 e4 00      04:24:34.438  READ DMA
  c8 d8 08 17 59 a8 e4 00      04:24:34.438  READ DMA
  c8 d8 08 0f 59 a8 e4 00      04:24:34.438  READ DMA
  c8 d8 08 07 59 a8 e4 00      04:24:34.438  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13271         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1
SMART Selective self-test log data structure revision number 0
Warning: ATA Specification requires selective self-test log data structure revision number = 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

triod triod # 
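For readers following the error log: the LBA that smartctl prints can be rebuilt from the register dump. Below is a minimal sketch, assuming 28-bit LBA addressing (which READ MULTIPLE 0xc4 and READ DMA 0xc8 use). The helper name lba28 is hypothetical and is not part of smartmontools.

```shell
# Hypothetical helper: rebuild a 28-bit LBA from the SN, CL, CH and DH
# register bytes (hex, no 0x prefix) shown in the SMART error log.
lba28() (
    sn=$((0x$1)); cl=$((0x$2)); ch=$((0x$3)); dh=$((0x$4))
    # Only the low nibble of DH carries LBA bits 27:24; the high nibble
    # holds the LBA-mode and device-select flags.
    echo $(( ((dh & 0x0f) << 24) | (ch << 16) | (cl << 8) | sn ))
)

# Registers from Error 4229: SN=7f CL=61 CH=a8 DH=e4
lba28 7f 61 a8 e4
```

For Error 4229 this prints 78143871, the same LBA that smartctl reports on the "Error: at LBA" line.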
Comment 5 Yuriy Dmitriev 2008-08-01 14:56:22 UTC
It's my home computer, sorry, I do not have netconsole at home ((((
I am afraid to run 2.6.26; I am afraid of losing the FS again.

My steps to determine the source of the error:
1. I replaced the HDD with another disk of the same model, also Samsung. The problem is still present.
2. I replaced the HDD with a Maxtor (no NCQ support in hardware) - works FINE.
3. I put the Samsung (NCQ capable) back and, after boot, ran # echo 1 >  /sys/block/sda/device/queue_depth  == WORKS FINE with NCQ disabled manually. No errors in the log files.
4. I replaced the SATA cables, with /sys/block/sda/device/queue_depth = 31. The error is still present when NCQ is enabled ((((.
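
The queue depth check and change from the steps above can be sketched as a small shell helper. This is only an illustration: the function name set_queue_depth is hypothetical, the sysfs path is the one quoted in the steps, and writing to it needs root. With libata, a depth of 1 effectively turns NCQ off for that device.

```shell
# Hypothetical helper: show the current queue depth, write a new one,
# then read it back to confirm the kernel accepted it.
set_queue_depth() {
    qd_file=$1    # e.g. /sys/block/sda/device/queue_depth
    new_depth=$2  # 1 disables NCQ; 31 is the depth reported in step 4
    echo "before: $(cat "$qd_file")"
    echo "$new_depth" > "$qd_file"
    echo "after:  $(cat "$qd_file")"
}

# Usage (as root):
#   set_queue_depth /sys/block/sda/device/queue_depth 1
```

The setting does not survive a reboot, so it has to be reapplied after each boot while testing.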

I downgraded the kernel from 2.6.26 to 2.6.25.11 (now .13). The error is still present, but the HDD does not freeze. It recovers after the reset, as you can see in the part of my log files below.





Log from 2.6.25.13 (part)
Problem still present.


Jul 31 03:20:01 triod cron[28304]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 03:30:01 triod cron[28316]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 03:40:01 triod cron[28328]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 03:50:01 triod cron[28340]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 04:00:01 triod cron[28352]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 04:00:01 triod cron[28354]: (root) CMD (rm -f /var/spool/cron/lastrun/cron.hourly)
Jul 31 04:10:01 triod cron[28366]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 04:14:13 triod ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jul 31 04:14:13 triod ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jul 31 04:14:13 triod res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 31 04:14:13 triod ata1.00: status: { DRDY }
Jul 31 04:14:13 triod ata1: soft resetting link
Jul 31 04:14:13 triod ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 31 04:14:13 triod ata1.00: configured for UDMA/133
Jul 31 04:14:13 triod ata1: EH complete
Jul 31 04:14:13 triod sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
Jul 31 04:14:13 triod sd 0:0:0:0: [sda] Write Protect is off
Jul 31 04:14:13 triod sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jul 31 04:14:13 triod sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 31 04:20:01 triod cron[28378]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 04:30:01 triod cron[28390]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 04:40:01 triod cron[28402]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 04:50:01 triod cron[28414]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 05:00:01 triod cron[28426]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 05:00:01 triod cron[28428]: (root) CMD (rm -f /var/spool/cron/lastrun/cron.hourly)
Jul 31 05:10:01 triod cron[28440]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 05:20:01 triod cron[28452]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 05:30:01 triod cron[28464]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jul 31 05:40:01 triod cron[28476]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Comment 6 Tejun Heo 2008-08-03 01:47:32 UTC
Ah... long output.  Please attach as text file next time.  With lines wrapped and all, it becomes pretty difficult to follow.  I'll try to summarize.  Please point out if I got it wrong.

1. The problem only occurs with p80 samsung drives when NCQ is enabled and you tested two of them.
2. You also tested a maxtor drive which doesn't support NCQ and it worked fine.
3. The symptom is the drive timing out on 0xE7 (FLUSH CACHE), and sometimes EH fails to recover from that.

Your controller being ICH7 AHCI, I tend to think it's not a driver or controller problem.  It's one of the most tested ones.  Can you please do the following to rule out hardware problems?

1. Can you please test another drive (different generation or from different vendor) which can do NCQ and see whether the same problem exists?
2. A timeout on FLUSH followed by filesystem corruption often indicates a power problem - the disk pulls more power trying to write out the buffer content, power fluctuates, and the disk checks out briefly and loses data.  There are two ways to rule this out.

   * If you have access to another power supply, power it up without the motherboard and connect only the hard drive to that PSU (which is safe to do with SATA) and see whether the problem disappears.

     http://modtown.co.uk/mt/article2.php?id=psumod

   * Another way is to see whether the SMART counters record such an event.  Boot the machine, record the smartctl -a output, trigger the error condition, then run smartctl -a again and compare the outputs.  Power related failures often show up as an incremented start stop, power cycle or emergency unload count.
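
The before/after SMART comparison can be sketched like this. It is a rough illustration, not a tested procedure: the function name power_counters and the file names are hypothetical, and only the two power related attributes that this drive reports (Start_Stop_Count and Power_Cycle_Count) are picked out. Other drives may name such counters differently.

```shell
# Hypothetical helper: print "NAME RAW_VALUE" for the power-related
# attributes found in a saved `smartctl -a` output file.
power_counters() {
    awk '$2 == "Start_Stop_Count" || $2 == "Power_Cycle_Count" { print $2, $NF }' "$1"
}

# Usage (as root; file names are placeholders):
#   smartctl -a /dev/sda > smart-before.txt
#   ... trigger the error condition ...
#   smartctl -a /dev/sda > smart-after.txt
#   power_counters smart-before.txt > before.txt
#   power_counters smart-after.txt  > after.txt
#   diff before.txt after.txt && echo "no power-related counter changed"
```

Comparing only the extracted counters avoids noise from fields that always change between runs, such as the local time and power-on hours.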

Thanks.
Comment 7 Yuriy Dmitriev 2008-08-15 15:37:15 UTC
Latest test results:
Kernel 2.6.25.15 works fine.
Kernel 2.6.26.2 does NOT work.
Answers to your questions:
1. Yes
2. Yes
3. Yes. Maybe it depends on the reset method used in different kernels? How can I trigger or debug it? (sw/hw reset)

)) I know this PSU hack. I use it frequently. But I think this problem does not depend on the power supply: one kernel works and another does not. I use a relatively good PSU, not *made in China* or similar; it is a 500W, ~$100 unit. Only 1 HDD is connected to the MB, not counting the keyboard, mouse and video card (GeForce 6600 GT).
A 500W PSU should be enough for this configuration, I think. And again, one kernel works and one does not.
Comparing the SMART counters before & after the failure shows _nothing_. I compared them with the diff command. Only the power-on time changed.

I know the Samsung P80 is not the best choice for storage, but I don't have another one.(((
Also, where can I read about the changes in the IO subsystem from 2.6.25 to 2.6.26? Maybe this bug is not AHCI related? In my experience, I have often seen a bug in one subsystem or piece of HW depend on another one that is not directly related...
Any ideas?

P.S.
Heo, thank you for your time)))
Comment 8 Tejun Heo 2008-08-29 05:56:57 UTC
Hmm... The only significant difference between 2.6.25 and 2.6.26 is that 2.6.26 uses hardreset by default while 2.6.25 uses softreset.  This shouldn't make any difference for ahci tho, as its initialization sequence includes a controller reset, which implies hardresets.  Just in case, can you please try 2.6.27-rc5 w/ the following kernel parameter?

  "libata.force=nohrst"

It will force libata to use softreset instead.
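
One way to apply and verify the parameter, as a sketch: it is appended to the kernel command line in the boot loader configuration, and after a reboot /proc/cmdline shows whether it took effect. The helper name forces_softreset, the example kernel path and the root device below are hypothetical.

```shell
# Example boot loader kernel line (path and root device are placeholders):
#   kernel /boot/vmlinuz-2.6.27-rc5 root=/dev/sda3 libata.force=nohrst

# Hypothetical helper: succeed if the given kernel command line (default:
# the running kernel's) contains a libata.force entry with nohrst.
forces_softreset() {
    cmdline_file=${1:-/proc/cmdline}
    tr ' ' '\n' < "$cmdline_file" | grep -q '^libata\.force=.*nohrst'
}

# Usage after reboot:
#   forces_softreset && echo "nohrst is active on the command line"
```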

Thanks.
Comment 9 Yuriy Dmitriev 2008-10-07 03:50:37 UTC
Thanks, it is working )))) it helped )))

But I must say, the Samsung SpinPoint P80 does not work properly with the Asus P955X MB. I tried 2 HDDs of the same model.

I replaced the HDD with a Seagate ST3300831AS & that resolved ALL the problems with AHCI )))))

Many thanks )))))))
Comment 10 Tejun Heo 2008-10-13 01:13:15 UTC
Created attachment 18283 [details]
ahci-reset-debug.patch

If you still have the drive for testing, can you please apply the attached patch and post the resulting kernel log?  Thanks.
Comment 11 Yuriy Dmitriev 2008-10-13 14:24:00 UTC
OK, please wait a few days. I will get the P80 back; I gave it to my niece ))
Comment 12 Alan 2010-01-19 20:01:12 UTC
Closing stale bug
