Most recent kernel where this bug did not occur: no later one tested Distribution: Debian Etch/Sid Hardware Environment: IBM ThinkPad T23 Software Environment: Problem Description: Kernel 2.6.17.1 shows dma_intr: status=0x58 for /dev/hda once on boot. Tested with plain vanilla 2.6.17.1 as well as vanilla + sws2 2.2.5.3. This is the exact log excerpt from /var/log/syslog: ---------------------------------------------------------- Jun 23 07:35:30 deepdance kernel: hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest } Jun 23 07:35:30 deepdance kernel: ide: failed opcode was: unknown Jun 23 07:35:30 deepdance kernel: hda: set_drive_speed_status: status=0x58 { DriveReady SeekComplete DataRequest } Jun 23 07:35:30 deepdance kernel: ide: failed opcode was: unknown ---------------------------------------------------------- Does that mean that my harddisk may fail soon? Or does kernel 2.6.17.1 try a command my controller or harddisk does not understand? Diagnostic data follows: smartctl seems to indicicate that all is well: ---------------------------------------------------------- root@deepdance:~ -> smartctl -a /dev/hda smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen [...] === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED [...] SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 2950 - # 2 Extended offline Completed without error 00% 2944 - # 3 Short offline Completed without error 00% 2913 - # 4 Extended offline Completed without error 00% 2912 - # 5 Short offline Completed without error 00% 2903 - # 6 Short offline Completed without error 00% 2875 - # 7 Short offline Completed without error 00% 2852 - # 8 Short offline Completed without error 00% 2836 - # 9 Short offline Completed without error 00% 2808 - #10 Short offline Completed without error 00% 2801 - #11 Short offline Completed without error 00% 2766 - #12 Extended offline Completed without error 00% 2754 - #13 Short offline Completed without error 00% 2741 - #14 Short offline Completed without error 00% 2717 - #15 Short offline Completed without error 00% 2705 - #16 Short offline Completed without error 00% 2678 - #17 Short offline Completed without error 00% 2659 - #18 Short offline Completed without error 00% 2658 - #19 Short offline Completed without error 00% 2639 - #20 Short offline Completed without error 00% 2634 - #21 Short offline Completed without error 00% 2618 - ---------------------------------------------------------- It shows some rather old errors that have been logged about a year or so - I decided long ago that these would not be critical, but I am not completely sure: ---------------------------------------------------------- Error 200 occurred at disk power-on lifetime: 393 hours (16 days + 9 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 10 59 01 73 65 6c ee Error: IDNF at LBA = 0x0e6c6573 = 241984883 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- 20 03 01 73 65 6c ee 00 00:00:50.900 READ SECTOR(S) c8 03 01 73 65 6c ee 00 00:00:50.900 READ DMA 20 03 01 73 65 6c ee 00 00:00:50.800 READ SECTOR(S) c8 03 01 73 65 6c ee 00 00:00:50.800 READ DMA 20 03 01 73 65 6c ee 00 00:00:50.700 READ SECTOR(S) ---------------------------------------------------------- This is my harddisk (an IBM / Hitachi one): Kernel command line: root=/dev/hda5 ro resume2=swap:/dev/hda7 resume=/dev/hda7 ide0: BM-DMA at 0x1860-0x1867, BIOS settings: hda:DMA, hdb:pio hda: HTS548060M9AT00, ATA DISK drive hda: max request size: 512KiB hda: 117210240 sectors (60011 MB) w/7877KiB Cache, CHS=16383/255/63, UDMA(100) hda: cache flushes supported hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 > It has a quite high Load_Cycle_Count, since long time I used a sub optimal apm value in hdparm configuration: ---------------------------------------------------------- root@deepdance:~ -> smartctl -a /dev/hda | grep 'Load_Cycle_Count' | awk '{print $10}' 175309 ---------------------------------------------------------- (see http://blog.philkern.de/archives/173-Load-Cycle-Fix-for-Toshiba-HDDs.html) This is my hdparm configuration (/etc/hdparm.conf): ---------------------------------------------------------- /dev/hda { mult_sect_io = 16 write_cache = off dma = on apm = 0 acoustic_management = 128 io32_support = 3 keep_settings_over_reset = on interrupt_unmask = on } ---------------------------------------------------------- I have write cache still off due issues with XFS and disabled write cache (see bug #6380). I want to try 2.6.17 which should have write barriers enabled by default for XFS some time with write cache disabled and then enable them to see whether it works stable. This is the hardware: ---------------------------------------------------------- root@deepdance:~ -> lspci 0000:00:00.0 Host bridge: Intel Corporation 82830 830 Chipset Host Bridge (rev 04) 0000:00:01.0 PCI bridge: Intel Corporation 82830 830 Chipset AGP Bridge (rev 04) 0000:00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 02) 0000:00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 02) 0000:00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 02) 0000:00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 42) 0000:00:1f.0 ISA bridge: Intel Corporation 82801CAM ISA Bridge (LPC) (rev 02) 0000:00:1f.1 IDE interface: Intel Corporation 82801CAM IDE U100 (rev 02) 0000:00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02) 0000:00:1f.5 Multimedia audio controller: Intel Corporation 82801CA/CAM AC'97 Audio Controller (rev 02) 0000:01:00.0 VGA compatible controller: S3 Inc. SuperSavage IX/C SDR (rev 05) 0000:02:00.0 CardBus bridge: Texas Instruments PCI1420 0000:02:00.1 CardBus bridge: Texas Instruments PCI1420 0000:02:02.0 Communication controller: Agere Systems WinModem 56k (rev 01) 0000:02:08.0 Ethernet controller: Intel Corporation 82801CAM (ICH3) PRO/100 VE (LOM) Ethernet Controller (rev 42) 0000:03:00.0 USB Controller: NEC Corporation USB (rev 43) 0000:03:00.1 USB Controller: NEC Corporation USB (rev 43) 0000:03:00.2 USB Controller: NEC Corporation USB 2.0 (rev 04) ---------------------------------------------------------- I attach output of lspci -vvn to this report. Steps to reproduce: Honestly I have no idea. I do not what triggers this behavior.
Created attachment 8387 [details] output of lspci --vvn
Created attachment 8388 [details] configuration of kernel 2.6.17.1 I used (plain vanilla, without sws2)
On Thu, 22 Jun 2006 23:55:17 -0700 bugme-daemon@bugzilla.kernel.org wrote: > Most recent kernel where this bug did not occur: no later one tested You misunderstand. This question is asking whether an earlier kernel did _not_ have this problem. If so, what was that earlier version?
Thanks, Andrew. I wanted to add this information as I recognized I did not include it and got a mid air collision ;-) with your comment. I used 2.6.15.7 with sws2-2.2 before. I grepped through older logs and did not find any occurences of this error: root@deepdance:/var/log -> zgrep "hda: dma_intr: status=0x58" syslog* syslog:Jun 23 03:42:26 deepdance kernel: hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest } syslog:Jun 23 07:35:30 deepdance kernel: hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest } syslog.1.gz:Jun 23 01:44:26 deepdance kernel: hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest } These are all with the new kernel. I attach the kernel configuration of that older kernel as well.
Created attachment 8391 [details] configuration of the older kernel (2.6.15.7-sws-2.2) that did not show those dma_intr messages A more generic grep did not show any different results: root@deepdance:/var/log -> zgrep "hda: dma_intr" syslog* syslog:Jun 23 03:42:26 deepdance kernel: hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest } syslog:Jun 23 07:35:30 deepdance kernel: hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest } syslog.1.gz:Jun 23 01:44:26 deepdance kernel: hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest } I am using 2.6.17.1 now with sws2 again cause it seems that the new sws1 from 2.6.17 seems to need that additional user space stuff in order to work - I thought that it would work without it. At least it didnt work here - empty screen with blinking cursor and nothing else. I decided for now to use what works for me instead of trying some supposedly experimental userspace sws. sws2 seems to work fine. But anyway I got this dma_intr message with 2.6.17.1 without sws2 as well (syslog.1.gz grep result).
Now I also tested with the 2.6.16.11 the latest one of 2.6.16 that I used for a short time. This kernel also does not show the dma_intr messages at boot... thus this behaviour must have been introduced somewhere between 2.6.16.11 and 2.6.17.1. root@deepdance:~ -> cat /proc/version Linux version 2.6.16.11-tp23-sws2-2.2.5 (root@deepdance) (gcc-Version 4.0.3 (Debian 4.0.3-1)) #1 PREEMPT Fri Apr 28 13:17:58 CEST 2006 I reverted back to 2.6.15 cause sound didn't work after resume from sws 2 - it works nicely with 2.6.17 again ;-). If needed I can try with 2.6.16.21 or whatever is latest 2.6.16 as well. I attach kernel config of this one as well.
Created attachment 8403 [details] configuration of more recent older kernel (2.6.16.11) that does not show dma_intr messages I leave all of these installed in case they are required for further testing.
I am experiencing (almost) the same issue. I have tested 2.6.16.20 and 2.6.17, the problem occurs in both. The issue occurs when hdparm is used (-X68 -d1 -c1 -u1) while disk access is taking place. I've only noticed it recently due to recent changes in Gentoo which have enabled parallel startup of hdparm with other services. If I run hdparm manually, with no disk access going on, there is no problem. When it is run as part of the boot process the following appear in dmesg: hda: set_drive_speed_status: status=0x58 { DriveReady SeekComplete DataRequest } ReiserFS: warning: is_leaf: free space seems wrong: level=1, nr_items=29, free_s pace=100 rdkey ReiserFS: hda2: warning: vs-5150: search_by_key: invalid format found in block 3 3810. Fsck? ReiserFS: hda2: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occur red trying to find stat data of [2 709178 0x0 SD] ReiserFS: warning: is_leaf: free space seems wrong: level=1, nr_items=29, free_s pace=100 rdkey ReiserFS: hda2: warning: vs-5150: search_by_key: invalid format found in block 3 3810. Fsck? ReiserFS: hda2: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occur red trying to find stat data of [2 709178 0x0 SD] I assume the ReiserFS problems are related, they do not occur otherwise. Unfortunately often this leaves my system unable to boot with various errors (files missing, segmentation faults, permission problems). My hardware is completely different (a desktop machine): 00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333] 00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP] 00:0a.0 Ethernet controller: Davicom Semiconductor, Inc. 21x4x DEC-Tulip compatible 10/100 Ethernet (rev 31) 00:0c.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07) 00:0c.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07) 00:0d.0 Network controller: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller (rev 03) 00:0e.0 Multimedia audio controller: Ensoniq 5880 AudioPCI (rev 02) 00:11.0 ISA bridge: VIA Technologies, Inc. VT8233 PCI to ISA Bridge 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) 00:11.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1b) 00:11.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1b) 00:11.4 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 1b) 01:00.0 VGA compatible controller: nVidia Corporation NV11 [GeForce2 MX/MX 400] (rev a1)
Hmmm, seems that I am also getting slight corruption here. With XFS. See kernel bug #6757
The slight corruption I had with XFS had a different cause. See that bugreport (bug #6757).
Martin, Any news on this problem? Is it still there with latest kernel? If so, can you please do git bisect or binary search so you can identify the patch that broke it. It might take a while though. Thanks.
I'm closing this one, please re-open if it still happens with the recent kernels.
Hmmm, this is ancient. I already forgot about this one. I can't say much about it at the moment, since I am using libata since quite some time. A zgrep "dma_intr" syslog* does not yield any results on my T23 so I think at least libata doesn't have this issue ;). So for me that issue is solved even when that is with a different driver architecture, closing bug. If anyone still has this issue with IDE drivers he/she can reopen it.
One additional info: That zgrep results relate to 2.6.24 and 2.6.23 kernels, I think anything prior to that is logrotated to nil already.
OK, thanks for info.