Bug 14157

Summary: end_request: I/O error, dev cciss/cXdX, sector 0
Product: IO/Storage
Reporter: jiri.harcarik
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, birrachiara, florian, markus, rjw, spike, tj, tmhikaru, wschlich
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 13615    
Attachments: dmesg output

Description jiri.harcarik 2009-09-11 07:42:17 UTC
After upgrading to the latest stable kernel on an HP server with a Smart Array controller, these errors occur in /var/log/messages:

Sep 10 12:42:45 node2 kernel: [   91.568968] end_request: I/O error, dev cciss/c0d1, sector 0
Sep 10 12:42:45 node2 kernel: [   91.575688] end_request: I/O error, dev cciss/c0d1, sector 0
Sep 10 12:42:45 node2 kernel: [   91.719894] end_request: I/O error, dev cciss/c0d2, sector 0
Sep 10 12:42:45 node2 kernel: [   91.720092] end_request: I/O error, dev cciss/c0d2, sector 0

The previous kernel version I tried, 2.6.30.4, works without any errors.
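
To see which devices are affected and how often, the messages can be tallied per device. This is a minimal sketch, assuming syslog lines shaped like the ones quoted above; the function name `count_sector0_errors` and the regular expression are illustrative and not part of any kernel tooling.

```python
import re
from collections import Counter

# Pattern derived only from the log lines quoted in this report (an assumption).
# Captures the device name (e.g. "cciss/c0d1", "hda") and the sector number.
PATTERN = re.compile(r"end_request: I/O error, dev ([^,\s]+), sector (\d+)")

def count_sector0_errors(lines):
    """Return a Counter of sector-0 I/O errors keyed by device name."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        # Only sector 0 is counted; errors on other sectors are skipped.
        if match and match.group(2) == "0":
            counts[match.group(1)] += 1
    return counts

sample = [
    "Sep 10 12:42:45 node2 kernel: [   91.568968] end_request: I/O error, dev cciss/c0d1, sector 0",
    "Sep 10 12:42:45 node2 kernel: [   91.575688] end_request: I/O error, dev cciss/c0d1, sector 0",
    "Sep 10 12:42:45 node2 kernel: [   91.719894] end_request: I/O error, dev cciss/c0d2, sector 0",
    "Sep 10 12:42:45 node2 kernel: [   91.720092] end_request: I/O error, dev cciss/c0d2, sector 0",
]

for device, count in sorted(count_sector0_errors(sample).items()):
    print(device, count)
```

On a real system the same function could be fed the lines of /var/log/messages instead of the `sample` list.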
Comment 1 Misha Labjuk 2009-09-12 20:49:21 UTC
I don't think this is a SCSI bug.

I get the same flood of errors with an IDE disk.

SiS5513 EIDE Controller with drive

Model Family:     Seagate Barracuda ATA IV family
Device Model:     ST320011A


Sep 12 17:08:02 ololo XFS mounting filesystem dm-0
Sep 12 17:08:02 ololo Ending clean XFS mount for filesystem: dm-0
Sep 12 17:08:02 ololo end_request: I/O error, dev hda, sector 0
Sep 12 17:08:02 ololo end_request: I/O error, dev hda, sector 0
...
and so on; more than 50 messages per second are logged.

This is a regression: 2.6.30.6 is the last unaffected version, and 2.6.31 is the first affected one.
Comment 2 Andrew Morton 2009-09-13 22:40:04 UTC
Marked as a regression.

Randomly reassigned to block layer.

Weird.
Comment 3 Anonymous Emailer 2009-09-14 00:08:56 UTC
Reply-To: James.Bottomley@suse.de

> --- Comment #2 from Andrew Morton <akpm@linux-foundation.org>  2009-09-13
> 22:40:04 ---
> Marked as a regression.
> 
> Randomly reassigned to block layer.
> 
> Weird.

It's theoretically possible, I suppose, but no-one else is seeing this.
It sounds like some local config setup issue, which a full dmesg might
shed some light on ... say some protected area on sector 0 or something.

James
Comment 4 Misha Labjuk 2009-09-14 04:32:54 UTC
Created attachment 23091
dmesg output
Comment 5 Jens Axboe 2009-09-14 06:31:31 UTC
Using XFS?
Comment 6 Misha Labjuk 2009-09-14 10:10:08 UTC
Yes, XFS. The problem occurs only with the IDE disk. XFS on SATA (without a Host Protected Area) works.

Also, "end_request: I/O error, dev hda, sector 0" is logged only after the file system has been changed.


2.6.30.6------------------------------
ide-gd driver 1.18
hda: max request size: 128KiB
hda: Host Protected Area detected.
  current capacity is 39100223 sectors (20019 MB)
  native  capacity is 39102336 sectors (20020 MB)
hda: Host Protected Area disabled.
hda: 39102336 sectors (20020 MB) w/2048KiB Cache, CHS=38792/16/63
hda: cache flushes not supported
hda: hda1


2.6.31------------------------------
ide-gd driver 1.18
hda: max request size: 128KiB
hda: Host Protected Area detected.
	current capacity is 39100223 sectors (20019 MB)
	native  capacity is 39102336 sectors (20020 MB)
hda: 39100223 sectors (20019 MB) w/2048KiB Cache, CHS=38789/16/63
hda: cache flushes not supported
 hda: hda1
hda: p1 size 39102147 exceeds device capacity, enabling native capacity
hda: detected capacity change from 20019314176 to 20020396032


Why didn't 2.6.31 disable the Host Protected Area?
Capacity change?!
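
The numbers in the 2.6.31 log are at least consistent with each other. Below is a quick check, assuming 512-byte sectors (the log does not state the sector size; the variable names are only for illustration). It does not explain why the Host Protected Area is handled differently; it only shows that the two byte counts in the "detected capacity change" line are the current and native capacities.

```python
SECTOR_SIZE = 512  # bytes per sector; an assumption, not stated in the log

current_sectors = 39100223  # "current capacity" with the HPA in place
native_sectors = 39102336   # "native capacity" with the HPA disabled
p1_size = 39102147          # from "p1 size 39102147 exceeds device capacity"

current_bytes = current_sectors * SECTOR_SIZE
native_bytes = native_sectors * SECTOR_SIZE
hpa_sectors = native_sectors - current_sectors

print(current_bytes)              # 20019314176, the "from" value in the log
print(native_bytes)               # 20020396032, the "to" value in the log
print(hpa_sectors)                # 2113 sectors hidden by the HPA
print(p1_size > current_sectors)  # True: hda1 does not fit below the HPA
print(p1_size <= native_sectors)  # True: it fits within the native capacity
```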
Comment 7 Jens Axboe 2009-09-14 10:50:51 UTC
I thought so. This is just the empty barrier failing and triggering a warning. It can be safely ignored; I'll make sure it goes away.
Comment 8 Wolfram Schlich 2009-11-06 11:25:51 UTC
This is still present in 2.6.31.5.
Adding "nobarrier" to the xfs mount options works around this issue.
What's the current status of fixing this?
Comment 9 tmhikaru 2009-11-09 17:14:52 UTC
I'm having the same problem with a USB IDE hard disk on 2.6.31.5. Previous 2.6.31.x versions used to remount the filesystem read-only; for some reason that does not happen with this kernel version. The XFS workaround does not apply to me because I'm using ext4, and disabling barriers (barrier=0) in the ext4 mount options did *not* work around the problem.

This is the sort of output I'm getting (from the test with barriers disabled):

Nov  9 11:59:31 roll kernel: usb 1-4: new high speed USB device using ehci_hcd and address 6 
Nov  9 11:59:31 roll kernel: usb 1-4: New USB device found, idVendor=067b, idProduct=2506 
Nov  9 11:59:31 roll kernel: usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3 
Nov  9 11:59:31 roll kernel: usb 1-4: Product: Mass Storage Device 
Nov  9 11:59:31 roll kernel: usb 1-4: Manufacturer: Prolific Technology Inc. 
Nov  9 11:59:31 roll kernel: usb 1-4: SerialNumber: 0 
Nov  9 11:59:31 roll kernel: usb 1-4: configuration #1 chosen from 1 choice 
Nov  9 11:59:31 roll kernel: scsi3 : SCSI emulation for USB Mass Storage devices 
Nov  9 11:59:36 roll kernel: scsi 3:0:0:0: Direct-Access     ST325082 3A               3.06 PQ: 0 ANSI: 0 
Nov  9 11:59:36 roll kernel: sd 3:0:0:0: Attached scsi generic sg0 type 0 
Nov  9 11:59:36 roll kernel: sd 3:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB) 
Nov  9 11:59:36 roll kernel: sd 3:0:0:0: [sda] Write Protect is off 
Nov  9 11:59:36 roll kernel:  sda: sda1 
Nov  9 11:59:37 roll kernel: sd 3:0:0:0: [sda] Attached SCSI disk 

==> /var/log/syslog <== 
Nov  9 11:59:36 roll kernel: sd 3:0:0:0: [sda] Assuming drive cache: write through 

==> /var/log/messages <==
Nov  9 12:03:48 roll kernel: EXT4-fs (sda1): barriers disabled 
Nov  9 12:03:48 roll kernel: kjournald2 starting: pid 7298, dev sda1:8, commit interval 5 seconds 
Nov  9 12:03:48 roll kernel: EXT4-fs (sda1): internal journal on sda1:8 
Nov  9 12:03:48 roll kernel: EXT4-fs (sda1): delayed allocation enabled 
Nov  9 12:03:48 roll kernel: EXT4-fs: file extents enabled 
Nov  9 12:03:48 roll kernel: EXT4-fs: mballoc enabled 
Nov  9 12:03:48 roll kernel: EXT4-fs (sda1): mounted filesystem with ordered data mode 
Nov  9 12:06:35 roll kernel: sd 3:0:0:0: [sda] Unhandled error code 
Nov  9 12:06:35 roll kernel: sd 3:0:0:0: [sda] Result: hostbyte=0x07 driverbyte=0x00 

==> /var/log/syslog <== 
Nov  9 11:59:37 roll last message repeated 2 times
Nov  9 12:06:35 roll kernel: end_request: I/O error, dev sda, sector 4202703
Comment 10 Tejun Heo 2009-12-01 06:03:27 UTC
Wolfram, tmhikaru, can you guys please attach the output of dmesg after such a failure?

Thanks.
Comment 11 tmhikaru 2009-12-01 07:08:46 UTC
Please ignore or delete my comment. I apologize, but it turned out to be failing hardware (the USB enclosure was damaged somehow and was giving random I/O errors), so it's entirely unrelated to this bug. Sorry!
Comment 12 Markus Gaugusch 2009-12-13 20:35:55 UTC
I can confirm this problem on 2.6.31.5 (openSUSE 11.2 distro kernel).
I use ext3 only and get lots of
 end_request: I/O error, dev cciss/c0d1, sector 0
(every 15-20 seconds)
Comment 13 Tejun Heo 2009-12-14 01:54:09 UTC
Markus, can you reproduce the problem on 2.6.32?
Comment 14 Markus Gaugusch 2009-12-14 21:07:21 UTC
This is a production system, but I might be able to try openSUSE factory kernel (2.6.32) during the holidays.
Comment 15 Tejun Heo 2009-12-14 22:31:16 UTC
Yeap, that will be great.  In case you don't know, 2.6.32 kernel packages are available at the following URL.

  http://ftp.suse.com/pub/projects/kernel/kotd/HEAD/x86_64/
Comment 16 jiri.harcarik 2009-12-15 08:35:20 UTC
I can confirm that the latest kernel, 2.6.32, no longer shows that error on my HP server.
Comment 17 R.Ghetta 2010-01-26 19:09:10 UTC
I can confirm the bug with openSUSE kernel 2.6.31.8-0.1.1 (x86_64).
openSUSE kernel 2.6.32.5-0.0.3.f89b2ba-default works correctly.

BTW, it doesn't seem to be just a cosmetic problem. On the test system (HP ML350 G6, 4-core Xeon, Smart Array 641, ext3) the faulty kernel sometimes seems to stall disk operations while outputting those messages.
Comment 18 Tejun Heo 2010-01-26 23:48:06 UTC
Umm... upstream is already working fine, so at this point I think it'll probably be best to report these problems to the distros so that they can backport the fix to their kernels.  Jens, can you please point out which commit fixed this one?

Thanks.
Comment 19 Jens Axboe 2010-01-27 08:14:35 UTC
In mainline it should be fixed by commit 6cafb12d. It is just a cosmetic issue; if you see stalls, it must be some other problem.
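
For anyone checking a machine against the reports in this thread, here is a rough sketch that classifies a kernel version string. It assumes, based only on the versions mentioned in the comments above (2.6.30.x and 2.6.32.x reported clean; 2.6.31, 2.6.31.5 and 2.6.31.8 reported affected), that the whole 2.6.31.x series shows the message. The helper names are hypothetical, and a distro kernel with the fix backported would be misclassified.

```python
def parse_version(version):
    """Turn '2.6.31.8-0.1.1' into (2, 6, 31, 8); anything after '-' is ignored."""
    parts = []
    for piece in version.split("-")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def reported_affected(version):
    """True if the version is in the 2.6.31.x series reported in this thread.

    This reflects the thread's reports only (an assumption), not an
    authoritative list of kernels containing or lacking the fix.
    """
    return parse_version(version)[:3] == (2, 6, 31)

for v in ["2.6.30.6", "2.6.31.8-0.1.1", "2.6.32.5-0.0.3.f89b2ba-default"]:
    print(v, reported_affected(v))
```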